Python Tutorial: How to create charts in Excel

Estimated reading time: 4 minutes

Taking it further with a python excel chart.

We have created some video content here using Python with Excel that illustrates the different ways you can leverage Python to

  • Data cleanse
  • Find unwanted characters
  • And see if the file is empty before import!

What this video is giving as output

Here we are looking to introduce charts in Excel, and how how to use Python to easily work with your data and export to an excel sheet.

Below is the final output of our two charts, this is for illustrative purposes, and is taken from the Irish Government’s website as at 1st May 2020, importing the cell ranges associated with them.

A barchart of covid cases bullt in Python
a line chart of covid cases built in Python

How we went about it

Step 1 – Importing the data

import pandas as pd
import openpyxl


importfile = openpyxl.load_workbook("C://dataanalyticsireland/Python Code/youtube/codeused/files/Covid19CountyStatisticsHPSCIreland01052020.xlsx")

#Telling the program to go to each individual sheet
Dublin = importfile["Dublin"]
Cork = importfile["Cork"]
Limerick = importfile["Limerick"]
Galway = importfile["Galway"]

#Telling the program for each sheet , create a dataframe.
dfDublin = pd.DataFrame(Dublin.values)
dfCork = pd.DataFrame(Cork.values)
dfLimerick = pd.DataFrame(Limerick.values)
dfGalway = pd.DataFrame(Galway.values)

Step 2 – Splitting the different sheets out into different data frames

### Showing to break down here, further on I cocatenate the dataframes  ###

#This brings in all the values from the sheet into a variable so they can be read.
datadublin = Dublin.values
datacork = Cork.values
datalimerick = Limerick.values
datagalway = Galway.values
#This tells the program that the headers are on row 0.
columnsdublin = next(datadublin)[0:]
columnscork = next(datacork)[0:]
columnslimerick = next(datalimerick)[0:]
columnsgalway = next(datagalway)[0:]
# The imported file has the column headings on row 0, moving them to be the actual column headings instead of numerical values.
dfDublin = pd.DataFrame(datadublin, columns=columnsdublin)
dfCork = pd.DataFrame(datacork, columns=columnscork)
dfLimerick = pd.DataFrame(datalimerick, columns=columnslimerick)
dfGalway = pd.DataFrame(datagalway, columns=columnsgalway)

Step 3 – Exports all three data frames to a separate tab each to the below file

# This starts the process of where to save the output by creating a writer variable
writer = pd.ExcelWriter('C://dataanalyticsireland/Python Code/youtube/codeused/files/Covid19output01052020.xlsx')
# Save the dataframe "Dublin" to Dublin sheet etc...
dfDublin.to_excel(writer, 'Dublin')
dfCork.to_excel(writer, 'Cork')
dfLimerick.to_excel(writer, 'Limerick')
dfGalway.to_excel(writer, 'Galway')
# save the excel file
writer.save()

Step 4 – Merging all the data to allow the charts to be created easily

mergedPDList = [dfDublin,dfCork,dfLimerick,dfGalway]  # Creating a new merged list of the dataframes, easier when doing charts below
mergeddf = pd.concat(mergedPDList) # physical merging of the dataframes

Step 5 – Creating a bar chart

import matplotlib.pyplot as plt
import matplotlib.pyplot as pltline
import seaborn as sns
#from openpyxl.drawing.image import Image
#This makes the seaborn library over ride the matplotlib properites for styling
sns.set()

plt.figure(figsize =(9,6))
plt.bar(x = mergeddf['CountyName'],
       height = mergeddf['ConfirmedCovidCases']
       )
plt.xticks(rotation = 45)
plt.title("No of Confirmed Covid Cases in the Republic of Ireland", fontsize=25, fontweight='bold')
plt.xlabel('Area')
plt.ylabel('No of cases')
plt.width = 35
img = plt.savefig('C://dataanalyticsireland/Python Code/youtube/codeused/files/barchart.png',bbox_inches='tight')
plt.show()

Step 6 – Creating a line chart

x = dfDublin['ConfirmedCovidCases']
x1 = dfCork['ConfirmedCovidCases']
x2 = dfLimerick['ConfirmedCovidCases']
x3 = dfGalway['ConfirmedCovidCases']
#Sets the size of the output before it is created
pltline.figure(figsize=(15,15))
pltline.plot(x)
pltline.plot(x1)
pltline.plot(x2)
pltline.plot(x3)
pltline.legend(['Dublin', 'Cork','Limerick','Galway'], fontsize="x-large")
pltline.ylabel('No of cases')
pltline.xlabel('No of days elapsed')
img1 = plt.savefig('C://dataanalyticsireland/Python Code/youtube/codeused/files/linegraph.png')
pltline.show()

Step 7 – Exporting charts to the XLSX file

from openpyxl import load_workbook
#from openpyxl.drawing.image import Image
import openpyxl
#This step opens the existing file.
wsnew = load_workbook('C://dataanalyticsireland/Python Code/youtube/codeused/files/Covid19output01052020.xlsx')
#This step creates a new worksheet called Barchart
newsheet = wsnew.create_sheet("Bar Chart",4) #Creates a new sheet in fourth position
newsheet1 = wsnew.create_sheet("Line Graph",5) #Creates a new sheet in fifth position
#Adding the image to Newsheet which is called "Barchart" in the excel file
img = openpyxl.drawing.image.Image('C://dataanalyticsireland/Python Code/youtube/codeused/files/barchart.png')
img1 = openpyxl.drawing.image.Image('C://dataanalyticsireland/Python Code/youtube/codeused/files/linegraph.png')
img.anchor = 'B2' # Tells what cell to put the imahe in
img1.anchor = 'B2' # Tells what cell to put the imahe in
newsheet.add_image(img) # Adds the image
newsheet1.add_image(img1) #Adds the image
wsnew.save('C://dataanalyticsireland/Python Code/youtube/codeused/files/Covid19output01052020.xlsx')
wsnew.close()

In the below video on creating a python excel chart, we have approached this as follows:

  • Created four separate data frames, they are the four regions that will feed into the creation of the graphs.
  • Separate to this, we merged the four data frames into one, to use with the bar chart.

And to finish off

If you like what this video has explained, please click to see our YouTube channel for more informative videos.

Data Analytics Ireland

how to validate cell values in excel

Estimated reading time: 2 minutes

Validating cells in Excel quickly – how to do it easily!
Are you working with large spreadsheets and looking to quickly at data validation exercise to save you time?

The aim would be to run your code and test it against some predefined rules you or your data analyst would have written to make sure it brings back the expected checks.

If you look at the below, this is the final output of this video, highlighting two cells that are over budget based on the companies predefined budget.

data validation example

The structure of this code can be broken down into the following steps:

  • Read in the excel file, see a previous example here How to import data into excel
  •  Run the first function,  checks if the spreadsheet cell value is over or under budget.
  •  Run the second function that takes the value from the first function and applies the colour red to the cell if it is over budget.

 

Finally

You can expand this code to incorporate more functionality, such as:

  • Change the colour of the cells, to have multiple colours returned.
  •  Update the two functions to include more business rules.
  •  You could check if the file is empty before processing as shown here How to check if a file is empty

Please subscribe to our YouTube channel, the button is the right-hand side of the page if you would like to see more like these.

Data Analytics Ireland

How to import data into excel

Estimated reading time: 4 minutes

This import will not cost you anything except running some code!
The need to productively have an all in one solution to manage your data as your code has become more critical as volumes of data become larger. Do you, as a data analyst, therefore, need to send your data into an excel file? Previously we would have posted a video How do I remove unwanted characters, and here we build on that theme, linking in with Excel.

Two techniques used here to achieve this are  XLSX writer explained, and Openpyxl explained.

The elements we cover off are:

  • Load data from a data frame and populate it into an excel file.
  •  Renaming of a sheet.
  • Creating a new sheet and giving it a name.
  • We look at properties, namely changing the colour of a tab to yellow.
  • You may need to put some text in a sheet cell, to act as a header or to show you have a total figure there.
  • And the final piece of functionality covered in this video is how to copy data from one sheet into another one.

 

There are several benefits to putting all the upfront work in Python:

  1. The benefits of cleansing the data or format will save time further down the road.
  2. After you receive the document, you can quickly review it without fixing errors in the data.
  3. If you are distributing the output to several people, it quickly gets them what they want, without manual intervention after the logic has completed.

I have certainly benefited from this data cleansing and importing it into excel exercise, as the two are combined now, makes it a more efficient process.

Please remember to subscribe to our channel if you like the work we are doing, thanks!

Data Analytics Ireland

The code used in this video is here:

########## - Previous video - How to check for invlaid characters in your dataset ###########



###########################################################################################################

#######################This video - How to import data into Excel #########################################

#In Python 3, leading zeros(at the start of a sequence of numbers) are not allowed on numbers
#Numbers need to be put between "" to be looped through

import pandas as pd
#Create a dataset locally
data = {'Number':  ["00&00$000", '111$11111','2222€2222','3333333*33','444444444£','555/55555','666666@666666'],
       'Error' : ['0','0','0','0','0','0','0']}

#Create a dataframe and tell it where to get its values from
df = pd.DataFrame (data, columns = ['Number','Error'])

#Function to loop through the dataset and see if any of the list values below are found and replace them
def run(*l):
    #This line extracts all the special characters into a new column
    #Using regular expressions it finds values that should not appear
    df['Error'] = df['Number'].str.extract('(\D+|^(0)$)')   
    #This line removes anything with a null value
    df.dropna(subset=['Error'], inplace=True)
    #This line reads in the list and assigns it a value i, to each element of the list.
    #Each i value assigned also assigns an index number to the list value.
    #The index value is then used to check whether the value associated with it is in the df['Number'] column 
    #and then replaces if found
    for i in l:
        df['Fix']= df['Number'].str.replace(i[0],"").str.replace(i[1],"").str.replace(i[2],"").str.replace(i[3],"") \
        .str.replace(i[4],"").str.replace(i[5],"").str.replace(i[6],"")
        print("Error list to check against")
        print(i[0])
        print(i[1])
        print(i[2])
        print(i[3])
        print(i[4])
        print(i[5])
        print(i[6])
    display(df)

#This is the list of errors you want to check for
l = ['$','*','€','&','£','/','@']     
        
run(l)

import xlsxwriter
#### This is the start of the code to import the data to excel

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter(r"C://dataanalyticsireland/Python Code/youtube/codeused/files/exportxlsx.xlsx")


################ XlsxWriter can only create new files. It cannot read or modify existing files.#################

# Position the dataframes in the worksheet, and names the sheet Analysis
df.to_excel(writer, sheet_name='Analysis')  # Default position, cell A1.

# Close the Pandas Excel writer and output the Excel file.
writer.save()
writer.close()

### Modify the exportxlsx.xlsx file created above through openpyxl ###


#Load xlsx file, and rename tab "Analysis" to "NewAnalysis"
from openpyxl import load_workbook
importwb = load_workbook("C://dataanalyticsireland/Python Code/youtube/codeused/files/exportxlsx.xlsx")
existing_sheet = importwb['Analysis']
#existing_sheet.title = 'NewAnalysis'

#Create new sheet on existing excel sheet, 1 signifies that it is to be created after the first sheet
#The first sheet been value 0
newsheet = importwb.create_sheet("Newsheet", 1) # insert after the first sheet as we know it already exists.



#Setting the tab colour of the newsheet to yellow
newsheet.sheet_properties.tabColor ='FFFF00'

#Putting text into a particular cell
newsheet['B3'] = "This is some text"

#Tell the code to go to the sheet we want to copy
#sourcesheet=importwb['NewAnalysis']
#Tell the code to copy the data to the newsheetcopy. It creates a new sheet and copies the data in.
#newsheetcopy=importwb.copy_worksheet(sourcesheet)



importwb.save("C://dataanalyticsireland/Python Code/youtube/codeused/files/exportxlsx.xlsx")
importwb.close()