We have created some video content here using Python with Excel that illustrates the different ways you can leverage Python to
Data cleanse
Find unwanted characters
And see if the file is empty before import!
What this video is giving as output
Here we are looking to introduce charts in Excel, and how how to use Python to easily work with your data and export to an excel sheet.
Below is the final output of our two charts, this is for illustrative purposes, and is taken from the Irish Governments website as at 1st May 2020, importing the cell ranges associated with them.
How we went about it
In the below video on creating a python excel chart, we have approached this as follows:
Created four separate data frames, they are the four regions that will feed into the creation of the graphs.
Separate to this, we merged the four data frames into one, to use with the bar chart.
Validating cells in Excel quickly – how to do it easily!
Are you working with large spreadsheets and looking to quickly at data validation exercise to save you time?
The aim would be to run your code and test it against some predefined rules you or your data analyst would have written to make sure it brings back the expected checks.
If you look at the below, this is the final output of this video, highlighting two cells that are over budget based on the companies predefined budget.
The structure of this code can be broken down into the following steps:
This import will not cost you anything except running some code! The need to productively have an all in one solution to manage your data as your code has become more critical as volumes of data become larger. Do you, as a data analyst, therefore, need to send your data into an excel file? Previously we would have posted a video How do I remove unwanted characters, and here we build on that theme, linking in with Excel.
Load data from a data frame and populate it into an excel file.
Renaming of a sheet.
Creating a new sheet and giving it a name.
We look at properties, namely changing the colour of a tab to yellow.
You may need to put some text in a sheet cell, to act as a header or to show you have a total figure there.
And the final piece of functionality covered in this video is how to copy data from one sheet into another one.
There are several benefits to putting all the upfront work in Python:
The benefits of cleansing the data or format will save time further down the road.
After you receive the document, you can quickly review it without fixing errors in the data.
If you are distributing the output to several people, it quickly gets them what they want, without manual intervention after the logic has completed.
I have certainly benefited from this data cleansing and importing it into excel exercise, as the two are combined now, makes it a more efficient process.
Please remember to subscribe to our channel if you like the work we are doing, thanks!
Data Analytics Ireland
The code used in this video is here:
########## - Previous video - How to check for invlaid characters in your dataset ###########
###########################################################################################################
#######################This video - How to import data into Excel #########################################
#In Python 3, leading zeros(at the start of a sequence of numbers) are not allowed on numbers
#Numbers need to be put between "" to be looped through
import pandas as pd
#Create a dataset locally
data = {'Number': ["00&00$000", '111$11111','2222€2222','3333333*33','444444444£','555/55555','666666@666666'],
'Error' : ['0','0','0','0','0','0','0']}
#Create a dataframe and tell it where to get its values from
df = pd.DataFrame (data, columns = ['Number','Error'])
#Function to loop through the dataset and see if any of the list values below are found and replace them
def run(*l):
#This line extracts all the special characters into a new column
#Using regular expressions it finds values that should not appear
df['Error'] = df['Number'].str.extract('(\D+|^(0)$)')
#This line removes anything with a null value
df.dropna(subset=['Error'], inplace=True)
#This line reads in the list and assigns it a value i, to each element of the list.
#Each i value assigned also assigns an index number to the list value.
#The index value is then used to check whether the value associated with it is in the df['Number'] column
#and then replaces if found
for i in l:
df['Fix']= df['Number'].str.replace(i[0],"").str.replace(i[1],"").str.replace(i[2],"").str.replace(i[3],"") \
.str.replace(i[4],"").str.replace(i[5],"").str.replace(i[6],"")
print("Error list to check against")
print(i[0])
print(i[1])
print(i[2])
print(i[3])
print(i[4])
print(i[5])
print(i[6])
display(df)
#This is the list of errors you want to check for
l = ['$','*','€','&','£','/','@']
run(l)
import xlsxwriter
#### This is the start of the code to import the data to excel
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter(r"C://dataanalyticsireland/Python Code/youtube/codeused/files/exportxlsx.xlsx")
################ XlsxWriter can only create new files. It cannot read or modify existing files.#################
# Position the dataframes in the worksheet, and names the sheet Analysis
df.to_excel(writer, sheet_name='Analysis') # Default position, cell A1.
# Close the Pandas Excel writer and output the Excel file.
writer.save()
writer.close()
### Modify the exportxlsx.xlsx file created above through openpyxl ###
#Load xlsx file, and rename tab "Analysis" to "NewAnalysis"
from openpyxl import load_workbook
importwb = load_workbook("C://dataanalyticsireland/Python Code/youtube/codeused/files/exportxlsx.xlsx")
existing_sheet = importwb['Analysis']
#existing_sheet.title = 'NewAnalysis'
#Create new sheet on existing excel sheet, 1 signifies that it is to be created after the first sheet
#The first sheet been value 0
newsheet = importwb.create_sheet("Newsheet", 1) # insert after the first sheet as we know it already exists.
#Setting the tab colour of the newsheet to yellow
newsheet.sheet_properties.tabColor ='FFFF00'
#Putting text into a particular cell
newsheet['B3'] = "This is some text"
#Tell the code to go to the sheet we want to copy
#sourcesheet=importwb['NewAnalysis']
#Tell the code to copy the data to the newsheetcopy. It creates a new sheet and copies the data in.
#newsheetcopy=importwb.copy_worksheet(sourcesheet)
importwb.save("C://dataanalyticsireland/Python Code/youtube/codeused/files/exportxlsx.xlsx")
importwb.close()
This website uses cookies to improve your experience. We'll assume you're ok with this, but you can opt-out if you wish. Cookie settingsACCEPT
Privacy & Cookies Policy
Privacy Overview
This website uses cookies to improve your experience while you navigate through the website. Out of these cookies, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may have an effect on your browsing experience.
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.