Estimated reading time: 4 minutes
This import will not cost you anything except running some code!
The need to productively have an all in one solution to manage your data as your code has become more critical as volumes of data become larger. Do you, as a data analyst, therefore, need to send your data into an excel file? Previously we would have posted a video How do I remove unwanted characters, and here we build on that theme, linking in with Excel.
Two techniques used here to achieve this are XLSX writer explained, and Openpyxl explained.
The elements we cover off are:
- Load data from a data frame and populate it into an excel file.
- Renaming of a sheet.
- Creating a new sheet and giving it a name.
- We look at properties, namely changing the colour of a tab to yellow.
- You may need to put some text in a sheet cell, to act as a header or to show you have a total figure there.
- And the final piece of functionality covered in this video is how to copy data from one sheet into another one.
There are several benefits to putting all the upfront work in Python:
- The benefits of cleansing the data or format will save time further down the road.
- After you receive the document, you can quickly review it without fixing errors in the data.
- If you are distributing the output to several people, it quickly gets them what they want, without manual intervention after the logic has completed.
I have certainly benefited from this data cleansing and importing it into excel exercise, as the two are combined now, makes it a more efficient process.
Please remember to subscribe to our channel if you like the work we are doing, thanks!
Data Analytics Ireland
The code used in this video is here:
########## - Previous video - How to check for invlaid characters in your dataset ###########
###########################################################################################################
#######################This video - How to import data into Excel #########################################
#In Python 3, leading zeros(at the start of a sequence of numbers) are not allowed on numbers
#Numbers need to be put between "" to be looped through
import pandas as pd
#Create a dataset locally
data = {'Number': ["00&00$000", '111$11111','2222€2222','3333333*33','444444444£','555/55555','666666@666666'],
'Error' : ['0','0','0','0','0','0','0']}
#Create a dataframe and tell it where to get its values from
df = pd.DataFrame (data, columns = ['Number','Error'])
#Function to loop through the dataset and see if any of the list values below are found and replace them
def run(*l):
#This line extracts all the special characters into a new column
#Using regular expressions it finds values that should not appear
df['Error'] = df['Number'].str.extract('(\D+|^(0)$)')
#This line removes anything with a null value
df.dropna(subset=['Error'], inplace=True)
#This line reads in the list and assigns it a value i, to each element of the list.
#Each i value assigned also assigns an index number to the list value.
#The index value is then used to check whether the value associated with it is in the df['Number'] column
#and then replaces if found
for i in l:
df['Fix']= df['Number'].str.replace(i[0],"").str.replace(i[1],"").str.replace(i[2],"").str.replace(i[3],"") \
.str.replace(i[4],"").str.replace(i[5],"").str.replace(i[6],"")
print("Error list to check against")
print(i[0])
print(i[1])
print(i[2])
print(i[3])
print(i[4])
print(i[5])
print(i[6])
display(df)
#This is the list of errors you want to check for
l = ['$','*','€','&','£','/','@']
run(l)
import xlsxwriter
#### This is the start of the code to import the data to excel
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter(r"C://dataanalyticsireland/Python Code/youtube/codeused/files/exportxlsx.xlsx")
################ XlsxWriter can only create new files. It cannot read or modify existing files.#################
# Position the dataframes in the worksheet, and names the sheet Analysis
df.to_excel(writer, sheet_name='Analysis') # Default position, cell A1.
# Close the Pandas Excel writer and output the Excel file.
writer.save()
writer.close()
### Modify the exportxlsx.xlsx file created above through openpyxl ###
#Load xlsx file, and rename tab "Analysis" to "NewAnalysis"
from openpyxl import load_workbook
importwb = load_workbook("C://dataanalyticsireland/Python Code/youtube/codeused/files/exportxlsx.xlsx")
existing_sheet = importwb['Analysis']
#existing_sheet.title = 'NewAnalysis'
#Create new sheet on existing excel sheet, 1 signifies that it is to be created after the first sheet
#The first sheet been value 0
newsheet = importwb.create_sheet("Newsheet", 1) # insert after the first sheet as we know it already exists.
#Setting the tab colour of the newsheet to yellow
newsheet.sheet_properties.tabColor ='FFFF00'
#Putting text into a particular cell
newsheet['B3'] = "This is some text"
#Tell the code to go to the sheet we want to copy
#sourcesheet=importwb['NewAnalysis']
#Tell the code to copy the data to the newsheetcopy. It creates a new sheet and copies the data in.
#newsheetcopy=importwb.copy_worksheet(sourcesheet)
importwb.save("C://dataanalyticsireland/Python Code/youtube/codeused/files/exportxlsx.xlsx")
importwb.close()