How to change the headers on a CSV file

Problem statement

You are working away on some data analytics projects and you receive files that have incorrect headings on them. The problem is without opening the file, how can you change the headers on the columns within it?

To start off, lets look at file we want to change , below is a screen shot of the data with its headers contained inside:

So as you can see we have names and addresses. But what it if we want to change the address1 ,address2,address3,address4 to something different?

This could be for a number of reasons:

(A) You are going to use those columns as part of an SQL statement to insert into a database table, so you need to change the headers so that SQL statement won’t fail.

(B) Some other part of your code is using that data, but requires the names to be corrected so that does not fail.

(C) Your organisation has a naming convention that requires all column names to be a particular structure.

(D) All data of a similar type has to be of the same format, so it can be easily identified.

What would be the way to implement this in Python, then?

Below you will see the code I have used for this, looking to keep it simple:

import pandas as pd
#df = pd.read_csv("csv_import.csv",skiprows=1) #==> use to skip first row (header if required)
df = pd.read_csv("csv_import.csv") #===> Include the headers
correct_df = df.copy()
correct_df.rename(columns={'Name': 'Name', 'Address1': 'Address_1','Address2': 'Address_2','Address3': 'Address_3','Address4': 'Address_4'}, inplace=True)
print(correct_df)
#Exporting to CSV file
correct_df.to_csv(r'csv_export', index=False,header=True)

As can be seen there are eight rows in total. The steps are as follows:

  1. Import the CSV file .

2. Make a copy of the dataframe.

3. In the new dataframe, use the rename function to change any of the column headers you require, Address1, Address2, Address3, Address4.

4. Once the updates are completed then re-export the file with the corrected headers to a folder you wish.

As a result of the above steps, the output will appear like this:

And there you go. If you had an automated process, you could incorporate this in to ensure there was no failures on the loading of any data.

Another article that may interest you? How to count the no of rows and columns in a CSV file

How to import data into excel

Estimated reading time: 2 minutes

This import will not cost you anything except running some code!
The need to productively have an all in one solution to manage your data as your code has become more critical as volumes of data become larger. Do you, as a data analyst, therefore, need to send your data into an excel file? Previously we would have posted a video How do I remove unwanted characters, and here we build on that theme, linking in with Excel.

Two techniques used here to achieve this are  XLSX writer explained, and Openpyxl explained.

The elements we cover off are:

  • Load data from a data frame and populate it into an excel file.
  •  Renaming of a sheet.
  • Creating a new sheet and giving it a name.
  • We look at properties, namely changing the colour of a tab to yellow.
  • You may need to put some text in a sheet cell, to act as a header or to show you have a total figure there.
  • And the final piece of functionality covered in this video is how to copy data from one sheet into another one.

There are several benefits to putting all the upfront work in Python:

  1. The benefits of cleansing the data or format will save time further down the road.
  2. After you receive the document, you can quickly review it without fixing errors in the data.
  3. If you are distributing the output to several people, it quickly gets them what they want, without manual intervention after the logic has completed.

I have certainly benefited from this data cleansing and importing it into excel exercise, as the two are combined now, makes it a more efficient process.

Please remember to subscribe to our channel if you like the work we are doing, thanks!

Data Analytics Ireland


Notice: ob_end_flush(): failed to send buffer of zlib output compression (0) in /home/dataana1/public_html/wp-includes/functions.php on line 4979