Import a CSV file with an SQL query

Estimated reading time: 3 minutes

In many SQL programming software tools, there will most likely be an option for you to import your data through a screen they have provided.

In SQL Server Management Studio, an import screen is available to pick the file you want to load, and it quickly guides you through the steps.

But what if you wanted to do this through code, as you may have lots of different files to load at different times?

Below we will take you through the steps.

Check the file you want to import

Below we have created a CSV file, with the relevant headers and data. It is important to have the headers correct as this is what we will use in a step further down.
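The screenshot of the file is not reproduced here, so as an illustration only, a CSV file with the headers used in the steps below would look something like this (the names and values are made up, and the actual file contained more rows):

```
customer_name,age,place_of_birth
John,34,Dublin
Mary,28,Cork
Patrick,45,Galway
```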

Create the code that will import the data

drop table sales.dbo.test_import_csv; ===> drops the existing table, to allow a refreshed copy to be made.

create table sales.dbo.test_import_csv(
customer_name varchar(10),
age int,
place_of_birth varchar(10)); ===> These steps recreate the table with the column names and data types.

insert into sales.dbo.test_import_csv
select * from
OPENROWSET(BULK 'C:\Users\haugh\OneDrive\dataanalyticsireland\YOUTUBE\SQL\how_to_import_a_csv_file_in_sql\csv_file_import.csv',
formatfile = 'C:\Users\haugh\OneDrive\dataanalyticsireland\YOUTUBE\SQL\how_to_import_a_csv_file_in_sql\csv_file_import_set.txt'
) as tmp ===> These lines open the file and read its data into the table just created.

There are two important parts to the above code:

(A) OPENROWSET – This enables the connection to the CSV file, reads it, and then inserts the information into the database table.

(B) Formatfile – This is an important part of the code. Its purpose is to tell OPENROWSET how many columns are in the CSV file and how each one is laid out. Its contents are below:

As can be seen, it outlines each column within the file, its name, and the exact column name in the header.

Also, the last line indicates that this is the last column, and the program should only read as far as this.
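The screenshot of the format file is not reproduced here, but as a hedged sketch, a non-XML format file for this three-column table might look along these lines (the version number, field lengths and terminators below are illustrative assumptions):

```
14.0
3
1   SQLCHAR   0   10   ","      1   customer_name    ""
2   SQLCHAR   0   12   ","      2   age              ""
3   SQLCHAR   0   10   "\r\n"   3   place_of_birth   ""
```

Each data row names a column and its terminator; the "\r\n" terminator on the final row marks the last column, so the program reads no further.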

Running the code and the output

When the code is run, the following is the output:

Eleven rows were inserted; this is the number of rows in the CSV file, excluding the header.

The database table was created:

And finally, the table was populated with the data:

So, in summary, we have demonstrated the ability to easily import a CSV file with the above steps.

In essence you could incorporate this into an automated process.
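To sketch what such automation might look like, the statement could be generated per file and run from Python with a library such as pyodbc. Note that the helper function, file names and format-file path below are illustrative assumptions, not part of the original post:

```python
# Hypothetical sketch: generate the OPENROWSET insert for each CSV file.
# File names and the format-file path are made up for illustration.

def build_import_sql(table: str, csv_path: str, format_path: str) -> str:
    """Return the INSERT ... OPENROWSET statement for one CSV file."""
    return (
        f"INSERT INTO {table} "
        f"SELECT * FROM OPENROWSET("
        f"BULK '{csv_path}', FORMATFILE = '{format_path}') AS tmp;"
    )

csv_files = ["sales_jan.csv", "sales_feb.csv"]  # hypothetical batch of files

for path in csv_files:
    sql = build_import_sql("sales.dbo.test_import_csv", path, "csv_file_import_set.txt")
    print(sql)
    # With pyodbc you would then run: cursor.execute(sql); connection.commit()
```

Looping over a folder of files in this way lets the same table refresh run whenever new CSVs arrive.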

How to change the headers on a CSV file

Problem statement

You are working away on some data analytics projects and you receive files that have incorrect headings on them. The problem is: without opening the file, how can you change the headers on the columns within it?

To start off, let's look at the file we want to change; below is a screenshot of the data with its headers contained inside:

So, as you can see, we have names and addresses. But what if we want to change address1, address2, address3, address4 to something different?

This could be for a number of reasons:

(A) You are going to use those columns as part of an SQL statement to insert into a database table, so you need to change the headers so that the SQL statement won't fail.

(B) Some other part of your code is using that data, but requires the names to be corrected so that it does not fail.

(C) Your organisation has a naming convention that requires all column names to be a particular structure.

(D) All data of a similar type has to be of the same format, so it can be easily identified.

What would be the way to implement this in Python, then?

Below you will see the code I have used for this, looking to keep it simple:

import pandas as pd
#df = pd.read_csv("csv_import.csv", skiprows=1) #==> use to skip the first row (the header, if required)
df = pd.read_csv("csv_import.csv") #===> include the headers
correct_df = df.copy()
# Rename only the address columns; the Name column is already correct
correct_df.rename(columns={'Address1': 'Address_1', 'Address2': 'Address_2',
                           'Address3': 'Address_3', 'Address4': 'Address_4'}, inplace=True)
#Exporting to CSV file
correct_df.to_csv(r'csv_export', index=False, header=True)

As can be seen, there are eight rows in total. The steps are as follows:

1. Import the CSV file.

2. Make a copy of the dataframe.

3. In the new dataframe, use the rename function to change any of the column headers you require: Address1, Address2, Address3, Address4.

4. Once the updates are completed, re-export the file with the corrected headers to a folder of your choice.

As a result of the above steps, the output will appear like this:

And there you go. If you had an automated process, you could incorporate this into it to ensure there were no failures on the loading of any data.

Another article that may interest you: How to count the no of rows and columns in a CSV file

How to remove characters from an imported CSV file

Estimated reading time: 2 minutes

Removal of unwanted errors in your data, the easy way.
The process of importing data can take many formats, but have you been looking for a video on how to do this? Even better, are you looking for a video that shows you how to import a CSV file and then cleanse the data, effectively removing any unwanted characters?

As a follow-up to Python – how do I remove unwanted characters, which focused on cleansing the data created within the code, this video runs through several options to open a CSV file, find the unwanted characters, remove them from the dataset, and then return the cleansed data.

How to get in amongst the problem data:

The approach here looked at three different scenarios:

(A) Using a variable that is assigned an open function, and then reading the variable into a data frame.

(B) Using a "with" statement and an open function together, and returning the output to a data frame.

(C) Using read_csv to quickly and efficiently read in the CSV file and then cleanse the dataset.

Some minor bumps in the road that need some thought

There were some challenges with this that you will see in the video:

  • Options A and B had to deploy additional code just to get the data frame the way we wanted.
  • The additional lines of code made for more complexity if you were trying to manage it.

In the end, read_csv was probably the best way to approach this; it required less code and would have been easier to manage in the longer run.
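As a hedged sketch of that read_csv approach, the snippet below reads a small in-memory sample (the names and the "/" and "£" characters are made up to stand in for the file and its unwanted characters from the video) and strips them in a single pass:

```python
import io
import pandas as pd

# Hypothetical sample standing in for the imported CSV file;
# "/" and "£" play the role of the unwanted characters.
raw = "name,city\nJo/hn,Dub/lin\nMa£ry,Co£rk\n"

# (C) read_csv reads the data in one call...
df = pd.read_csv(io.StringIO(raw))

# ...and one pass over the columns removes the unwanted characters.
cleaned = df.apply(lambda col: col.str.replace(r"[/£]", "", regex=True))
print(cleaned)
```

Compared with options A and B, there is no extra code needed to assemble the data frame before cleansing, which is why read_csv was the easier route to manage.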


As always thanks for watching, please subscribe to our YouTube channel, and follow us on social media!

Data Analytics Ireland