How to Compare Column Headers in CSV to a List in Python

Estimated reading time: 3 minutes

Suppose you run a number of automation projects in Python. For clean, smooth straight-through processing, you need checks that confirm what was received is in the right format.

Most, but not all, files used in an automated process will be in CSV format. It is therefore important that the column headers in these files are correct so you can process the file correctly.

These checks make the process more rigorous and limit errors.

How to compare the headers

The first step would be to load the data into a Pandas data frame:

import pandas as pd

df = pd.read_csv("csv_import.csv") #===> Include the headers
print(df)

The actual original file is as follows:

Next we need to make sure that we have a list that we can compare to:

header_list = ['Name','Address_1','Address_2','Address_3','Address_4','City','Country']

The next step will allow us to save the headers imported in the file to a variable:

import_headers = df.axes[1] #==> 1 is to identify columns
print(import_headers)

Note that the axis chosen was 1, which is what Pandas recognises as the column axis; df.columns returns the same headers.

Finally we will apply a loop as follows:

a = [i for i in import_headers if i not in header_list]
print(a)

This is a list comprehension rather than an explicit loop: “i” takes each value in import_headers in turn, and it is kept in the list “a” only if it is not found in header_list.

The print statement then displays the values not found.

Pulling this all together gives:

import pandas as pd

df = pd.read_csv("csv_import.csv") #===> Include the headers
print(df)

#Expected values to receive in CSV file
header_list = ['Name','Address_1','Address_2','Address_3','Address_4','City','Country']

import_headers = df.axes[1] #==> 1 is to identify columns
print(import_headers)


a = [i for i in import_headers if i not in header_list]
print(a)

Resulting in the following output:

As can be seen, the addresses in the output were found not to be valid, as they were not contained within our check list “header_list”.
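The check above only reports headers that were not expected. As a minimal sketch, the same idea can also report expected headers that are missing. Note that this version swaps Pandas for the standard library csv module so it can read just the first row, and that the sample CSV content and the compare_headers name are our own assumptions for illustration:

```python
import csv
import io

# Hypothetical CSV content standing in for csv_import.csv
sample_csv = io.StringIO(
    "Name,Address_1,Address_2,Street,City,Country\n"
    "Joe,1 Main St,Apt 2,High St,Dublin,Ireland\n"
)

# Expected values to receive in CSV file
header_list = ['Name', 'Address_1', 'Address_2', 'Address_3', 'Address_4', 'City', 'Country']


def compare_headers(file_obj, expected):
    """Return (unexpected, missing) header names, keeping their original order."""
    reader = csv.reader(file_obj)
    found = next(reader, [])  # the first row holds the headers; [] if the file is empty
    unexpected = [h for h in found if h not in expected]
    missing = [h for h in expected if h not in found]
    return unexpected, missing


unexpected, missing = compare_headers(sample_csv, header_list)
print(unexpected)
print(missing)
```

If both lists come back empty, the file contains exactly the expected headers, although this sketch does not check their order.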

TypeError: ‘list’ object is not an iterator

We have covered many TypeErrors on this website; here we will go through why calling next() on a list, which is not an iterator, gives you this error.

In order to understand this error better, we first need to understand what an iterator is in Python.

An iterator is a Python object that has the following characteristics:

  • It keeps track of its position and hands back one value at a time through next().
  • It can itself be iterated over, because calling iter() on it returns the iterator.
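To make the distinction concrete, here is a small sketch (the variable names are our own) that uses the abstract base classes in collections.abc to check whether an object is iterable or is an iterator:

```python
from collections.abc import Iterable, Iterator

letters = ['q', 'w', 'e']
letters_iter = iter(letters)

# A list can be looped over, but it has no __next__ method of its own
list_is_iterable = isinstance(letters, Iterable)
list_is_iterator = isinstance(letters, Iterator)

# The object returned by iter() supports next()
iter_is_iterator = isinstance(letters_iter, Iterator)

print(list_is_iterable, list_is_iterator, iter_is_iterator)
```

This is why next() fails on a plain list but works on the object that iter() returns.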

How does this error occur?

Normally this error occurs when you call next() directly on a list. A list is iterable, but it is not an iterator until you pass it to iter().

There are two things required to make this happen:

(A) The iter() function returns an iterator object.

(B) The next() function moves to the next value.

Without both, the code will fail and the error described here will occur!

In the below code we have a list:

a = ['q', 'w', 'e', 'r', 't', 'y']

with the following:

b = next(a)
b = next(a)
b = next(a)
b = next(a)
b = next(a)
b = next(a)

As can be seen in the above code, we have only one component of the iteration, next(); we expect two, as per the above.

As a result we get the error:

Traceback (most recent call last):
  File "list_object_is_not_an_iterator.py", line 13, in <module>
    b = next(a)
TypeError: 'list' object is not an iterator

In order to fix this, all we need to do is apply iter() to the list as follows:

a = iter(['q', 'w', 'e', 'r', 't', 'y'])  # ====> We added in the iter() here, enclosing the list within it

b = next(a)
b = next(a)
b = next(a)
b = next(a)
b = next(a)
b = next(a)
#b = next(a)


print(b)

Giving output:
y

As a result of this, we now have the two required functions, iter() and next(), so the error no longer occurs.

What is going on within the iterator?

In the above code we have asked to print b. The first call to next() returns the first value in the list, in this case q, and assigns it to b.

But because we assign b on multiple lines, each with “next()” in it, the logic moves through each value of the list until it gets to the end, so b finishes holding the last value.

What can be done, though, is to reduce the number of next() calls before printing b, as follows:

a = iter(['q', 'w', 'e', 'r', 't', 'y'])
b = next(a)
print(b)
returns:
q

BUT
a = iter(['q', 'w', 'e', 'r', 't', 'y'])
b = next(a)
b = next(a)
print(b)
returns:
w

As can be seen, it returns the next value in the list. You can keep adding b = next(a) lines to step further through it.
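Writing next() by hand on every line is rarely needed in practice. As a rough sketch (the variable names are our own), a for loop calls iter() and next() behind the scenes and visits every value in turn:

```python
letters = ['q', 'w', 'e', 'r', 't', 'y']

# Manual version: create the iterator and step through it twice
manual = iter(letters)
first = next(manual)
second = next(manual)

# A for loop does the same stepping for every value in the list
seen = []
for letter in letters:
    seen.append(letter)

print(first, second)
print(seen)
```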

What happens when you get to the end of the list?

So now we have the below, and we are returning the last value:

a = iter(['q', 'w', 'e', 'r', 't', 'y'])
b = next(a)
b = next(a)
b = next(a)
b = next(a)
b = next(a)
b = next(a)
print(b)

Returns:
y

The reason for this is that the number of next() calls equals the length of the list.

If we add in one more b variable:

a = iter(['q', 'w', 'e', 'r', 't', 'y'])
b = next(a)
b = next(a)
b = next(a)
b = next(a)
b = next(a)
b = next(a)
b = next(a)  # ===> Additional b variable

Returns: 
Traceback (most recent call last):
  File "list_object_is_not_an_iterator.py", line 19, in <module>
    b = next(a)
StopIteration

StopIteration signals that the end of the list has been reached; loops rely on it to know when to stop rather than running forever.
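If you do call next() by hand, there are two common ways to avoid an unhandled StopIteration. The sketch below (with illustrative variable names of our own) shows catching the exception, and passing a default value as the second argument to next():

```python
a = iter(['q', 'w'])

# Option 1: catch StopIteration and leave the loop
collected = []
while True:
    try:
        collected.append(next(a))
    except StopIteration:
        break

# Option 2: give next() a default to return once the iterator is exhausted
after_end = next(a, None)

print(collected)
print(after_end)
```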

Implementing Iterators

Iterators could be used in the following circumstances:

(A) You have a defined list of object values to work with.

(B) If sequence is important, an iterator will help to process values in the order they appear in a list.
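You can also write your own iterator. The following is only an illustrative sketch: the Countdown class is a hypothetical example of ours, showing the two methods Python looks for, __iter__() and __next__():

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        # Raise StopIteration once there is nothing left to hand back
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


values = list(Countdown(3))
print(values)
```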

How do I declare a null value in Python?

Before we step into the code, it is important to understand a few things.

Null values appear in almost every programming language, but they can have a different meaning.

A null value basically means it is empty. So when you return a value that is null, it has nothing in it.

Sometimes it can be confused with the value zero, but as zero is an actual integer, it is not an empty value in a field.

Python uses the keyword None to represent null, that is, an empty value.
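As a quick sketch of the difference (the variable names are our own), zero and an empty string are both real values and are not None, even though all three are treated as false in a condition:

```python
zero = 0
empty_text = ""
nothing = None

# Only the last value is actually None
is_none = [value is None for value in (zero, empty_text, nothing)]

# All three are "falsy", which is why they are easily confused
truthiness = [bool(value) for value in (zero, empty_text, nothing)]

print(is_none)
print(truthiness)
```

This is why a check such as "if value is None" is safer than "if not value" when zero or an empty string are valid inputs.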

How can we show null in an output?

Let’s look at the below output. From observation it can be seen that a, b and d return an int value, which is quite straightforward.

Let’s focus on c. When it is printed out, the screen shows nothing, and its data type is str. But why is that? Surely it is None or empty as we were expecting?

Well, an empty string is still a string. Python treats c as a str unless it is explicitly declared as None. The next section will take you through that.

a = 1
b = 1
c = ""
d = a-b
print(a)
print(b)
print(c)
print(d)
print(type(a))
print(type(b))
print(type(c))
print(type(d))

Returns:
1
1

0
<class 'int'>
<class 'int'>
<class 'str'>
<class 'int'>

So based on the above, how do I declare a null value in python?

We have modified the code above, and declared c as None, and in this instance the code now recognises the output as empty.

a = 1
b = 1
c = None
d = a-b
print(a)
print(b)
print(c)
print(d)
print(type(a))
print(type(b))
print(type(c))
print(type(d))

Result:
1
1
None
0
<class 'int'>
<class 'int'>
<class 'NoneType'>
<class 'int'>

What are the other scenarios that None will be returned?

Python also returns None in cases where it has not been explicitly declared.

In the below, print() with no arguments prints an empty line and returns None, so printing its return value shows None.

On the other hand, using an if statement you can check whether a value is None.

The final example is a function that checks whether a value is in a list. For each item that does not match, it prints None.

This is especially handy if you want to be completely sure that the returned values are nothing.

It gives you a level of comfort that the code will not pass anything to other parts of the programme.

a = print()
print(a)

variable = None

if variable is None:
    print("Correct")
else:
    print("Incorrect")

variable1 = "today"
if variable1 is None:
    print("Correct")
else:
    print("Incorrect")


def returnnone():
    numbers = [1,2,3,4,5]  # named numbers so the built-in list is not shadowed
    for i in numbers:
        if i == 6:
            print("Six found")
        else:
            print(None)
            
returnnone()

Result:

None
Correct
Incorrect
None
None
None
None
None


How to compare two lists in Python

Estimated reading time: 2 minutes

Often you are going to be asked to compare lists, and you need a quick way to do it.

Here we are going to take you through three ways to do this; if you have more, comment below.

Looping to find common values between lists

A simple loop can help you find data that is common to two lists:

# compare for similar values
list1 = ["1", "2", "3", "4"]
list2 = ["1", "2", "3", "4", "5"]

for i in list1:
    for j in list2:
        if i == j:  # exact match; "in" between two strings would also match substrings
            print(i)

which yields:

1
2
3
4
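One thing to watch when comparing strings: "in" between two strings is a substring test, whereas "in" against a list is an exact membership test. The sketch below uses slightly different sample data of our own (note the "10") to show how the two can disagree:

```python
list1 = ["1", "2", "3", "4"]
list2 = ["10", "2", "3", "4", "5"]

# Membership in the list: only exact matches count
common = [i for i in list1 if i in list2]

# Substring test against each string: "1" is found inside "10"
substring_hits = [i for i in list1 for j in list2 if i in j]

print(common)
print(substring_hits)
```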

Compare for an item in one list and not in the other

There may be times you wish to find only the values that are in one list and not the other.

Below we use a one line piece of code using list comprehension, which does the same as a loop:

list1 = ["1", "2", "3", "4"]
list2 = ["1", "2", "3", "4", "5"]
for item in [x for x in list2 if x not in list1]:
    print(item)

which gives the result of:

5

Comparing lists using the set method

The third way uses Python sets, which let you work out the overlap and the differences between two lists, like a Venn diagram.

Here we use sets to find the values in list2 that are not in list1, by using subtraction:

list1 = ["1", "2", "3", "4"]
list2 = ["1", "2", "3", "4", "5"]
a = set(list1)
b = set(list2)
c = b-a
print(c)

which gives you:

{'5'}

Alternatively you could find what is common to both:

list1 = ["1", "2", "3", "4"]
list2 = ["1", "2", "3", "4", "5"]
a = set(list1)
b = set(list2)
c = a.intersection(b)
print(c)

and your result will be:

{'1', '3', '4', '2'}

Remember that sets are unordered; if you want the results ordered, apply the following to the above code:

a = set(list1)
b = set(list2)
c = a.intersection(b)
d = sorted(c)
print(type(d))
print(d)

and the output will be:

<class 'list'>
['1', '2', '3', '4']

One thing to note is that the sorted() function above returns a list, not a set.
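Subtraction only looks in one direction. As a small sketch, using sample lists of our own where each list has a value the other lacks, the ^ operator (symmetric difference) finds values that appear in exactly one of the two lists:

```python
list1 = ["1", "2", "3", "4", "6"]
list2 = ["1", "2", "3", "4", "5"]

# Values that are in one list or the other, but not both
only_in_one = sorted(set(list1) ^ set(list2))

print(only_in_one)
```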

There are plenty of resources online where you can learn about sets.

Python Tutorial: How to sort lists

Estimated reading time: 2 minutes

Following on from our post on how to use Python lists, have you ever wondered how to sort lists for your Python project?

Our latest video on lists will go through some of the techniques available so that you can get an idea of how to structure your data and sort.

Getting to understand how to implement

In this latest video we will look at:

  • sort() method
  • sorted() function
  • sorting a list through a function

 

Adding in those extra bits to help make the process smoother

Have you thought about sorting ascending/descending?

  • There is also a discussion on this topic. While the list already has an index, which you may feel means sorting is not merited, there could be other logical reasons to implement sorting.
  • The reverse argument of sorted() controls the direction: reverse=True sorts descending, and if you leave it out, ascending order is the default.
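The techniques listed above can be sketched as follows. The sample list is our own, and sorting by length is just one illustration of sorting through a function:

```python
names = ["pear", "Apple", "fig"]

# sorted() returns a new list and leaves the original untouched
ascending = sorted(names)                  # ascending is the default
descending = sorted(names, reverse=True)   # reverse=True flips the order

# Sorting through a function: key is called on each item before comparing
by_length = sorted(names, key=len)

# sort() changes the list in place and returns None
result = names.sort(key=str.lower)

print(ascending)
print(descending)
print(by_length)
print(names, result)
```

Note that the default ordering places uppercase letters before lowercase ones, which is why a key such as str.lower is often used for case-insensitive sorting.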

On this channel, we have discussed a number of different ways to manage your data. In thinking about sorting a list, why would you want to do this?

Some common reasons are:

  • To visually see if there are duplicates, either on the screen or printed out.
  • If other objects are dependent on the list, say a combo box, then having duplicates visible can help to reduce the size of their contents.
  • Iteration – If you are looking to iterate over a list, a sorted list lets you process the values in a predictable order.

If you want to learn about lists, using them, and how they can be iterated over, why not visit the Data Analytics Ireland YouTube channel? There are lots of videos there that will help explain the concepts discussed here further.

For more links on this topic, see our blog post on the Python sort method; it has some useful links and explanations for you.

YouTube channel lists – Python Lists

Estimated reading time: 2 minutes

Python lists are used extensively in projects; as a result, it is important to understand their structure.

Some of the things they can be used for:

  1. Lookup values for comparisons.
  2. Passing data to them to store to be referenced elsewhere.
  3. As part of a loop, store values that have been found through the loop logic.

What methods are associated with lists?

  1. Append – Add values to the end of the list
  2. Extend – adds values from an iterable object to the end of the list.
  3. Insert – You can insert an item to a certain position in a list.
  4. Remove – Removes the first item in the list that matches the value you asked for.
  5. Pop – Removes the item at a given position and returns it; if no position is specified, it removes and returns the last item.
  6. Clear – Removes all items from the list.
  7. Index – Returns the index of the first item found that matches the value searched for.
  8. Count – Returns the number of times the value searched for was found in the list.
  9. Sort – Sorts the items in the list.
  10. Reverse – This reverses the items in the list.
  11. Copy – This makes a copy of the list.
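The methods above can be tried out in a short sketch like the one below. The starting list and variable names are our own, and the comments show the state of the list after each call:

```python
items = [3, 1, 2]

items.append(4)            # [3, 1, 2, 4]
items.extend([5, 1])       # [3, 1, 2, 4, 5, 1]
items.insert(0, 9)         # [9, 3, 1, 2, 4, 5, 1]
items.remove(1)            # first 1 removed: [9, 3, 2, 4, 5, 1]
last = items.pop()         # returns 1, leaving [9, 3, 2, 4, 5]
first = items.pop(0)       # returns 9, leaving [3, 2, 4, 5]
position = items.index(4)  # index of the first 4
occurrences = items.count(2)
items.sort()               # [2, 3, 4, 5]
items.reverse()            # [5, 4, 3, 2]
duplicate = items.copy()   # a separate list with the same values
items.clear()              # the original is now empty

print(last, first, position, occurrences)
print(duplicate, items)
```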

What are the properties of a list?

The data type has the following attributes, that make it really useful for a vast array of scenarios:

  • They are ordered – The order of the items is a characteristic of the list; furthermore, changing the order makes it a different list.
  • You can use their index to access the value.
  • They are mutable, meaning you can change them in place with any of the above methods.
  • They can contain strings, integers, etc.; accordingly, there is no restriction on what can be in the list.
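These properties can be seen in a few lines. This is only a sketch, with sample values of our own:

```python
mixed = ["text", 42, 3.14, None, [1, 2]]

# Ordered: the same items in a different order make a different list
same_when_reordered = [1, 2, 3] == [3, 2, 1]

# Indexed: positions count from 0, and negative indexes count from the end
first = mixed[0]
last = mixed[-1]

# Mutable: an item can be replaced in place
mixed[1] = "changed"

print(same_when_reordered)
print(first, last)
print(mixed)
```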

Check out the below video playlist from our YouTube channel; the videos will help explain more about lists:

On this website you can also read about how to compare two lists in Python or how to sort lists using rstudio in addition to this blog post.

We hope you enjoy it!

Data Analytics Ireland