How to count the number of rows and columns in a CSV file

So you are working on a number of different data analytics projects, and as part of some of them, you are bringing data in from a CSV file.

One area you may want to look at is How to Compare Column Headers in CSV to a List in Python, which could be coupled with the outputs of this post.

As part of the process, if you are manipulating this data, you need to ensure that all of it was loaded without failure.

With this in mind, we will look to help you with a possible automation task to ensure that:

(A) All rows and columns are totalled when a CSV file is loaded.

(B) If the same dataset is later exported, the totals on the export can be counted and compared with the totals on loading.

(C) This ensures that all the required table rows and columns are always available.
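As a rough illustration of point (B), here is a minimal sketch that exports a DataFrame, reads the export back in and compares the row and column totals. It assumes some small hypothetical sample data and writes to an in-memory buffer (io.StringIO) instead of a real file so it runs on its own; with a real file you would pass a file path to to_csv and read_csv instead.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a loaded CSV file
original = pd.DataFrame(
    {"id": [1, 2, 3], "name": ["a", "b", "c"], "score": [10.5, 20.0, 30.25]}
)

# Export to CSV text; index=False stops pandas adding an extra index column
buffer = io.StringIO()
original.to_csv(buffer, index=False)

# Read the export back in and count rows and columns on both sides
buffer.seek(0)
exported = pd.read_csv(buffer)

loaded_rows, loaded_cols = original.shape
exported_rows, exported_cols = exported.shape

if (loaded_rows, loaded_cols) == (exported_rows, exported_cols):
    print(f"Export OK: {exported_rows} rows, {exported_cols} columns")
else:
    print(
        f"Mismatch: loaded {loaded_rows}x{loaded_cols}, "
        f"exported {exported_rows}x{exported_cols}"
    )
```

Leaving out index=False is a common reason for an export to come back with one column too many, which is exactly the kind of difference this check would catch.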

Python Code that will help you with this

So in the code below, there are a number of things to look at.

Let's look at the CSV file we will read in:

In total there are ten rows with data. The top row is not included in the count because it is treated as the header row. There are also seven columns.

This first bit just reads in the data. By default, read_csv treats the first row as the header (the column names), so it is not counted as a data row.

import pandas as pd

df = pd.read_csv("csv_import.csv") #===> reads in all the rows; the first one becomes the column headers, so it is not counted as data.
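To see why the header row is not counted, here is a small sketch using hypothetical CSV text (one header line plus three data rows) held in memory rather than in csv_import.csv. It compares the default behaviour with header=None, where pandas treats every line as data.

```python
import io

import pandas as pd

# Hypothetical CSV text: one header line followed by three data rows
csv_text = "a,b,c\n1,2,3\n4,5,6\n7,8,9\n"

# Default behaviour: the first line becomes the column names
df_default = pd.read_csv(io.StringIO(csv_text))
print(list(df_default.columns))  # ['a', 'b', 'c']
print(len(df_default))           # 3

# header=None: no line is used as a header, so the header line counts as data
df_no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(len(df_no_header))         # 4
```

So if your file has no header line at all, passing header=None stops the first real row of data from being swallowed as column names.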



Next, it creates two variables that count the number of rows and columns, and prints them out.

Note that it uses df.axes, which returns a list holding the row labels (axes[0]) and the column labels (axes[1]). Taking the length of each gives the counts without looking at the individual cells.

total_rows=len(df.axes[0]) #===> axes[0] holds the row labels
total_cols=len(df.axes[1]) #===> axes[1] holds the column labels
print("Number of Rows: "+str(total_rows))
print("Number of Columns: "+str(total_cols))
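For reference, pandas also offers shorter ways to get the same two numbers. This sketch uses a small hypothetical DataFrame in place of the CSV file and checks that df.shape, len(df.index) and len(df.columns) agree with the df.axes approach.

```python
import pandas as pd

# Hypothetical DataFrame: 4 rows, 2 columns
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": ["a", "b", "c", "d"]})

# df.axes is [row labels, column labels]
total_rows = len(df.axes[0])
total_cols = len(df.axes[1])

# Equivalent, shorter forms
assert (total_rows, total_cols) == df.shape
assert total_rows == len(df.index) == len(df)
assert total_cols == len(df.columns)

print(f"Number of Rows: {total_rows}")     # Number of Rows: 4
print(f"Number of Columns: {total_cols}")  # Number of Columns: 2
```

Which form you use is mostly a matter of taste; df.shape returns both counts in one tuple.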

And bringing it all together

import pandas as pd

df = pd.read_csv("csv_import.csv") #===> reads in all the rows; the first one becomes the column headers, so it is not counted as data.

total_rows=len(df.axes[0]) #===> axes[0] holds the row labels
total_cols=len(df.axes[1]) #===> axes[1] holds the column labels
print("Number of Rows: "+str(total_rows))
print("Number of Columns: "+str(total_cols))

Output:
Number of Rows: 10
Number of Columns: 7

In summary, this would be very useful if you are trying to reduce the amount of manual effort in checking the population of a file.

As a result, it can help with:

(A) Scripts that process data: you can check that they don't remove rows or columns unnecessarily.

(B) Batch runs: if they know the size of a dataset in advance of processing, they can make sure they have the data they need.

(C) Control logs – databases can store this data to show that what was processed is correct.

(D) Where an automated run has to be paused, this can help identify the problem so it can be fixed quickly.

(E) Finally, if you are receiving agreed data from a third party, it can be used to alert them if too much or too little data was received.
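Points (C) and (E) could be sketched as a simple check against agreed counts. The helper check_counts below is hypothetical, not part of pandas; it logs a warning saying whether there are too many or too few rows or columns and returns True only when both counts match. The sample data is also made up for illustration.

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("csv_checks")


def check_counts(df: pd.DataFrame, expected_rows: int, expected_cols: int) -> bool:
    """Hypothetical helper: compare a DataFrame's size with the agreed size."""
    rows, cols = df.shape
    ok = True
    if rows != expected_rows:
        direction = "too many" if rows > expected_rows else "too few"
        logger.warning("%s rows: expected %d, got %d", direction, expected_rows, rows)
        ok = False
    if cols != expected_cols:
        direction = "too many" if cols > expected_cols else "too few"
        logger.warning("%s columns: expected %d, got %d", direction, expected_cols, cols)
        ok = False
    if ok:
        logger.info("Counts match: %d rows, %d columns", rows, cols)
    return ok


# Hypothetical data received from a third party: 3 rows, 2 columns
received = pd.DataFrame({"id": [1, 2, 3], "value": [9, 8, 7]})
print(check_counts(received, expected_rows=3, expected_cols=2))  # True
print(check_counts(received, expected_rows=5, expected_cols=2))  # False
```

In a real batch run, the log messages could be written to a control table instead, and a False result could pause the run.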


TypeError: 'type' object is not subscriptable


I was recently working on our last blog post, How to reverse a string in Python, and I came across this error.

It made me wonder: what does it mean, and how can I fix it?

So what does the error actually mean?

Essentially, it means that you are trying to use square brackets (subscripting, e.g. x[0]) on an object whose type is “type”, that is, on a class such as str itself rather than on a value of that class.
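A quick way to see the difference between a type object and a value of that type is sketched below. The variable name1 mirrors the example later in this post; the exact wording of the caught error message may vary between Python versions, so it is printed rather than predicted.

```python
name1 = "joe"

# name1 is an instance of str; str itself is a type object
print(type(name1))  # <class 'str'>
print(type(str))    # <class 'type'>

# Subscripting the instance works
print(name1[0])     # j

# Subscripting the type object raises TypeError
try:
    str[0]
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)
```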

What are typical examples of types? Well, Python's built-in types include:

  • int()
  • str()
  • tuple()
  • dict()

Calling them as above converts your data to that data type, so the data contained within them can be manipulated further.

In essence, you are using a type in the wrong way and in the wrong place in your code: indexing it as if it held your data.

Doing so throws this error, so avoid subscripting built-in type names such as str; they are built-in functions, not containers of your data.

Let's take an example of how to replicate this error and then fix it.
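Here is a short sketch of calling each of the built-in types listed above to convert a value, with made-up inputs.

```python
# Calling a built-in type converts a value to that type
number = int("42")
text = str(3.5)
items = tuple([1, 2, 3])
mapping = dict([("a", 1), ("b", 2)])

print(number + 1)   # 43
print(text + "!")   # 3.5!
print(items)        # (1, 2, 3)
print(mapping)      # {'a': 1, 'b': 2}
```

Note that these are calls with round brackets; it is square brackets on the type itself that trigger the error.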

name1 = "joe" # These have index values of [0,1,2]
emptylist =[]
strlength = len(name1) # Returns length of three
while strlength > 0:
    emptylist += str[strlength - 1] #This is the last index value of the variable "name1"
    strlength = strlength - 1
print(emptylist)

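In the above code all appears well, but on line 5 the “str” before the [ is the problem. Python reads it as the built-in str type rather than our variable name1, so str[strlength - 1] tries to subscript the type itself and raises the error.

The simple fix is to replace str with name1, the variable we actually want to index: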

name1 = "joe" # These have index values of [0,1,2]
emptylist =[]
strlength = len(name1) # Returns length of three
while strlength > 0:
    emptylist += name1[strlength - 1] #This is the last index value of the variable "name1"
    strlength = strlength - 1
print(emptylist)

which gives you the following error-free output:

Result with no error: ['e', 'o', 'j']
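As a side note, the same reversed result can be produced more idiomatically. This sketch uses the built-in reversed() and slicing with a step of -1; in both cases it is the variable name1 that is used, never the type str.

```python
name1 = "joe"

# Same result as the while loop: a list of characters in reverse order
reversed_chars = list(reversed(name1))
print(reversed_chars)           # ['e', 'o', 'j']

# Slicing with a step of -1 gives the reversed string directly
reversed_text = name1[::-1]
print(reversed_text)            # eoj

# The list can be joined back into a string if needed
print("".join(reversed_chars))  # eoj
```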

In summary: what not to do

So avoid using a type name such as str where you mean one of your own string variables, and keep your code clean from this perspective.

This also applies to other built-in names in Python, such as list, dict and int.
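Built-in names like str are not reserved keywords, which is why Python lets you misuse them. The sketch below checks this with the standard keyword module and shows a related pitfall: a local variable named str (inside the hypothetical function shadowed_str) hides the built-in type, so calling str then fails.

```python
import keyword

# "str" is a built-in name, not a reserved keyword, so Python allows reusing it
print(keyword.iskeyword("str"))    # False
print(keyword.iskeyword("while"))  # True


def shadowed_str():
    str = "joe"  # this local variable hides the built-in str type
    return str(5)  # fails: str is now a string, not the type


try:
    shadowed_str()
    shadow_failed = False
except TypeError as err:
    shadow_failed = True
    print("TypeError:", err)

# Outside the function the built-in str is untouched
print(str(5))  # 5
```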

Have you come across TypeError: 'list' object is not an iterator?

You may also have come across TypeError: 'float' object is not callable.