IndexError: index 2 is out of bounds for axis 0 with size 2

In Python you may come across IndexError: index 2 is out of bounds for axis 0 with size 2 from time to time.

The error relates to the use of indexes and the no of columns you are referencing.

We discussed single positional indexer is out-of-bounds and in that blog post it was related to referencing indexes for rows that do not exist.

In this blog post the error relates to referencing index for columns that do not exist.

Lets walk through the code below and where the problem arises

So lets refresh ourselves with the data that is in the excel CSV file below:

import pandas as pd

dataset = pd.read_csv('import_file.csv', sep = ',')

df = pd.DataFrame(dataset, columns=['Name','Age'])

a = df.iloc[4][2] #===>this allows you to print a particular row or value in that row

print(df)
print(a)

Output:
Traceback (most recent call last):
  File "C:/Users/haugh/OneDrive/dataanalyticsireland/YOUTUBE/IndexError_index_2_is_out_of_bounds_for_axis_0_with_size_2/index_2_out_of_bounds_for_axis_0_with_size_2.py", line 7, in <module>
    a = df.iloc[4][2] #===>this allows you to print a particular row or value in that row
  File "C:\Users\haugh\anaconda3\lib\site-packages\pandas\core\series.py", line 879, in __getitem__
    return self._values[key]
IndexError: index 2 is out of bounds for axis 0 with size 2

As you can see the column index value can be either 0 or 1 , this is because they represent the index values of the columns starting at 0 and ending at 1.

The problem arises here in this line ===> a = df.iloc[4][2] . Essentially the value is looking to reference a column with index value 2, but as we saw above that index value does not exist.

As a result by replacing it with a column index value within the proper range ( in this case 0 or 1), the error will disappear and the code will work as expected.

import pandas as pd

dataset = pd.read_csv('import_file.csv', sep = ',')

df = pd.DataFrame(dataset, columns=['Name','Age'])

a = df.iloc[4][0] #===>this allows you to print a particular row or value in that row

print(df)
print(a)

Gives:
       Name  Age
0       Joe   21
1      John   22
2       Jim   23
3      Jane   24
4  Jennifer   25
Jennifer


OR

import pandas as pd

dataset = pd.read_csv('import_file.csv', sep = ',')

df = pd.DataFrame(dataset, columns=['Name','Age'])

a = df.iloc[4][1] #===>this allows you to print a particular row or value in that row

print(df)
print(a)

Gives:
      Name  Age
0       Joe   21
1      John   22
2       Jim   23
3      Jane   24
4  Jennifer   25
25

Process finished with exit code -1073741819 (0xC0000005)

Finally, in a = df.iloc[4][1] you can also change the value 4, which is the index for that row to either 0,1,2,3 and the code will work with no errors as it brings back the values that you expect.

So in summary:

(A) This error happens when you try to use a column index value does not exist as it is outside the index values for the columns that are in the data set.

(B) If this error does occur always check the expected index values for each column and compare against what you are trying to return.

Python tutorial: Pandas groupby ( Video 1)

In this first video about pandas groupby and as part of expanding the data analytics information of this website, we are looking to explain how you can use a groupby selection to sort your data into similar datasets better so they can be better analysed. In the video below, we import our data into a dataframe, and then group as follows:

  • Directly naming the column
  • Through get_group
  • Using a loop
  • Utilising a lambda function