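From time to time in Python you may come across IndexError: index 2 is out of bounds for axis 0 with size 2.
The error relates to the use of indexes and the number of columns you are referencing: in the example below it appears when the code asks for a column position (2) that does not exist, because the data only has two columns.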
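The wording of the message comes from NumPy rather than from pandas itself: as the traceback further down shows, pandas ends up indexing its underlying array of values (return self._values[key]). As a minimal sketch, a plain two-element NumPy array can stand in for one row of the data (an assumption made only for illustration), and it produces exactly the same message:

```python
import numpy as np

# Stand-in for one row of the data: two values, one per column (Name, Age)
row_values = np.array(["Jennifer", 25], dtype=object)

try:
    row_values[2]  # only positions 0 and 1 exist on axis 0
except IndexError as err:
    message = str(err)
    print(message)  # index 2 is out of bounds for axis 0 with size 2
```

"axis 0" is the only axis of this one-dimensional array, and "size 2" is the number of values it holds.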
We previously discussed the error single positional indexer is out-of-bounds; in that blog post, it was caused by referencing indexes for rows that do not exist.
In this blog post, the error is caused by referencing the index of a column that does not exist.
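Let’s walk through the code below and see where the problem arises.
First, let us refresh ourselves with the data in the CSV file, which we load into a DataFrame with two columns, Name and Age (the data itself is printed in the output further down):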
import pandas as pd
dataset = pd.read_csv('import_file.csv', sep=',')
df = pd.DataFrame(dataset, columns=['Name', 'Age'])
a = df.iloc[4][2]  #===> selects the row at position 4, then position 2 in that row; only positions 0 and 1 exist, so this raises IndexError
print(df)
print(a)
Output:
Traceback (most recent call last):
  File "C:/Users/haugh/OneDrive/dataanalyticsireland/YOUTUBE/IndexError_index_2_is_out_of_bounds_for_axis_0_with_size_2/index_2_out_of_bounds_for_axis_0_with_size_2.py", line 7, in <module>
    a = df.iloc[4][2] #===>this allows you to print a particular row or value in that row
  File "C:\Users\haugh\anaconda3\lib\site-packages\pandas\core\series.py", line 879, in __getitem__
    return self._values[key]
IndexError: index 2 is out of bounds for axis 0 with size 2
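The DataFrame has two columns, Name and Age, so the only valid column index values are 0 and 1: positions start at 0, so with two columns the last position is 1. df.iloc[4] returns the fifth row as a Series holding one value per column, and that Series is the "axis 0 with size 2" the error message refers to.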
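To see where the size 2 comes from, here is a small sketch. It builds the DataFrame in memory from the values shown in the printed output further down instead of reading import_file.csv; that substitution is an assumption made only so the example runs on its own.

```python
import pandas as pd

# Assumption: same data as the printed output in this post, built in memory
df = pd.DataFrame({
    "Name": ["Joe", "John", "Jim", "Jane", "Jennifer"],
    "Age": [21, 22, 23, 24, 25],
})

row = df.iloc[4]  # the fifth row, returned as a Series
print(row.size)         # 2: one entry per column
print(list(row.index))  # ['Name', 'Age']

# Valid column positions run from 0 to len(df.columns) - 1
for position, name in enumerate(df.columns):
    print(position, name)  # 0 Name, then 1 Age
```

Because the row Series only has two entries, asking it for position 2 goes one step past the end.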
The problem arises in this line: a = df.iloc[4][2]. It tries to reference the column at index position 2, but as we saw above, that position does not exist.
Replacing it with a column index value within the valid range (in this case 0 or 1) makes the error disappear, and the code works as expected.
import pandas as pd
dataset = pd.read_csv('import_file.csv', sep=',')
df = pd.DataFrame(dataset, columns=['Name', 'Age'])
a = df.iloc[4][0]  #===> row at position 4, column at position 0 (Name)
print(df)
print(a)
Gives:
       Name  Age
0       Joe   21
1      John   22
2       Jim   23
3      Jane   24
4  Jennifer   25
Jennifer
OR
import pandas as pd
dataset = pd.read_csv('import_file.csv', sep=',')
df = pd.DataFrame(dataset, columns=['Name', 'Age'])
a = df.iloc[4][1]  #===> row at position 4, column at position 1 (Age)
print(df)
print(a)
Gives:
       Name  Age
0       Joe   21
1      John   22
2       Jim   23
3      Jane   24
4  Jennifer   25
25
Finally, in a = df.iloc[4][1] you can also change the row index value 4 to any of the other valid row positions (0, 1, 2 or 3), and the code will still run without errors and return the value you expect for that row.
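A related point: df.iloc[4][1] is chained indexing, which first selects the row and then indexes into the resulting Series. Passing the row and column positions to a single iloc call, as in df.iloc[4, 1], is the more usual idiom, and recent pandas releases (2.1 and later, as far as we know) emit a FutureWarning when an integer key on a Series labelled Name and Age is treated as a position, as in the chained form. The sketch below again uses in-memory data as an assumed stand-in for the CSV and reads the valid ranges for rows and columns from df.shape.

```python
import pandas as pd

# Assumption: same data as the printed output in this post, built in memory
df = pd.DataFrame({
    "Name": ["Joe", "John", "Jim", "Jane", "Jennifer"],
    "Age": [21, 22, 23, 24, 25],
})

n_rows, n_cols = df.shape  # 5 rows and 2 columns

# Row and column positions in one iloc call, instead of df.iloc[4][0] / df.iloc[4][1]
print(df.iloc[4, 0])  # Jennifer
print(df.iloc[4, 1])  # 25

# Any row position from 0 to n_rows - 1 works the same way
for r in range(n_rows):
    print(r, df.iloc[r, 0], df.iloc[r, 1])
```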
So in summary:
(A) This error happens when you use a column index value that does not exist, because it is outside the range of index positions for the columns in the data set (0 up to the number of columns minus 1).
(B) If this error does occur, always check the valid index values for the columns and compare them against the position you are trying to return.
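Point (B) can be turned into a small check. The sketch below uses a hypothetical helper, get_value, which is not part of pandas: it compares the requested positions against df.shape before calling iloc and raises an error that names the valid range. It uses the same in-memory stand-in for import_file.csv as an assumption.

```python
import pandas as pd


def get_value(df: pd.DataFrame, row: int, col: int):
    """Hypothetical helper: return df.iloc[row, col] after checking both positions."""
    n_rows, n_cols = df.shape
    if not 0 <= row < n_rows:
        raise IndexError(f"row position {row} is invalid; use 0 to {n_rows - 1}")
    if not 0 <= col < n_cols:
        raise IndexError(
            f"column position {col} is invalid; use 0 to {n_cols - 1} "
            f"({', '.join(map(str, df.columns))})"
        )
    return df.iloc[row, col]


# Assumption: same data as the printed output in this post, built in memory
df = pd.DataFrame({
    "Name": ["Joe", "John", "Jim", "Jane", "Jennifer"],
    "Age": [21, 22, 23, 24, 25],
})

print(get_value(df, 4, 1))  # 25
try:
    get_value(df, 4, 2)
except IndexError as err:
    print(err)  # column position 2 is invalid; use 0 to 1 (Name, Age)
```

The helper deliberately rejects negative positions, which iloc itself would accept as counting back from the end; drop that part of the check if you rely on them.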