IndexError: index 2 is out of bounds for axis 0 with size 2

In Python you may come across IndexError: index 2 is out of bounds for axis 0 with size 2 from time to time.

The error relates to the use of indexes and the number of columns you are referencing.

We previously discussed IndexError: single positional indexer is out-of-bounds, and in that blog post the error was related to referencing indexes for rows that do not exist.

In this blog post the error relates to referencing an index for columns that do not exist.

Let's walk through the code below and see where the problem arises.

So let's refresh ourselves with the data that is in the CSV file below:

import pandas as pd

dataset = pd.read_csv('import_file.csv', sep = ',')

df = pd.DataFrame(dataset, columns=['Name','Age'])

a = df.iloc[4][2] #===>this allows you to print a particular row or value in that row

print(df)
print(a)

Output:
Traceback (most recent call last):
  File "C:/Users/haugh/OneDrive/dataanalyticsireland/YOUTUBE/IndexError_index_2_is_out_of_bounds_for_axis_0_with_size_2/index_2_out_of_bounds_for_axis_0_with_size_2.py", line 7, in <module>
    a = df.iloc[4][2] #===>this allows you to print a particular row or value in that row
  File "C:\Users\haugh\anaconda3\lib\site-packages\pandas\core\series.py", line 879, in __getitem__
    return self._values[key]
IndexError: index 2 is out of bounds for axis 0 with size 2

As you can see, the column index value can only be 0 or 1, because the data frame has two columns (Name and Age) and their index values start at 0 and end at 1.

The problem arises here in this line ===> a = df.iloc[4][2]. The first part, df.iloc[4], returns the row at index 4 as a Series of size 2. The second part, [2], then tries to reference a column with index value 2, but as we saw above that index value does not exist.
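To make the "size 2" part of the message concrete, here is a minimal sketch. It assumes the CSV holds the five Name/Age rows shown in the output further down, and builds them inline so that it runs without import_file.csv. It uses df.iloc[4, 2], which passes the row and column positions in one call, as a stand-in for df.iloc[4][2].

```python
import pandas as pd

# Inline copy of the data used in this post (assumption: same content as import_file.csv)
df = pd.DataFrame({"Name": ["Joe", "John", "Jim", "Jane", "Jennifer"],
                   "Age": [21, 22, 23, 24, 25]})

row = df.iloc[4]            # the row at index 4, returned as a pandas Series
print(len(row))             # number of values in the row, one per column
print(list(row.index))      # the column labels that sit at positions 0 and 1

try:
    value = df.iloc[4, 2]   # row 4, column 2: there is no third column
except IndexError as err:
    value = None
    print("IndexError:", err)
```

The row only has two values, so any column position above 1 is out of bounds.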

As a result, by replacing it with a column index value within the proper range (in this case 0 or 1), the error will disappear and the code will work as expected.

import pandas as pd

dataset = pd.read_csv('import_file.csv', sep = ',')

df = pd.DataFrame(dataset, columns=['Name','Age'])

a = df.iloc[4][0] #===>this allows you to print a particular row or value in that row

print(df)
print(a)

Gives:
       Name  Age
0       Joe   21
1      John   22
2       Jim   23
3      Jane   24
4  Jennifer   25
Jennifer


OR

import pandas as pd

dataset = pd.read_csv('import_file.csv', sep = ',')

df = pd.DataFrame(dataset, columns=['Name','Age'])

a = df.iloc[4][1] #===>this allows you to print a particular row or value in that row

print(df)
print(a)

Gives:
      Name  Age
0       Joe   21
1      John   22
2       Jim   23
3      Jane   24
4  Jennifer   25
25

Process finished with exit code -1073741819 (0xC0000005)

Finally, in a = df.iloc[4][1] you can also change the value 4, which is the index for that row, to any of 0, 1, 2 or 3, and the code will run with no errors and bring back the values that you expect.

So in summary:

(A) This error happens when you try to use a column index value that does not exist, as it is outside the index values for the columns that are in the data set.

(B) If this error does occur, always check the expected index values for each column and compare them against what you are trying to return.
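One way to carry out the check in (B) is to ask the data frame for its shape before indexing. The sketch below is only an illustration: it assumes the same five-row Name/Age data, built inline, and the variable names are made up for the example.

```python
import pandas as pd

# Inline copy of the data used in this post (assumption: same content as import_file.csv)
df = pd.DataFrame({"Name": ["Joe", "John", "Jim", "Jane", "Jennifer"],
                   "Age": [21, 22, 23, 24, 25]})

n_rows, n_cols = df.shape                    # (number of rows, number of columns)
column_positions = list(enumerate(df.columns))
print(n_rows, n_cols)
print(column_positions)                      # each column name with its index value

col = 2
if col < n_cols:
    print(df.iloc[4, col])
else:
    print(f"Column index {col} does not exist; valid values are 0 to {n_cols - 1}")

# Referencing the column by name avoids counting positions altogether
age = df.loc[4, "Age"]
print(age)
```

Using the column name with loc is often easier to read than a position, because it keeps working if the column order changes.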

IndexError: single positional indexer is out-of-bounds

Often you will get the error IndexError: single positional indexer is out-of-bounds when you reference a row that does not exist based on its index value.

When you want to look at a particular row in Python, there is a way that you can reference the row and then the values within it.

Let's break it down further to understand how and why the error occurs, and how to fix it.

How does the error occur?

When we look at the below code, it throws out the error we are trying to fix.

Digging deeper, let's look at the file we are importing and the values contained within it. From the CSV file:

Name,Age
Joe,21
John,22
Jim,23
Jane,24
Jennifer,25

the above values are imported. If we were to create a matrix of its index values, it would be as follows:

       0 (Name)  1 (Age)
0      Joe       21
1      John      22
2      Jim       23
3      Jane      24
4      Jennifer  25

As can be seen already, the row index values range from zero to four, and the column index values range from zero to one.

In the below code though we are trying to reference a row index value of five, but that does not exist, hence the error.

Note that using “iloc” allows you to return all the row values or a particular row and column value; we will demonstrate that in the next section.

import pandas as pd

dataset = pd.read_csv('import_file.csv', sep = ',')

df = pd.DataFrame(dataset, columns=['Name','Age'])

a = df.iloc[5] #===>this allows you to print a particular row or value in that row

print(df)

Error:
IndexError: single positional indexer is out-of-bounds

How to fix this error?

First off, let's just return the whole row, say at index value two, based on the matrix of index values above:

This should return Jim and 23 in the output.

import pandas as pd

dataset = pd.read_csv('import_file.csv', sep = ',')

df = pd.DataFrame(dataset, columns=['Name','Age'])

a = df.iloc[2] #===>this allows you to print a particular row or value in that row

print(df)
print(a)

Output:

       Name  Age
0       Joe   21
1      John   22
2       Jim   23
3      Jane   24
4  Jennifer   25
Name    Jim
Age      23
Name: 2, dtype: object

Process finished with exit code -1073741819 (0xC0000005)

We could also return either a name or an age value on its own, as long as the index values are within the valid range.

Let's return just Jennifer’s age of 25 as follows:

import pandas as pd

dataset = pd.read_csv('import_file.csv', sep = ',')

df = pd.DataFrame(dataset, columns=['Name','Age'])

a = df.iloc[4][1] #===>this allows you to print a particular row or value in that row

print(df)
print(a)

Output:
      Name  Age
0       Joe   21
1      John   22
2       Jim   23
3      Jane   24
4  Jennifer   25
25

Process finished with exit code -1073741819 (0xC0000005)

So in summary:

(A) When you are looking to retrieve particular values in a row, you need to make sure you have a valid range of index values.

(B) Using “iloc” is a handy way to retrieve any value you want, but make sure you reference the correct index values.
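As a rough sketch of point (A), the row position can be compared with len(df) before it is passed to iloc. The data frame is built inline here on the assumption that it matches the CSV file used above, and row_position is just an illustrative name.

```python
import pandas as pd

# Inline copy of the data used in this post (assumption: same content as import_file.csv)
df = pd.DataFrame({"Name": ["Joe", "John", "Jim", "Jane", "Jennifer"],
                   "Age": [21, 22, 23, 24, 25]})

row_position = 5
if row_position < len(df):
    print(df.iloc[row_position])
else:
    print(f"Row {row_position} does not exist; valid rows are 0 to {len(df) - 1}")

# iloc also accepts the row and column positions together
age = df.iloc[4, 1]
print(age)

# Without the check, the out-of-range row raises the error discussed above
try:
    df.iloc[5]
    raised = False
except IndexError as err:
    raised = True
    print("IndexError:", err)
```

Whether to check first or to catch the IndexError is a matter of taste; the check gives a friendlier message when the position comes from user input.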

IndexError: list index out of range

Are you working with lists and getting the error IndexError: list index out of range while using Python? There is a very simple explanation for this, and the fix is very easy.

First of all, let's understand what is going on with the list.

Traceback (most recent call last):
  File "C:/Users/haugh/OneDrive/dataanalyticsireland/YOUTUBE/IndexError_list_index_out_of_range/INDEX_ERROR_LIST_INDEX_OUT_OF_RANGE.py", line 4, in <module>
    print(data[4])
IndexError: list index out of range

Lists and their index values

In the below list, we have outputted its values and index values.

data = ['a','b','c','d']
for (i,item) in enumerate(data, start=0): #===> Loops through list and applies index values starting at zero
    print(i,item)

Output:
0 a
1 b
2 c
3 d

Process finished with exit code 0

As can be seen, the program returns the list values and their indexes. Note that the count starts at zero as we have set start=0.

The start argument can be set to any value you like, as can be seen here:

data = ['a','b','c','d']
for (i,item) in enumerate(data, start=1): #===> Loops through list and applies index values starting at one
    print(i,item)

Output:
1 a
2 b
3 c
4 d

Process finished with exit code 0

OR

data = ['a','b','c','d']
for (i,item) in enumerate(data, start=22): #===> Loops through list and applies index values starting at 22
    print(i,item)

Output:
22 a
23 b
24 c
25 d

Process finished with exit code 0

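The purpose of the start value is to tell enumerate where to begin counting; if it is left out, the count starts at zero. Note that start only changes the numbers that enumerate reports. The positions used to access the list itself, such as data[0], always start at zero.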
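A short sketch may help to show that the numbers produced by enumerate are only labels and do not change how the list itself is indexed. The names label and labels are made up for this illustration.

```python
data = ['a', 'b', 'c', 'd']

for label, item in enumerate(data, start=1):
    # label runs from 1 to 4, but the item still sits at position label - 1
    assert data[label - 1] == item
    print(label, item)

labels = [label for label, _ in enumerate(data, start=22)]
print(labels)      # the labels start at 22
print(data[0])     # the list itself still starts at position 0
```

So even with start=22, data[22] would still raise IndexError: list index out of range.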

Lists and the number of index values

In the above examples the list always has four items, so four index values are printed.

This is important because when you are looping through the items, the loop will not go beyond the length of the list.

So in this example the enumerate function pairs each item with a counter value, and then loops through the list until it hits the last item, without error.

data = ['a','b','c','d']
for (i,item) in enumerate(data, start=0): #===> Loops through list and applies index values starting at zero
    print(i,item)

Output:
0 a
1 b
2 c
3 d

Process finished with exit code 0

How to fix the error IndexError: list index out of range

The reason we get the error below is that the line print(data[4]) is looking for the item with index value 4, but we know from observation that it does not exist.

To fix this we would change the value 4 in print(data[4]) to any of 0, 1, 2 or 3, as they are the index values associated with the list.

data = ['a','b','c','d']
for (i,item) in enumerate(data, start=0): #===> Loops through list and applies index values starting at zero
    print(i,item)
print(data[4])

Output:
Traceback (most recent call last):
  File "C:/Users/haugh/OneDrive/dataanalyticsireland/YOUTUBE/IndexError_list_index_out_of_range/INDEX_ERROR_LIST_INDEX_OUT_OF_RANGE.py", line 4, in <module>
    print(data[4])
IndexError: list index out of range
0 a
1 b
2 c
3 d

Applying a valid index value:
data = ['a','b','c','d']
for (i,item) in enumerate(data, start=0): #===> Loops through list and applies index values starting at zero
    print(i,item)
print(data[3])

Yields with no error:
0 a
1 b
2 c
3 d
d

So in summary, when working with lists and their index values it is important to:

(A) Understand the length of your list.

(B) Know where your index values start and finish.

This error is easily fixable; you just need to make sure that your code is referencing values that are in the range of your index values.
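Points (A) and (B) can be sketched in a few lines. The helper get_item below is a hypothetical name used only for this illustration; it returns a default value instead of raising the error.

```python
data = ['a', 'b', 'c', 'd']

last_index = len(data) - 1       # valid index values run from 0 to len(data) - 1
print(last_index)
print(data[last_index])
print(data[-1])                  # a negative index counts back from the end


def get_item(items, index, default=None):
    """Return items[index], or default when the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return default


print(get_item(data, 4, default='no such index'))
```

Catching IndexError like this is useful when the index comes from outside your code; when you control the index, checking it against len(data) is usually enough.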

ValueError: pattern contains no capture groups

In Python, there are a number of recurring value errors that you will come across.

This particular error usually occurs when you are running regular expressions as part of a pattern search.

So how does the problem occur?

The code below simply creates a data frame that can then be searched.

To search the data frame we will use str.extract

import pandas as pd
rawdata = [['Joe', 'Jim'],
           ['Jane', 'Jennifer'],
           ['Ann','Alison']]
datavalue = pd.DataFrame(data=rawdata, columns=['A', 'B'])

We then add the below code to complete the extract of the string “Joe”.

a = datavalue['A'].str.extract('Joe')
print(a)

But it gives the below error, which is what we are trying to solve:

ValueError: pattern contains no capture groups

Process finished with exit code 1

But why did the error occur, and how can we fix it?

In essence, str.extract returns the text matched by the capture groups in the pattern, so the pattern must contain at least one capture group, i.e. the value you are looking for should be enclosed in parentheses ().

In the above, the pattern ‘Joe’ has no capture group, so str.extract treats it as an invalid value and returns the error.

So to fix this problem, we would change this line to:

a = datavalue['A'].str.extract('(Joe)')

As a result the program runs without error, and returns the below result:

     0
0  Joe
1  NaN
2  NaN

The full corrected code to be used is then:

import pandas as pd
rawdata = [['Joe', 'Jim'],
           ['Jane', 'Jennifer'],
           ['Ann','Alison']]
datavalue = pd.DataFrame(data=rawdata, columns=['A', 'B'])

a = datavalue['A'].str.extract('(Joe)')
print(a)
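Beyond the fix above, a few variations on the same call may be useful. The sketch below reuses the data frame from this post; the group name match is an arbitrary choice made for the illustration.

```python
import pandas as pd

rawdata = [['Joe', 'Jim'],
           ['Jane', 'Jennifer'],
           ['Ann', 'Alison']]
datavalue = pd.DataFrame(data=rawdata, columns=['A', 'B'])

# A named capture group gives the output column a name instead of 0
named = datavalue['A'].str.extract(r'(?P<match>Joe)')
print(named)

# With a single group and expand=False the result is a Series, not a data frame
as_series = datavalue['A'].str.extract(r'(Joe)', expand=False)
print(as_series)

# If you only need True/False per row, str.contains needs no capture group
found = datavalue['A'].str.contains('Joe')
print(found.tolist())
```

If the goal is just to test whether each value contains the text, str.contains is the simpler choice; str.extract is for pulling the matched text out.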

String Manipulation in Python

Are you working with strings and need to quickly alter them so they look correct? We are going to take you through the following manipulations so you can quickly upskill on how to better manage them.

Python offers some very easy to use methods, which make the process of getting what you want the data to look like easier.

Find the length of a string

# Find the length of a string
text = "Fetchme"
print("Length is:", len(text))

result is: ===> Length is: 7

How to split a string variable – using one split value

text = "Hello,what is your name."
splittext = text.split(",")  # ==> One split value assigned.
print(splittext)

result is: ===> ['Hello', 'what is your name.']

How to split a string variable – use more than one split value

import re

text = "Hello,what is your name;My name is joe;test"
print(re.split(r'[,.;]', text))  # ==> Notice that the characters you want to split on are between the [] brackets.

result is: ===> ['Hello', 'what is your name', 'My name is joe', 'test']

Find any character in a string

text = "Hello,what is your name."
print("First character is:", text[0])    # index 0 is the first character
print("Sixth character is:", text[5])    # index 5 is the sixth character
print("Seventh character is:", text[6])  # index 6 is the seventh character

result:
First character is: H
Sixth character is: ,
Seventh character is: w

Print a string in an upper or lower case

text = "Joe"
print("Upper case:", text.upper()) #upper case
print("Lower case:", text.lower()) #lower case

result:
Upper case: JOE
Lower case: joe

Concatenation of a string

first = "rainy"
last = "day"
name = first + last
print(name)

the result is: rainyday

Testing a string value returns a Boolean value

testword = "abc123XSWb"
digits = "123"
print(testword.isalnum()) #check if all characters are alphanumeric
print(testword.isalpha()) #check if all characters in the string are alphabetic
print(digits.isdigit()) #test if string contains digits only
print(testword.istitle()) #test if string is title cased
print(testword.isupper()) #test if all the letters in the string are upper case
print(testword.islower()) #test if all the letters in the string are lower case
print(testword.isspace()) #test if string contains only whitespace
print(testword.endswith('b')) #test if string ends with a b
print(testword.startswith('H')) #test if string starts with H

result:
True
False
True
False
False
False
False
True
False

How do I fix TypeError: unhashable type: ‘list’ Error?

When programming in Python you will come across this error quite often; in this case it is quite easily fixed once understood.

The problem arises when a list is used where Python needs a hashable value, for example as a key in a dictionary or as a member of a set. If you are unsure what a dictionary looks like see W3 Schools

Let's examine those loops that don’t throw the error.

Using a list produces the following output with no error:

data = [['a'],['b'],['c'],['d'],['e'],['f']]  # named data rather than list, so the built-in list() is not overwritten

print(type(data))

for i in data:
    print(i)

<class 'list'>
['a']
['b']
['c']
['d']
['e']
['f']

If there is a need for a tuple, then the following outputs with no error:

data = (['a'],['b'],['c'],['d'],['e'],['f'])  # named data rather than list, so the built-in list() is not overwritten

print(type(data))

for i in data:
    print(i)

<class 'tuple'>
['a']
['b']
['c']
['d']
['e']
['f']

Using braces, as you would for a dictionary, gives the error you are looking to resolve, but why?

data = {['a'],['b'],['c'],['d'],['e'],['f']}

print(type(data))

for i in data:
    print(i)

data = {['a'],['b'],['c'],['d'],['e'],['f']}
TypeError: unhashable type: 'list'

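To understand the error it is important to step back and figure out what is going on in each scenario:

(A) Looping through the list, it looks at the values on their own, thus the loop completes with no problem.

(B) Unlike lists, tuples are immutable (cannot be modified); more importantly, they can be looped through with no error.

(C) In the third case the braces contain single values rather than key: value pairs, so Python builds a set, not a dictionary. Every member of a set (and every key of a dictionary) must be hashable, and a list is mutable and therefore unhashable, hence the error. The error is raised on the line that creates the set, before the loop even starts.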
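The cause can be checked directly with the built-in hash() function. Braces that hold single values, with no key: value pairs, create a set, and every member of a set must be hashable. This is a small sketch; rows and unique_rows are made-up names for the illustration.

```python
# A tuple of hashable values is itself hashable
tuple_hashes_match = hash(('a',)) == hash(('a',))
print(tuple_hashes_match)

# A list is mutable, so Python refuses to hash it
try:
    hash(['a'])
    list_is_hashable = True
except TypeError as err:
    list_is_hashable = False
    print("TypeError:", err)

# Converting each inner list to a tuple lets the values live in a set
rows = [['a'], ['b'], ['c'], ['a']]
unique_rows = {tuple(row) for row in rows}
print(sorted(unique_rows))
```

Turning the inner lists into tuples is one option when you really do need a set of grouped values.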

How do we fix this error going forward?

The simplest way is to flatten the list of lists into a list of single items with the code below:

import itertools

fixlist = [['a'],['b'],['c'],['d'],['e'],['f'],['f'],['c']]

# Converts fixlist from a list of lists to a flat list, and removes duplicates with set
fixlist = list(set(itertools.chain.from_iterable(fixlist)))

print(fixlist)
Result (the order can vary, as a set is unordered): ['d', 'f', 'c', 'b', 'a', 'e']

Now your code is only looking to loop through single values within your list, and those values are strings, which are hashable, unlike the inner lists.

Working through the code line by line helped to pinpoint the problem.

Consequently the steps I went through to fix the problem involved:

(A) print(type(variable)) – Use this on the data being passed to see what the data types are; this clarifies whether the data type is the problem.

(B) Once the line of code that was throwing the error was found, replacing the braces around the lists with a list fixed the problem.

Or

If a dictionary is required, it needs proper key: value pairs, with hashable keys such as strings; the lists can then be stored as the values.

Conclusion

In conclusion, in order to remove this error it is important to identify the line or lines that put lists inside braces, and convert them to a list

or

if a dictionary is needed, ensure that the lists are stored as the values of a dictionary with proper key: value pairs.
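For the second option, a minimal sketch of a dictionary with proper key: value pairs could look like this. The key names are hypothetical; any hashable values such as strings or numbers would work as keys, while the lists are stored as the values.

```python
values = [['a'], ['b'], ['c']]
keys = ['first', 'second', 'third']    # hypothetical key names, chosen for illustration

# Pair each key with one of the lists; lists are fine as dictionary values
lookup = dict(zip(keys, values))

for key, value in lookup.items():
    print(key, value)
```

Only the keys need to be hashable, so the inner lists can stay as they are.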

If you would like to see a very good video explanation of this error head over to Brandon Jacobson’s YouTube channel , and make sure to subscribe.

His explanation is below:

python classes

Recently on our Data Analytics Ireland YouTube channel, we have been working hard to enhance our video content and delivery. As part of that process, we also looked at ways to understand classes and use them more efficiently.

In how to create a class in python we provided a video tutorial of the steps involved in Python on how to implement a class within your project.

Here in this blog posting, we will go through the different aspects of classes, and provide a practical example of Object-Oriented Programming, and how it can help you to manage and reduce the code you may have to write.

Before we start, the first question we should ask is, what is a class?

According to the Official Python Website  “Classes provide a means of bundling data and functionality together”.

So in essence, what they are really saying is that they are used to centralise information and functionality around a python object.

Python Objects

So how would we describe an object? An object is anything that can have attributes attached to it and have some functionality that allows the object to function. Most objects will have methods associated with them, and these are the functionality of the object. So let’s step back a second and show this in a piece of code:
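The original code listing is not reproduced in this text, so the following is only a hypothetical reconstruction based on the description in this post: a Car with the attributes type, color, wheels and doors, and three methods. The method names and the example values are assumptions, and the line numbers quoted later in the post refer to the original listing, not to this sketch.

```python
class Car:
    """A hypothetical reconstruction of the Car class discussed in this post."""

    def __init__(self, car_type, color, wheels, doors):
        # Attributes: the data attached to each Car object
        self.type = car_type
        self.color = color
        self.wheels = wheels
        self.doors = doors

    # Methods: functionality that is specific to a car (the names are assumptions)
    def start(self):
        return f"The {self.color} {self.type} has started"

    def accelerate(self):
        return f"The {self.color} {self.type} is accelerating"

    def stop(self):
        return f"The {self.color} {self.type} has stopped"


# Create one Car object and use its attributes and methods
my_car = Car("saloon", "red", 4, 5)
print(my_car.type, my_car.color, my_car.wheels, my_car.doors)
print(my_car.start())
```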

“Car” above is the object, and it has attributes of type, color, wheels, and doors. Other attributes can be added at any point. So you may ask, why is structuring it like this important?

The pure and simple answer is organisation!

The reason behind this way is that as the car and its details are all in one place it encourages:

  • Consistency – everything about a car is documented in one place.
  • No duplication – If the details of the car were repeated in a number of places in your code, every update would have to be made in each place. That makes updates longer and harder, as you have to remember where each copy is in your computer program.
  • Can be called from anywhere – As there is only one version of the object car, we can call it and use its attributes anywhere in our code, which makes the program a lot easier to manage.

Methods and Functions

Now that we have looked at attributes, what about the methods and functions that can be contained within them?

The methods and functions will operate like any other method or function, but the difference when they come to objects is:

They are specific to that object!

[Image: python classes]

In the above code, you will see that there are three methods, and all are specific to the object car.

For example, you would not expect to see any methods that would relate to

  • pumping up the wheels.

         or

  • changing a bulb.

purely because this object car is only concerned with the functionality of the car.

To build on my point above around no duplication, if this object was not created, this last piece of code might have to be maintained and duplicated a number of times within your code. This is where classes come into their own.

So say I want to use the class and its attributes, how would I go about doing that?

In section  9.3.5. Class and Instance Variables on Python classes, it states the following:

Instance variables are for data unique to each instance and class variables are for attributes and methods shared by all instances of the class:
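A minimal sketch of that distinction is shown below. It uses a cut-down Car class that is separate from the one in the original listing; the attribute choices are assumptions made for the illustration.

```python
class Car:
    wheels = 4                       # class variable: shared by all instances of Car

    def __init__(self, color):
        self.color = color           # instance variable: unique to each instance


first = Car("red")
second = Car("blue")
print(first.color, second.color)     # each instance keeps its own color
print(first.wheels, second.wheels)   # both read the shared class variable

Car.wheels = 3                       # changing the class variable is seen by every instance
print(first.wheels, second.wheels)
```

Data that differs from car to car belongs in __init__ as an instance variable, while data that is the same for every car can sit on the class.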

Anywhere in your code, all you need to do is create a variable and assign an instance of the class to it, see below.

Lines 22,23,24, all can now use the methods that are in the class, and their respective functionality.

As can be seen, lines 25,26,27,28 all bring in the attributes of the class to be used, as they were set up under the “def __init__” method.

And here is the output of the above. After initiating the class, we have assigned its attributes into a new variable.

And there you go, we have initialised the class Car and used its attributes and methods outside the class in our regular programming.

Consequently, this could be done anywhere in our program, multiple times, but only having to use one class.

You can see this working below:

python sort method

Why would you sort a list?

Sorting makes other algorithms more efficient, as they can quickly find data in a list that is used as an input to their code; examples include searching and merging data.

Sorting can also be used to standardize the data set so that it has a meaningful representation.

For data visualization purposes, having the data in order allows the viewer to quickly attach meaning to what they see in front of them.

There are different sorting techniques as follows:

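  • Bubble sort repeatedly compares adjacent elements and swaps them when they are out of order, until the N elements are arranged in ascending order.
  • Selection sort is a straightforward process of sorting values. In this method, you repeatedly select the smallest remaining value and move it into place, so the data ends up in ascending order.
  • Merge sort splits a list into two parts of comparable size, sorts them, and then merges them back together.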
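To give a feel for the first technique in the list, here is a small bubble sort sketch. It is written for illustration only; in real code the built-in sorting tools are the better choice, and the function name bubble_sort is simply a descriptive label.

```python
def bubble_sort(values):
    """Return a new list sorted in ascending order using bubble sort."""
    items = list(values)                     # work on a copy, leave the input untouched
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            # Swap neighbouring values that are in the wrong order
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:                      # no swaps means the list is already sorted
            break
    return items


print(bubble_sort([5, 2, 3, 1, 4]))
```

Each pass pushes the largest remaining value to the end of the list, which is why the inner loop gets shorter every time.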

According to the Python Organisation website, Python lists have a built-in list.sort() method that modifies the list in-place.

mylist = [5, 2, 3, 1, 4]
mylist.sort()
print(mylist)
[1, 2, 3, 4, 5]

This method only works for lists.

Python also has a very similar built-in function, sorted(), which, unlike list.sort(), can work on any iterable and returns a new sorted list.

a= {'c':'1','b':'2','a':'3'}

print(sorted(a))
['a', 'b', 'c']

Note that sorted() only sorts the keys of the dictionary above, and returns them as a list.

Per programiz.com parameters for the sorted() function are as follows:

sorted() can take a maximum of three parameters:

  • iterable – A sequence (string, tuple, list) or collection (set, dictionary, frozen set) or any other iterator.
  • reverse (Optional) – If True, the sorted list is reversed (or sorted in descending order). Defaults to False if not provided.
  • key (Optional) – A function that serves as a key for the sort comparison. Defaults to None.
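The reverse and key parameters can be tried out on the dictionary used above. This is a short sketch; the words list and the lambda are illustrative additions.

```python
a = {'c': '1', 'b': '2', 'a': '3'}

keys_ascending = sorted(a)                    # the keys, in ascending order
keys_descending = sorted(a, reverse=True)     # the keys, in descending order
print(keys_ascending)
print(keys_descending)

# key= chooses what to compare: here each (key, value) pair is ordered by its value
by_value = sorted(a.items(), key=lambda pair: pair[1])
print(by_value)

# key=str.lower compares the words without regard to upper or lower case
words = ['banana', 'Apple', 'cherry']
print(sorted(words, key=str.lower))
```

Sorting a.items() rather than the dictionary itself is what makes the values available to the key function.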

Click how to sort lists in python to get a video tutorial on the above, which may help to explain the concepts further.

Tkinter GUI tutorial python – how to clean excel data

Tkinter is a package within Python's standard library that allows users to create GUIs, or graphical user interfaces, to manage data in a more user-friendly way.

We are building our data analytics capability here, and looking to provide the user with the functionality they use in their work or college projects.

We have tested this code on over 100,000 records stored on Microsoft OneDrive, and even so its speeds were quite good.

Over five tests, every run took under 100 seconds from start to finish.

[Images: data cleansing; data cleansing fixed]

In this Tkinter GUI tutorial python, you will be shown how to find the data errors, clean them and then export the final result to excel.

We will take you through the following:

  • Creation of the Tkinter interface.
  • Methods/ functions to find errors.
  • Methods/functions to clean the data.
  • Exporting the clean data to an excel file.

 

To sum up:

The video walks through the creation of a Tkinter window using a canvas and a frame to store the data frame.

Then it looks at importing the data through pd.read_excel, to load the data into a pandas data frame.

Next, there is a function that extracts the errors through str.extract, and these are loaded into separate columns.

Finally, I have exported the clean dataset using rawdata.to_excel , and saved the file as a separate new spreadsheet.
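The code from the video is not included in this post, so the sketch below only illustrates the find-and-clean step in a hedged way. It assumes the "errors" are characters other than letters and spaces, uses a small inline data frame instead of pd.read_excel, and leaves out the Tkinter window and the rawdata.to_excel export. The column names are made up for the example.

```python
import pandas as pd

# Hypothetical sample data standing in for the imported spreadsheet
rawdata = pd.DataFrame({"Name": ["Joe", "Ja#ne", "Jim!", "Ann"]})

# Find errors: capture the first character that is not a letter or a space
rawdata["Error"] = rawdata["Name"].str.extract(r"([^A-Za-z ])", expand=False)

# Clean the data: remove every character that is not a letter or a space
rawdata["Clean"] = rawdata["Name"].str.replace(r"[^A-Za-z ]", "", regex=True)

print(rawdata)
```

Keeping the extracted errors in their own column makes it easy to review what was removed before the clean data is exported.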

planning your machine learning model

Planning your machine learning model is one of the most important steps you will take in order to achieve the best results you are looking for.

In looking at how to plan a machine learning project, this video takes you through 3 steps:

a. Researching

b. Building your model

c. Testing your model
