ValueError: cannot convert float NaN to integer

In a data analytics project you will often work with floats and integers side by side. Sooner or later a NaN value turns up in the data, and trying to convert it to an integer raises the error above.

NaN stands for “Not a Number” and is commonly used to represent missing values in data. If you are familiar with SQL, it is a similar concept to NULL.

So how does this error occur in Python?

Let’s look at some logic below:

NaN =float('NaN')
print(type(NaN))
print(NaN)

Result:
<class 'float'>
nan

As can be seen, the variable ‘NaN’ has the data type float.

NaN is a special floating-point value that has no integer equivalent, so Python cannot convert it to an integer. The example below shows exactly this, and why you get the error message we are trying to solve.

NaN =float('NaN')
print(type(NaN))
print(NaN)

a= int(NaN)

print(a)

Result:

Traceback (most recent call last):
  File "ValueError_cannot_convert_float_NaN_to_integer.py", line 5, in <module>
    a= int(NaN)
ValueError: cannot convert float NaN to integer

In the variable ‘a’ we are trying to create an integer from the NaN variable. Because NaN is a floating-point value with no integer equivalent, int() raises the ValueError.

How do we fix this problem?

The easiest way to fix this is to replace the NaN value with a real number before converting it to an integer, as per the below:

NaN =float(1)
print(type(NaN))
print(NaN)

a= int(NaN)

print(a)
print(type(a))

Result:
<class 'float'>
1.0
1
<class 'int'>

So, in summary, if you come across this error:

  1. Check to see if you have any NaN values in your data.
  2. If you do, replace them with an integer value, or another value that suits your project, before converting to an integer. That should solve your problem.
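
The two steps above can be sketched in plain Python. This is only a minimal illustration: the helper name to_int and the default of 0 are assumptions made for the example, so pick a replacement value that makes sense for your own data.

```python
import math

def to_int(value, default=0):
    """Convert a float to an int, substituting `default` when the value is NaN."""
    # math.isnan is the reliable check, because NaN never compares equal to anything
    if math.isnan(value):
        return default
    return int(value)

readings = [1.0, float('nan'), 3.7]
cleaned = [to_int(r) for r in readings]
print(cleaned)  # [1, 0, 3]
```

Checking with math.isnan rather than `value == float('nan')` matters here, since that comparison is always False.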

ValueError: pattern contains no capture groups

In Python, there are a number of recurring ValueErrors that you will come across.

This particular error usually occurs when you use a regular expression as part of a pattern search.

So how does the problem occur?

In the below, the aim of the code is simply to create a data frame that can then be searched.

To search the data frame we will use str.extract.

import pandas as pd
rawdata = [['Joe', 'Jim'],
           ['Jane', 'Jennifer'],
           ['Ann','Alison']]
datavalue = pd.DataFrame(data=rawdata, columns=['A', 'B'])

We then add the below code to extract the string “Joe”.

a = datavalue['A'].str.extract('Joe')
print(a)

But it gives the below error, which is what we are trying to solve:

ValueError: pattern contains no capture groups
Process finished with exit code 1

But why did the error occur, and how can we fix it?

In essence, str.extract returns the text matched by the capture groups in the pattern, so the value you are looking for should be enclosed in parentheses, i.e. ().

In the above, the pattern ‘Joe’ contains no capture group, so str.extract has nothing to return and raises the error.
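
A capture group is the part of a regular expression wrapped in parentheses. As a rough sketch, Python's standard re module can be used to count the groups in a pattern before handing it to str.extract. Using re for this check is an assumption of the example, not something the original code does.

```python
import re

# A compiled pattern reports how many capture groups it contains
print(re.compile('Joe').groups)    # 0
print(re.compile('(Joe)').groups)  # 1

# With a capture group present, the matched text can be pulled out by group number
match = re.search('(Joe)', 'Joe')
print(match.group(1))  # Joe
```

A pattern that reports 0 groups is the kind that triggers the error.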

So to fix this problem, we would change this line to:

a = datavalue['A'].str.extract('(Joe)')

As a result the program runs without error, and returns the below result:

     0
0  Joe
1  NaN
2  NaN

The full corrected code to be used is then:

import pandas as pd
rawdata = [['Joe', 'Jim'],
           ['Jane', 'Jennifer'],
           ['Ann','Alison']]
datavalue = pd.DataFrame(data=rawdata, columns=['A', 'B'])
a = datavalue['A'].str.extract('(Joe)')
print(a)

ValueError: invalid literal for int() with base 10

Have you been working with the data type int, and getting the above error?

As a data analytics practitioner, this problem pops up quite frequently, so it is time to get to the bottom of it!

Here we explain the data type, why the error occurs and examples of how to fix it.

So what are ints and what are their properties?

Ints, or integers, are whole numbers that can be positive, negative, or zero. They cannot hold any decimal places.

The int() function also returns an integer object from a number or a string.

Any float passed to int() is truncated: everything after the decimal point is dropped, and the value is not rounded.

In the below you can see the output is truncated to 1:

data_error = int(1.75)
print(data_error)
print (type(data_error))
output ===> 1
<class 'int'>
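
For positive numbers, dropping the decimal part looks the same as rounding down. The difference only shows up with negative numbers, as this short comparison with math.floor and round suggests:

```python
import math

print(int(1.75))          # 1
print(int(-1.75))         # -1, the decimal part is dropped (moves toward zero)
print(math.floor(-1.75))  # -2, floor always rounds down
print(round(1.75))        # 2, round goes to the nearest integer
```

If you need rounding rather than truncation, call round() or math.floor() explicitly instead of relying on int().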

How can string inputs be managed?

When int() receives a string, it tries to parse the whole string as a whole number, and raises an error if it cannot.

For example:

data_error = int('1')
print(data_error)
print (type(data_error))
Gives output of: 1
<class 'int'>

BUT

data_error = int('1.75')
print(data_error)
print (type(data_error))
Gives output ====> ValueError: invalid literal for int() with base 10: '1.75'

This is exactly the problem we are looking to resolve. When you pass the value without any inverted commas (i.e. 1.75), Python treats it as a float, and int() truncates it to 1.

On the other hand, when you include the inverted commas (i.e. ‘1.75’), the value is a string. int() only accepts strings that represent whole numbers, so the decimal point makes the conversion fail with the error above.

For reference, base 10 is the decimal number system, which uses the ten digits 0-9.

Each digit in a number has a place value that is a power of ten, so for example in 3569:

3 has a place value of 3000

5 has a place value of 500

6 has a place value of 60

and 9 has a place value of 9.

The error message mentions base 10 because int() reads a string as a base 10 whole number by default. A string such as ‘1.75’ is not a valid base 10 whole number, so it is rejected.
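
The place values can be checked with a little arithmetic. The sketch below also shows the optional second argument of int(), which sets the base used to read a string. The hexadecimal and binary strings are illustrative values only.

```python
# Place values of the decimal number 3569
total = 3 * 1000 + 5 * 100 + 6 * 10 + 9 * 1
print(total)  # 3569

# int() reads strings as base 10 unless a different base is given
print(int('3569'))    # 3569
print(int('ff', 16))  # 255
print(int('101', 2))  # 5
```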

How can I fix this problem when I encounter it in a program?

Returning to this problem below, the solutions that could be applied are as follows:

data_error = int('1.75')
print(data_error)
print (type(data_error))
Gives output ====> ValueError: invalid literal for int() with base 10: '1.75'

Fix1: Remove the inverted commas so that a float is passed instead of a string, giving you:

data_error = int(1.75)
print(data_error)
print (type(data_error))
Output ====> 1
<class 'int'>

Fix2: Only pass whole-number values to int().

Validating values before they are converted puts controls in place that reduce the chance of these errors occurring.

Also, if you have to use values inside inverted commas, make sure they contain whole numbers only, with no decimal places.
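
When the value arrives as a string you cannot edit by hand, for example from a file or user input, one possible approach is to convert it to a float first and then to an int. The helper below is a hypothetical sketch: the name parse_int and the choice to return a default for bad input are assumptions for the example.

```python
def parse_int(text, default=None):
    """Parse text such as '42' or '1.75' into an int; return `default` if that fails."""
    try:
        # float() accepts decimal strings; int() then drops the fractional part
        return int(float(text))
    except (ValueError, OverflowError):
        # ValueError: not numeric (or NaN); OverflowError: infinity
        return default

print(parse_int('1.75'))  # 1
print(parse_int('42'))    # 42
print(parse_int('abc'))   # None
```

Going through float() can lose precision for very large whole numbers, so int(text) on its own remains the better choice when the string is known to hold a whole number.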

To wrap up

In summary, the best way to avoid this problem is to pass int() whole-number values, with no decimal places.

If you have to use a value with a decimal place, do not put it inside inverted commas. Python will then read it as a float, drop everything after the decimal point, and return an integer.
