ValueError: pattern contains no capture groups

Estimated reading time: 2 minutes

In Python, there are a number of recurring ValueError exceptions that you will come across.

This particular error usually appears when you pass a regular expression to the pandas str.extract method as part of a pattern search.

So how does the problem occur?

In the code below, the aim is simply to create a data frame that can then be searched.

To search the data frame we will use str.extract:

import pandas as pd
rawdata = [['Joe', 'Jim'],
           ['Jane', 'Jennifer'],
           ['Ann','Alison']]
datavalue = pd.DataFrame(data=rawdata, columns=['A', 'B'])

We then add the code below to extract the string “Joe” from column A.

a = datavalue['A'].str.extract('Joe')
print(a)

But it gives the error below, which is what we are trying to solve:

ValueError: pattern contains no capture groups

Process finished with exit code 1

But why did the error occur, and how can we fix it?

In essence, str.extract returns the text matched by the capture groups in a regular expression, so the pattern must contain at least one group. A group is created by enclosing the part you are looking for in parentheses, i.e. ().

In the above, the pattern ‘Joe’ contains no capture group, so str.extract has nothing to return and raises the error.
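
One way to see the difference is to count the capture groups in each pattern with Python's built-in re module. This is only a small sketch for illustration; the variable names plain and grouped are our own and are not part of the original example.

```python
import re

# A plain pattern has no capture groups, so there is nothing to extract
plain = re.compile('Joe')
print(plain.groups)    # 0

# Wrapping the text in parentheses creates one capture group
grouped = re.compile('(Joe)')
print(grouped.groups)  # 1
```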

So to fix this problem, we would change this line to:

a = datavalue['A'].str.extract('(Joe)')

As a result, the program runs without error and returns the result below:

     0
0  Joe
1  NaN
2  NaN
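
If the default column label 0 is not descriptive enough, a named capture group is one option. The sketch below is our own illustration rather than part of the original example; the Series called names and the group name match are made up for it.

```python
import pandas as pd

# Hypothetical Series holding the same values as column A
names = pd.Series(['Joe', 'Jane', 'Ann'])

# A named group (?P<match>...) labels the output column 'match' instead of 0
named = names.str.extract(r'(?P<match>Joe)')
print(named)

# With expand=False and a single group, the result is a Series, not a DataFrame
as_series = names.str.extract(r'(Joe)', expand=False)
print(as_series)
```

Rows that do not match still come back as missing values, just as in the output above.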

The full corrected code to be used is then:

import pandas as pd
rawdata = [['Joe', 'Jim'],
           ['Jane', 'Jennifer'],
           ['Ann','Alison']]
datavalue = pd.DataFrame(data=rawdata, columns=['A', 'B'])

a = datavalue['A'].str.extract('(Joe)')
print(a)
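
If the goal is only to find which rows contain “Joe”, rather than to pull the text out, str.contains may be a simpler choice because it does not need a capture group. The following is a hedged sketch of that alternative, reusing the same data frame; the variable name mask is our own.

```python
import pandas as pd

rawdata = [['Joe', 'Jim'],
           ['Jane', 'Jennifer'],
           ['Ann', 'Alison']]
datavalue = pd.DataFrame(data=rawdata, columns=['A', 'B'])

# str.contains returns True/False for each row and needs no capture group
mask = datavalue['A'].str.contains('Joe')
print(mask)

# The boolean mask can then filter the data frame down to the matching rows
print(datavalue[mask])
```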

Hide a column from a data frame

Estimated reading time: 2 minutes

They say there is nowhere to hide. We disagree!
As a follow-up to How to add a column to a dataframe, would you like to learn how to hide one? This video has several steps in it; following each one will give you a good introduction.

To start, why would you want to hide a column?

  • You may not want to reveal its output, as it contains sensitive information.
  • The data in the column is not in the correct format, and you want to repurpose it before it is shown.
  • The column could be a calculated column that serves only as an intermediate step before your data frame is output.

Finding the best way to hide unwanted data:

In this video, we introduce several ways to keep a column out of the output:

  • Specify the actual columns you want to include in the data frame. By doing this you exclude the column or columns you don’t want to see.
  • Use drop to explicitly tell the data frame not to show a particular column.
  • Work through a scenario where you have a calculated column but do not want to show its output, for one of the reasons outlined above.
  • Finally, the data frame's index can appear in the output, so we show how set_index can be used to hide it from what is displayed.
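
The steps above can be sketched roughly as follows. This is not the code from the video; the data frame and its column names (name, salary, bonus, total, high_earner) are hypothetical and made up purely for illustration.

```python
import pandas as pd

# Hypothetical data, invented for this sketch
df = pd.DataFrame({
    'name': ['Joe', 'Jane', 'Ann'],
    'salary': [100, 200, 300],
    'bonus': [10, 20, 30],
})

# 1. Select only the columns you want; 'salary' and 'bonus' are left out
only_names = df[['name']]

# 2. Use drop to remove a column explicitly (this returns a new data frame)
no_salary = df.drop(columns=['salary'])

# 3. Use a calculated column as an intermediate step, then drop it before output
work = df.assign(total=df['salary'] + df['bonus'])
work['high_earner'] = work['total'] > 200
report = work.drop(columns=['total'])

# 4. Replace the default numeric index with an existing column via set_index
by_name = df.set_index('name')

print(only_names)
print(no_salary)
print(report)
print(by_name)
```

Note that drop and set_index return new data frames by default, so the original df is left unchanged in this sketch.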

This latest video in the Python DataFrame series builds on the knowledge from the previous examples. We hope that, as you learn Python online, it will improve your programming skills.

Thanks for watching and don’t forget to like and share through our social media buttons to the right of this page.

Data Analytics Ireland

YouTube channel lists – Python DataFrames

Estimated reading time: 1 minute

Welcome to this new blogging website! We are all about data analytics; to find out more, have a look at our About Data Analytics Ireland page.

To keep it simple, we have created some lists here and on our YouTube channel.

The website will be updated as we progress over the next while. There may be a lot of video content, but we will look to mix it up with different formats.

We have started with Python DataFrames:

We hope you enjoy it, and if you like what we are doing, don’t forget to subscribe to our channel!

Data Analytics Ireland