ValueError: Columns must be same length as key

Estimated reading time: 3 minutes

Are you looking to learn python , and in the process coming across this error and trying to understand why it occurs?

In essence, this usually occurs when you have more than one data frames and in the process of writing your program you are trying to use the data frames and their data, but there is a mismatch in the no of items in each that the program cannot process until it is fixed.

A common scenario where this may happen is when you are joining data frames or splitting out data, these will be demonstrated below.

Scenario 1 – Joining data frames

Where we have df1[[‘a’]] = df2 we are assigning the values on the left side of the equals sign to what is on the right.

When we look at the right-hand side it has three columns, the left-hand side has one.

As a result the error “ValueError: Columns must be same length as key” will appear, as per the below.

import pandas as pd

list1 = [1,2,3]
list2 = [[4,5,6],[7,8,9]]

df1 = pd.DataFrame(list1,columns=['column1'])
df2 = pd.DataFrame(list2,columns=['column2','column3','column4'])

df1[['a']] = df2

The above code throws the below error:

The objective here is to have all the columns from the right-hand side, beside the columns from the left-hand side as follows:

What we have done is make both sides equal regards the no of columns to be shown from df2
Essentially we are taking the column from DF1, and then bringing in the three columns from DF2.
The columna, columnb, columnc below correspond to the three columns in DF2, and will store the data from them.

The fix for this issue is : df1[[‘columna’,’columnb’,’columnc’]] = df2

print (df1)

Scenario 2 – Splitting out data

There may be an occasion where you have a python list, and you need to split out the values of that list into separate columns.

new_list1 = ['1 2 3']
df1_newlist = pd.DataFrame(new_list1,columns=['column1'])

In the above, we have created a list, with three values that are part of one string. Here what we are looking to do is create a new column with the below code:

df1_newlist[["column1"]] = df1_newlist["column1"].str.split(" ", expand=True) #Splitting based on the space between the values.

print(df1_newlist)

When we run the above it throws the following valueerror:

The reason it throws the error is that the logic has three values to be split out into three columns, but we have only defined one column in df1_newlist[[“column1”]]

To fix this, we run the below code:

df1_newlist[["column1","column2","column3"]] = df1_newlist["column1"].str.split(" ", expand=True) #Splitting based on the space between the values.

print(df1_newlist)

This returns the following output, with the problem fixed!

Create a HTML Table From Python Using Javascript

Estimated reading time: 5 minutes

So you are using Python and you have data stored in a data frame? You would like to present that on a webpage HTML table, but unsure how to achieve this. Well, read on to see two different methods that will help you.

Pass the data to Javascript which then passes to the HTML the data needed to create the table

In both methods, we are using Python Flask which has an app.py file and HTML files created to present the outcomes.

Step 1 – Read the data and create the data frame

For those familiar with Python Flask, we create the imports that allow us to create the webpage interface to present the data.

Then we create a list called data, which stored the information we need. After this, we create the data frame “df”.

Finally, we create a JSON file,and do some data cleansing.

from flask import Flask, render_template, json
import pandas as pd


app = Flask(__name__)

data = [['Joe','Dublin','100'],['Francois','Paris','100'],['Michael','Liverpool','100']]
df = pd.DataFrame(data, columns = ['Name', 'City','Age'])


json_output = df.to_json()
json_output = json.loads(json_output.replace("\'", '"')) # Removes back slash from JSON generated

Step 2 – Create the output HTML files

Method 1 – Here all this is doing is creating the function to create the webpage “index.html”. Note that name_json=json_output captures the data from step one, and this is what is passed over to the HTML page as a JSON.

In method 2 – We are using to_html which Renders a DataFrame as an HTML table, pure and simple.

As can be seen, it stores the data onto an HTML page that is stored in the templates folder.

#Method 1
@app.route('/')
def index():

    return render_template('index.html', name_json=json_output)

#This is  the start of method 2#
html = df.to_html()
#write html to file
im2_file = open("templates/index_method2.html", "w")
im2_file.write(html)
im2_file.close()

@app.route('/index_method2')
def index_method2():
    return render_template('index_method2.html')


if __name__ == "__main__":
    app.run(debug=True)

Step 3 – Create the HTML tables through javascript

So steps 1 and 2 were getting the data ready so it can be viewed on the web pages, so how are they built?

So let’s walk down through the code, note that a good bit of this is the HTML that is used to present the data on the page.

The first thing to notice is the <style></style> tags. this is for method 1 and applies the boxes around the output.

<pre id=”json”></pre> – This line shows the JSON data as follows:

JSON Output

In the below few lines this is the HTML that creates the table to store the data from method 1:

<table id="json_table"> ===> Method 1 table
    <tr>
        <td> Name</td>
        <td> City</td>
        <td> Age</td>
  </tr>
</table>

The next section has the Javascript that will populate both for method 1 and method 2, I will go through it now:

So the first line is creating a variable that references the method 1 table, and where the JSON data will be loaded to.

The second line is converting the JSON into a format that can be read and shown as the “JSON” screenshot above.

In Line 3 & 4, all we are doing here is creating a variable to store the output of the loop that follows in the subsequent lines.

The final set of lines in the script except for the very last line ( which relates to Method 2) catch the data that is captured as part of the loop and feed it to the table that appears on index.html, as follows:

HTML table created from a python data frame using Javascript.

The final line:

window.open(‘http://127.0.0.1:5000/index_method2’, ‘_blank’);

In the Javascript section, this relates to Method 2 and takes the data directly from the app.py file and outputs it to http://127.0.0.1:5000/index_method2.html

This is a very quick and easy way to create the HTML table, and as can be seen, involves less coding than method 1.

<script>
    var selectvalue = document.getElementById("json_table"), test={{ name_json | tojson }}; ===> First line
    document.getElementById("json").textContent = JSON.stringify(test, undefined, 2); ===> Second line

    const keys = Object.keys(test); ===> Line 3
    for (let i = -1; i < keys.length-1; i) ===> Line 4

    {
        const key = keys[i++];

        console.log(key, test[key]);

        a = JSON.stringify(test["Name"][i])
        b = JSON.stringify(test["City"][i])
        c = JSON.stringify(test["Age"][i])
        const tbl = document.getElementById("json_table");
        const row = tbl.insertRow();
        const cell1 = row.insertCell();
        const cell2 = row.insertCell();
        const cell3 = row.insertCell();
        cell1.innerHTML = a;
        cell2.innerHTML = b;
        cell3.innerHTML = c;

    }

    window.open('http://127.0.0.1:5000/index_method2', '_blank');

  </script>

The Full HTML code

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">

  <title>Document</title>
</head>
<style>
table, td {
  border: 1px solid;
}
</style>
<body>
<pre id="json"></pre>
<div>Method 1 pass the data via JSON to Javascript </div>
<table id="json_table">
    <tr>
        <td> Name</td>
        <td> City</td>
        <td> Age</td>
  </tr>
</table>

  <script>
    var selectvalue = document.getElementById("json_table"), test={{ name_json | tojson }};
    document.getElementById("json").textContent = JSON.stringify(test, undefined, 2);

    const keys = Object.keys(test);
    for (let i = -1; i < keys.length-1; i)

    {
        const key = keys[i++];

        console.log(key, test[key]);

        a = JSON.stringify(test["Name"][i])
        b = JSON.stringify(test["City"][i])
        c = JSON.stringify(test["Age"][i])
        const tbl = document.getElementById("json_table");
        const row = tbl.insertRow();
        const cell1 = row.insertCell();
        const cell2 = row.insertCell();
        const cell3 = row.insertCell();
        cell1.innerHTML = a;
        cell2.innerHTML = b;
        cell3.innerHTML = c;

    }

    window.open('http://127.0.0.1:5000/index_method2', '_blank');

  </script>


</body>
</html>

#how to create a html table from a python dataframe using javascript

The Final output

Method 1
Method 2

How To Check For Unwanted Characters Using Loops With Python

Estimated reading time: 3 minutes

On this website, we have posted about how to remove unwanted characters from your data and How to remove characters from an imported CSV file both will show you different approaches.

In this blog posting, we are going to approach the process by using loops within a function. Essentially we are going to pass a list and then we are going to loop through the strings to check the data against it.

Step 1 – Import the data and create a data frame

The first bit of work we need to complete is to load the data. Here we create a dictionary with their respective key-value pairs.

In order to prepare the data to be processed through the function in step 2, we then load it into a data frame.

import pandas as pd
#Create a dataset locally
data = {'Number':  ["00&00$000", '111$11111','2222€2222','3333333*33','444444444£','555/55555','666666@666666'],
        'Error' : ['0','0','0','0','0','0','0']}

#Create a dataframe and tell it where to get its values from
df = pd.DataFrame (data, columns = ['Number','Error'])

Step 2 – Create the function that checks for invalid data

This is the main piece of logic that gives the output. As you can see there is a list “L” that is fed to the function run.

One thing to note is that *l is passed to the function, as there is more than one value in the list, otherwise the program would not execute properly.

To start off we create a data frame, which extracts using a regular expression the characters we don’t want to have.

Next, we then need to drop a column that is generated with NAN values, as these are not required.

Then we updated the original data fame with the values that we found.

Just in case if there are any NAN values in this updated column “Error”, we remove them on the next line.

The main next is the loop that creates a new column called “Fix”. This holds the values that will be populated into the fix after the data we don’t want is removed and is data cleansed.

The data we do not need is in str.replace.

#Function to loop through the dataset and see if any of the list values below are found and replace them
def run(*l):
    #This line extracts all the special characters into a new column
    #Using regular expressions it finds values that should not appear
    df2 = df['Number'].str.extract('(\D+|^(0)$)') # New dataframe to get extraction we need
    print(df2)
    df2 = df2.drop(1, axis=1) # Drops the column with NAN in it, not required

    df['Error'] = df2[0] # Updates original dataframe with values that need to be removed.
    #This line removes anything with a null value
    df.dropna(subset=['Error'], inplace=True)
    #This line reads in the list and assigns it a value i, to each element of the list.
    #Each i value assigned also assigns an index number to the list value.
    #The index value is then used to check whether the value associated with it is in the df['Number'] column 
    #and then replaces if found
    for i in l:
        df['Fix']= df['Number'].str.replace(i[0],"").str.replace(i[1],"").str.replace(i[2],"").str.replace(i[3],"") \
        .str.replace(i[4],"").str.replace(i[5],"").str.replace(i[6],"")
        print("Error list to check against")
        print(i[0])
        print(i[1])
        print(i[2])
        print(i[3])
        print(i[4])
        print(i[5])
        print(i[6])
    print(df)

#This is the list of errors you want to check for
l = ['$','*','€','&','£','/','@']

Step 3 – Run the program

To run this program, we just execute the below code. All this does is read in the list “L” to the function “run” and then the output in step 4 is produced

run(l)

Step 4 – Output

Error list to check against
$
*
€
&
£
/
@
          Number Error           Fix
0      00&00$000     &       0000000
1      111$11111     $      11111111
2      2222€2222     €      22222222
3     3333333*33     *     333333333
4     444444444£     £     444444444
5      555/55555     /      55555555
6  666666@666666     @  666666666666

Process finished with exit code 0

ValueError: pattern contains no capture groups

Estimated reading time: 2 minutes

In Python, there are a number of re-occurring value errors that you will come across.

In this particular error it is usually related to when you are running regular expressions as part of a pattern search.

So how does the problem occur?

In the below, the aim of the code is to purely create a data frame, that can then be searchable.

To search the data frame we will use str.extract

import pandas as pd
rawdata = [['Joe', 'Jim'],
           ['Jane', 'Jennifer'],
           ['Ann','Alison']]
datavalue = pd.DataFrame(data=rawdata, columns=['A', 'B'])

We then add the below code to complete the extract of the string “Joe”.

a = datavalue['A'].str.extract('Joe')
print(a)

But it gives the below error, what we are trying to solve for:

ValueError: pattern contains no capture groups
Process finished with exit code 1

But why did the error occur , and how can we fix it?

In essence when you try to complete a str.extract, the value you are looking for should be enclosed in brackets i.e ()

In the above, it views ‘Joe’ as an incorrect value to be passed into the str.extract function, and returns the error.

So to fix this problem, we would change this line to:

a = datavalue['A'].str.extract('(Joe)')

As a result the program runs without error, and returns the below result:

     0
0  Joe
1  NaN
2  NaN

The full corrected code to be used is then:

import pandas as pd
rawdata = [['Joe', 'Jim'],
           ['Jane', 'Jennifer'],
           ['Ann','Alison']]
datavalue = pd.DataFrame(data=rawdata, columns=['A', 'B'])
a = datavalue['A'].str.extract('(Joe)')
print(a)

hide a column from a data frame

Estimated reading time: 2 minutes

They say there is nowhere to hide, we disagree!
As an addition to How to add a column to a dataframe would you like to learn to go and hide it?! This video has several steps in it; following each one will give you a good introduction.

To start why you would like to hide a column?

  • You may not want to reveal its output as it is sensitive information.
  • The data in the column is not in the correct format, you will want to repurpose it, so it is the way you want it.
  •  The column could be a calculated column. Hence it serves as an intermediary step before your data frame is output.

Finding the best way to hide unwanted data:

In this video, we introduce several concepts to help not show a column:

  • Specify the actual columns you want to include in the data frame, by default doing this you are excluding the column or columns you don’t want to see.
  •  We use drop, to explicitly tell the data frame not to show a particular column.
  •  Also, we display a scenario whereby you have a calculated column but do not want to show its output, based on one of the reasons outlined above.
  • Finally, the index of the column can appear in the output, so we have shown through set_index how to hide it from what is displayed.

This latest in the Python Dataframe series looks to build on the knowledge in the previous examples. We hope as you learn python online, it will increase your programming skills.

Thanks for watching and don’t forget to like and share through our social media buttons to the right of this page.

Data Analytics Ireland

YouTube channel lists – Python DataFrames

Estimated reading time: 1 minute

Welcome to this new blogging website! We are all about data analytics to have a look at this page here About Data Analytics Ireland

To keep it simple we have created some lists here and on our YouTube Channel

As we progress over the next while, the website will be updated as we go along, and while there may be a  lot of video content, we will look to mix it up with different formats.

We have started with Python Data frames :

We hope you enjoy and don’t forget if you like what we are doing subscribe to our channel!

Data Analytics Ireland