Are you learning Python, running into the error "ValueError: Columns must be same length as key", and trying to understand why it occurs?
In essence, this usually occurs when you are working with more than one data frame and the number of columns on one side of an assignment does not match the number on the other. pandas cannot process the assignment until the mismatch is fixed.
Two common scenarios where this happens are joining data frames and splitting out data. Both are demonstrated below.
Scenario 1 – Joining data frames
With df1[['a']] = df2, we are assigning the values on the right side of the equals sign to the columns named on the left.
The right-hand side has three columns, while the left-hand side names only one.
As a result, the error "ValueError: Columns must be same length as key" appears when the code below is run.
import pandas as pd
list1 = [1,2,3]
list2 = [[4,5,6],[7,8,9]]
df1 = pd.DataFrame(list1,columns=['column1'])
df2 = pd.DataFrame(list2,columns=['column2','column3','column4'])
df1[['a']] = df2
The above code throws the following error: ValueError: Columns must be same length as key
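To see the mismatch for yourself, here is a minimal sketch that runs the same assignment inside a try/except block and prints the message along with the two column counts. The variable name error_message is only illustrative and is not part of the original example.

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, 3], columns=["column1"])
df2 = pd.DataFrame([[4, 5, 6], [7, 8, 9]], columns=["column2", "column3", "column4"])

key = ["a"]  # one column name on the left-hand side
error_message = None  # illustrative name, not from the original example

try:
    # Three columns on the right cannot be stored under one key on the left.
    df1[key] = df2
except ValueError as exc:
    error_message = str(exc)

print(error_message)
print(len(key), "key column(s) vs", df2.shape[1], "value column(s)")
```

Comparing the two counts like this is usually the quickest way to find which side of the assignment needs to change.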
The objective here is to have all the columns from the right-hand side beside the column from the left-hand side.
To do that, we make both sides equal with regard to the number of columns: the left-hand side must name three columns, one for each column in df2.
Essentially, we keep the column from df1 and then bring in the three columns from df2.
The columna, columnb and columnc below correspond to the three columns in df2, and will store the data from them.
The fix for this issue is: df1[['columna','columnb','columnc']] = df2
print(df1)
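Putting the fix together, a self-contained sketch might look like the following. The list name new_columns and the explicit length check are additions for illustration. Note that df2 has only two rows while df1 has three, so pandas aligns on the index and the last row of the new columns holds missing values.

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, 3], columns=["column1"])
df2 = pd.DataFrame([[4, 5, 6], [7, 8, 9]], columns=["column2", "column3", "column4"])

# Illustrative name: one new column name per column in df2.
new_columns = ["columna", "columnb", "columnc"]

# Guard against the mismatch before assigning.
if len(new_columns) != df2.shape[1]:
    raise ValueError("Name exactly one new column per column in df2")

df1[new_columns] = df2  # rows are matched on the index
print(df1)
```

If the goal is simply to place the two data frames side by side and keep df2's own column names, pd.concat([df1, df2], axis=1) is another option, although that is a different technique from the assignment shown above.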
Scenario 2 – Splitting out data
There may be an occasion when you have a Python list, and you need to split out the values of that list into separate columns.
new_list1 = ['1 2 3']
df1_newlist = pd.DataFrame(new_list1,columns=['column1'])
In the above, we have created a list containing a single string that holds three values. What we are looking to do here is split those values out into columns with the below code:
df1_newlist[["column1"]] = df1_newlist["column1"].str.split(" ", expand=True) #Splitting based on the space between the values.
print(df1_newlist)
When we run the above, it throws the error "ValueError: Columns must be same length as key".
The reason is that the split produces three values that need three columns, but we have only named one column in df1_newlist[["column1"]].
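One way to confirm this is to look at the result of the split on its own before assigning it anywhere. This is a small sketch; the names split_result and target_columns are illustrative only.

```python
import pandas as pd

df1_newlist = pd.DataFrame(["1 2 3"], columns=["column1"])

# With expand=True, the split returns a data frame with one column per piece.
split_result = df1_newlist["column1"].str.split(" ", expand=True)
target_columns = ["column1"]  # what the failing assignment names on the left

print(split_result.shape[1])  # columns produced by the split: 3
print(len(target_columns))    # columns named on the left: 1
```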
To fix this, we run the below code:
df1_newlist[["column1","column2","column3"]] = df1_newlist["column1"].str.split(" ", expand=True) #Splitting based on the space between the values.
print(df1_newlist)
This returns a data frame with the three values split out into column1, column2 and column3, with the problem fixed!
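If you do not know in advance how many pieces the split will produce, one possible approach is to build the column names from the shape of the split result, so both sides always match. This is a sketch under assumptions: the part1, part2, ... naming scheme and the second sample row are made up for illustration.

```python
import pandas as pd

# The second row has fewer pieces, to show what happens with uneven data.
df = pd.DataFrame(["1 2 3", "4 5"], columns=["column1"])

parts = df["column1"].str.split(" ", expand=True)

# Hypothetical naming scheme: one generated name per column in the split result.
part_columns = [f"part{i + 1}" for i in range(parts.shape[1])]

df[part_columns] = parts  # the number of names now always equals the number of columns
print(df)
```

Rows with fewer pieces are padded with missing values, and the split values are still strings, so convert them if you need numbers.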