ValueError: Columns must be same length as key

Estimated reading time: 3 minutes

Are you learning Python and, in the process, coming across this error and trying to understand why it occurs?

In essence, this usually occurs when you have more than one data frame and you try to assign the data from one into the other, but there is a mismatch in the number of columns on each side that the program cannot process until it is fixed.

A common scenario where this may happen is when you are joining data frames or splitting out data. Both are demonstrated below.

Scenario 1 – Joining data frames

Where we have df1[['a']] = df2, we are assigning what is on the right side of the equals sign to the columns named on the left.

When we look at the right-hand side, it has three columns, while the left-hand side names only one.

As a result, the error “ValueError: Columns must be same length as key” will appear, as per the below.

import pandas as pd

list1 = [1,2,3]
list2 = [[4,5,6],[7,8,9]]

df1 = pd.DataFrame(list1,columns=['column1'])
df2 = pd.DataFrame(list2,columns=['column2','column3','column4'])

df1[['a']] = df2

The above code throws the error ValueError: Columns must be same length as key.

The objective here is to have all the columns from the right-hand side beside the column from the left-hand side.

What we have done is make both sides equal in the number of columns to be brought in from df2.
Essentially, we are keeping the column from df1 and then bringing in the three columns from df2.
The columna, columnb and columnc below correspond to the three columns in df2 and will store the data from them.

The fix for this issue is: df1[['columna','columnb','columnc']] = df2

print (df1)
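As a minimal sketch of the same fix, the snippet below rebuilds the two data frames from the example and checks that there is one new column name for each column in df2 before assigning. The new_columns list is a name introduced here for illustration only. Because df2 has only two rows, the third row of the new columns should come out as NaN.

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, 3], columns=["column1"])
df2 = pd.DataFrame([[4, 5, 6], [7, 8, 9]],
                   columns=["column2", "column3", "column4"])

# Hypothetical helper list: one new column name per column in df2.
new_columns = ["columna", "columnb", "columnc"]

# The key on the left must be as long as the number of columns on the right.
if len(new_columns) != len(df2.columns):
    raise ValueError("Need one column name for each column in df2")

df1[new_columns] = df2  # rows are matched on the index
print(df1)
```

Checking the lengths first gives a clearer message than waiting for pandas to raise the error.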

Scenario 2 – Splitting out data

There may be an occasion where you have a python list, and you need to split out the values of that list into separate columns.

new_list1 = ['1 2 3']
df1_newlist = pd.DataFrame(new_list1,columns=['column1'])

In the above, we have created a list containing one string that holds three values. What we are looking to do here is split those values out into columns with the below code:

df1_newlist[["column1"]] = df1_newlist["column1"].str.split(" ", expand=True) #Splitting based on the space between the values.

print(df1_newlist)

When we run the above, it throws ValueError: Columns must be same length as key.

The reason it throws the error is that the logic has three values to be split out into three columns, but we have only defined one column in df1_newlist[["column1"]].

To fix this, we run the below code:

df1_newlist[["column1","column2","column3"]] = df1_newlist["column1"].str.split(" ", expand=True) #Splitting based on the space between the values.

print(df1_newlist)

This runs without the error: column1, column2 and column3 now hold the values 1, 2 and 3, with the problem fixed!
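If you do not know in advance how many pieces the split will produce, one option is to build the column names from the result itself. This is only a sketch: the value1, value2, ... names and the second sample row are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(["1 2 3", "4 5 6"], columns=["column1"])

# expand=True returns one column per piece of the split string.
parts = df["column1"].str.split(" ", expand=True)

# Build exactly as many names as there are pieces (names are illustrative).
names = [f"value{i + 1}" for i in range(parts.shape[1])]

df[names] = parts
print(df)
```

Because the list of names is derived from parts.shape[1], both sides of the assignment always have the same number of columns.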

Create a HTML Table From Python Using Javascript

Estimated reading time: 5 minutes

So you are using Python and you have data stored in a data frame? You would like to present that in an HTML table on a webpage, but are unsure how to achieve this? Well, read on to see two different methods that will help you.

Pass the data to Javascript which then passes to the HTML the data needed to create the table

In both methods, we are using Python Flask which has an app.py file and HTML files created to present the outcomes.

Step 1 – Read the data and create the data frame

For those familiar with Python Flask, we create the imports that allow us to create the webpage interface to present the data.

Then we create a list called data, which stores the information we need. After this, we create the data frame “df”.

Finally, we convert the data frame to JSON and do some data cleansing.

from flask import Flask, render_template, json
import pandas as pd


app = Flask(__name__)

data = [['Joe','Dublin','100'],['Francois','Paris','100'],['Michael','Liverpool','100']]
df = pd.DataFrame(data, columns = ['Name', 'City','Age'])


json_output = df.to_json()
json_output = json.loads(json_output.replace("'", '"')) # Swaps any single quotes for double quotes, then parses the JSON string
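To see the shape of the JSON that is handed to the template, here is a small sketch that can be run outside Flask. It uses the standard json module rather than flask.json, and the by_column and by_row names are my own. The orient="records" call is shown only as an alternative layout, not as what the app above uses.

```python
import json
import pandas as pd

data = [['Joe', 'Dublin', '100'], ['Francois', 'Paris', '100'], ['Michael', 'Liverpool', '100']]
df = pd.DataFrame(data, columns=['Name', 'City', 'Age'])

# Default orientation: one entry per column, keyed by the row label as a string.
by_column = json.loads(df.to_json())
print(by_column["Name"]["0"])

# orient="records": a list with one dictionary per row.
by_row = json.loads(df.to_json(orient="records"))
print(by_row[0])
```

The default layout is why the Javascript later reads values as test["Name"][i], column first and row second.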

Step 2 – Create the output HTML files

Method 1 – All this is doing is creating the function that renders the webpage “index.html”. Note that name_json=json_output captures the data from step one, and this is what is passed over to the HTML page as JSON.

Method 2 – We are using to_html, which renders a DataFrame as an HTML table, pure and simple.

As can be seen, it writes the table to an HTML page that is stored in the templates folder.

#Method 1
@app.route('/')
def index():

    return render_template('index.html', name_json=json_output)

#This is the start of method 2#
html = df.to_html()
#write html to file; the with block closes the file automatically
with open("templates/index_method2.html", "w") as im2_file:
    im2_file.write(html)

@app.route('/index_method2')
def index_method2():
    return render_template('index_method2.html')


if __name__ == "__main__":
    app.run(debug=True)
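To inspect what method 2 writes into the templates folder before wiring it into Flask, the table can be generated on its own. This is a sketch only; passing index=False is my choice to leave out the 0, 1, 2 row labels and is not used in the app above.

```python
import pandas as pd

data = [['Joe', 'Dublin', '100'], ['Francois', 'Paris', '100'], ['Michael', 'Liverpool', '100']]
df = pd.DataFrame(data, columns=['Name', 'City', 'Age'])

# to_html returns the whole <table> element as a string.
html = df.to_html(index=False)
print(html)
```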

Step 3 – Create the HTML tables through javascript

Steps 1 and 2 were about getting the data ready so it can be viewed on the web pages. So how are the pages built?

Let’s walk through the code. Note that a good bit of this is the HTML that is used to present the data on the page.

The first thing to notice is the <style></style> tags. These are for method 1 and apply the borders around the output.

<pre id="json"></pre> – This line shows the JSON data as follows:

JSON Output

The few lines below are the HTML that creates the table to hold the data from method 1:

<table id="json_table"> <!-- Method 1 table -->
    <tr>
        <td> Name</td>
        <td> City</td>
        <td> Age</td>
  </tr>
</table>

The next section has the Javascript that populates the table for method 1 and opens the page for method 2. I will go through it now:

The first line creates a variable that references the method 1 table, where the JSON data will be loaded, and a second variable, test, that holds the JSON passed in from app.py.

The second line converts the JSON into a format that can be read and shown as the “JSON” screenshot above.

Lines 3 and 4 collect the keys of the JSON object and set up the loop that follows in the subsequent lines.

The final set of lines in the script, except for the very last line (which relates to Method 2), take the data read in each pass of the loop and feed it to the table that appears on index.html, as follows:

HTML table created from a python data frame using Javascript.

The final line:

window.open('http://127.0.0.1:5000/index_method2', '_blank');

In the Javascript section, this relates to Method 2: it opens a new tab showing the table written by the app.py file, at http://127.0.0.1:5000/index_method2

This is a very quick and easy way to create the HTML table, and as can be seen, involves less coding than method 1.

<script>
    var selectvalue = document.getElementById("json_table"), test={{ name_json | tojson }}; // First line
    document.getElementById("json").textContent = JSON.stringify(test, undefined, 2); // Second line

    const keys = Object.keys(test["Name"]); // Line 3: the row labels, e.g. "0", "1", "2"
    for (const key of keys) // Line 4: loops once per row of the data frame

    {
        console.log(key, test["Name"][key]);

        const a = JSON.stringify(test["Name"][key]);
        const b = JSON.stringify(test["City"][key]);
        const c = JSON.stringify(test["Age"][key]);
        const tbl = document.getElementById("json_table");
        const row = tbl.insertRow();
        const cell1 = row.insertCell();
        const cell2 = row.insertCell();
        const cell3 = row.insertCell();
        cell1.innerHTML = a;
        cell2.innerHTML = b;
        cell3.innerHTML = c;

    }

    window.open('http://127.0.0.1:5000/index_method2', '_blank');

  </script>
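The idea behind the loop, building one table row per record from column-oriented JSON, can also be sketched in Python. This is not part of the app; the JSON text below is typed in by hand to match the shape produced in step 1, and the rows variable is my own name.

```python
import json

# Same shape as the JSON produced by df.to_json() in step 1.
test = json.loads(
    '{"Name": {"0": "Joe", "1": "Francois", "2": "Michael"},'
    ' "City": {"0": "Dublin", "1": "Paris", "2": "Liverpool"},'
    ' "Age": {"0": "100", "1": "100", "2": "100"}}'
)

rows = []
for key in test["Name"]:  # one pass per row label: "0", "1", "2"
    rows.append([test["Name"][key], test["City"][key], test["Age"][key]])

for row in rows:
    print(row)
```

Looping over the row labels of one column, rather than over the column names, keeps the number of table rows tied to the number of records.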

The Full HTML code

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">

  <title>Document</title>
<style>
table, td {
  border: 1px solid;
}
</style>
</head>
<body>
<pre id="json"></pre>
<div>Method 1 pass the data via JSON to Javascript </div>
<table id="json_table">
    <tr>
        <td> Name</td>
        <td> City</td>
        <td> Age</td>
  </tr>
</table>

  <script>
    var selectvalue = document.getElementById("json_table"), test={{ name_json | tojson }};
    document.getElementById("json").textContent = JSON.stringify(test, undefined, 2);

    const keys = Object.keys(test["Name"]); // the row labels, e.g. "0", "1", "2"
    for (const key of keys) // loops once per row of the data frame

    {
        console.log(key, test["Name"][key]);

        const a = JSON.stringify(test["Name"][key]);
        const b = JSON.stringify(test["City"][key]);
        const c = JSON.stringify(test["Age"][key]);
        const tbl = document.getElementById("json_table");
        const row = tbl.insertRow();
        const cell1 = row.insertCell();
        const cell2 = row.insertCell();
        const cell3 = row.insertCell();
        cell1.innerHTML = a;
        cell2.innerHTML = b;
        cell3.innerHTML = c;

    }

    window.open('http://127.0.0.1:5000/index_method2', '_blank');

  </script>


</body>
</html>

The Final output

Method 1
Method 2

TypeError: Array() Argument 1 Must Be A Unicode Character, Not List

In a recent post, we discussed arrays, what they mean and how they differ from lists. When you create an array from a list, one of the things that you need to define is what data type it holds. If you don’t, then the array will throw the error that you are on this page to resolve.

On the Python.org website, the documentation for the array module lists the type codes that indicate the data type of the items in an array: b, B, u, h, H, i, I, l, L, q, Q, f and d.

If you pass in a value that is not one of these type codes, you will get the below error:

ValueError: bad typecode (must be b, B, u, h, H, i, I, l, L, q, Q, f or d)

Let’s look at an example

import array as test_array

a = test_array.array([1,2,3])

print(a)

Gives an error of:
TypeError: array() argument 1 must be a unicode character, not list

As can be seen, the above logic tries to create an array from a list without a type code. The problem here is that an array can only be of one data type, and it has to be specified when the array is created.

How do we fix this?

It is quite simple! Before the list, we simply specify which type code we would like to apply. In this instance we are going to assign it as “signed int”, that being the value “i”, as follows:

import array as test_array

a = test_array.array("i",[1,2,3])

print(a)

Prints the following with no error:
array('i', [1, 2, 3])


You can change the value “i” to another of the type codes, as long as it suits the data in the list; I just picked that one to show as an example.
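The short sketch below puts the two error cases side by side with two working arrays. The variable names are my own, and "z" is simply a character chosen because it is not a valid type code.

```python
from array import array

ints = array("i", [1, 2, 3])      # signed int
floats = array("d", [1.5, 2.5])   # double-precision float
print(ints, floats)

# Leaving the type code out raises the TypeError from this post.
try:
    array([1, 2, 3])
except TypeError as err:
    missing_code_error = str(err)

# A character that is not a valid type code raises a ValueError instead.
try:
    array("z", [1, 2, 3])
except ValueError as err:
    bad_code_error = str(err)

print(missing_code_error)
print(bad_code_error)
```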

How to Pass Python Variables to Javascript

Estimated reading time: 4 minutes

In our recent blog posting How to Pass a Javascript Variable to Python using JSON, we demonstrated how to easily use AJAX to pass whatever data you wanted and then manipulate it with Python.

In this blog posting, we are going to show how to do this the other way around. The scenario is that you have an application and/or website that wants to use data generated through Python, and then let Javascript use it within the application.

As Python can be connected to numerous databases and files (txt, Excel, etc.), this piece of logic is very useful for the programmer looking to integrate both programming languages.

Let’s start looking at the code, and see how this can be achieved.

Step 1 – What Files are generated?

This program uses Python Flask to create a web page that has a drop-down menu. The two files used to generate this are as follows:

(A) app.py – This is the Python file that creates a website and loads a template HTML file as outlined below.

(B) index.html – This is the template file that loads into the browser and runs all the Javascript. The Javascript loaded here also loads the Python data passed over from app.py.

Step 2 – APP.PY code overview

The Python library that enables webpage creation is called Flask, and as can be seen below it has to be imported.

In addition, we also need to import render_template, which tells the program to go to the templates folder and load “index.html”.

The variable that is being passed to Javascript is called name, and these are the values that you will see in the web browser when it is loaded.

from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
def index():
    name = ['Joe','John','Jim','Paul','Niall','Tom']
    return render_template('index.html', name=name)

if __name__ == "__main__":
    app.run(debug=True)

Step 3 – Index.HTML overview

Here is the template HTML file that runs in the browser. You can add CSS etc to this to make it look nicer and more user friendly.

As you can see, it has the usual HTML tags that appear as part of a website.

We’ll look at some of the code further:

The <select id ='selectvalue'> </select> element is the drop-down menu that will appear when index.html is opened. It will store all the values passed from Python. Note that its id is “selectvalue”; this will be used later on.

The main part to focus on next is between <script></script>. This is what reads in the Python data and populates it to the drop-down menu.

In step 2 we mentioned that there was a variable called “name”, with values to be passed over.

This is achieved on this line:

var selectvalue = document.getElementById("selectvalue"), test = {{ name | tojson }};

Notice that name appears here, and this is referencing back to the exact same value that was discussed in step 2.
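Roughly speaking, the tojson filter turns the Python value into JSON text that Javascript can read as a literal. The sketch below uses the standard json module instead of Flask to show what that text looks like; the real filter also escapes characters that are unsafe inside HTML, which this sketch does not do. The js_literal name is my own.

```python
import json

name = ['Joe', 'John', 'Jim', 'Paul', 'Niall', 'Tom']

# json.dumps turns the Python list into text that is also a valid Javascript array.
js_literal = json.dumps(name)
print("var test = " + js_literal + ";")
```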

For the rest of the lines, I have explained with comments what each does.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Pass Python variable to Javascript</title>
</head>
<body>

<select id ='selectvalue'>
</select>
<script>
    //name = ['Joe','John','Jim','Paul','Niall','Tom']

    var selectvalue = document.getElementById("selectvalue"), test = {{ name | tojson }};
    // The increment operator (++) adds one to its operand each time the loop runs.
    for (var i = 0; i < test.length; i++) // Loops once for each item in the data you are feeding in
    {
        var selection = document.createElement("OPTION"), // Creates an option element for each value fed in from the JSON object "test"
            txt = document.createTextNode(test[i]); // Reads each value from the test JSON variable above
        selection.appendChild(txt); // Appends each value as it is read
        selection.setAttribute("value", test[i]); // Sets each value read in as a value for the drop-down
        selectvalue.insertBefore(selection, selectvalue.lastChild); // Reads each value into the drop-down based on the order in "test" above
    }
</script>
</body>
</html>

Step 4 – What the output looks like!

From step 2, these are values we asked to be used in Javascript to populate a dropdown:

name = ['Joe','John','Jim','Paul','Niall','Tom']

Python variable passed to Javascript

How To Compare CSV Files for Differences

Estimated reading time: 5 minutes

A lot of data analytics projects involve comparisons. Sometimes this is to look at data quality problems; other times it is to check that the data you are loading has been saved to the database with no errors.

As a result of this, there is a need to quickly check all your data is correct. But rather than do visual comparisons, wouldn’t it be nice to use Python to quickly resolve this?

Luckily for you, in this blog post we will take you through three ways to quickly get answers. They can be used together or on their own.

Let’s look at the data we want to compare

We have two CSV files, each with four columns: No, Film, Year and Length (min).

The objective here is to compare the two and show the differences in the output.

Import the files to a dataframe.

import pandas as pd
import numpy as np
df1 = pd.read_csv('CSV1.csv')
df2 = pd.read_csv('CSV2.csv')

The above logic is very straightforward: it looks for the files in the same folder as the Python script and imports the data from the CSV files into the respective data frames.

The purpose of this is that the following steps will use these data frames for comparison.

Method 1 – See if the two data frames are equal

The output for this shows differences through a boolean value, in this instance “True” or “False”.


array1 = np.array(df1)  # Storing the data in an array will allow the comparison below to show the differences.
array2 = np.array(df2)

df_CSV_1 = pd.DataFrame(array1, columns=['No','Film','Year','Length (min)'])
df_CSV_2 = pd.DataFrame(array2, columns=['No','Film','Year','Length (min)'])

df_CSV_1.index += 1  # This shifts the index to start at 1 not 0, which helps with the output when trying to understand the differences.
df_CSV_2.index += 1

print(df_CSV_1.eq(df_CSV_2).to_string(index=True))  # This shows True where the two data frames match and False where they differ.

Your output will look like this. As can be seen, lines 3 and 13 are False; these are the values, highlighted in yellow in the CSV2 file, that are different to the CSV1 file values. All other data is the same, which is correct.

The obvious advantage of the below is that you can quickly see what is different and on what line, but not what values are different. We will explore that in the next methods.

        No  Film   Year  Length (min)
1   True  True   True          True
2   True  True   True          True
3   True  True  False          True
4   True  True   True          True
5   True  True   True          True
6   True  True   True          True
7   True  True   True          True
8   True  True   True          True
9   True  True   True          True
10  True  True   True          True
11  True  True   True          True
12  True  True   True          True
13  True  True  False          True
14  True  True   True          True
15  True  True   True          True
16  True  True   True          True
17  True  True   True          True
18  True  True   True          True
19  True  True   True          True
20  True  True   True          True
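To try method 1 without the CSV files to hand, here is a self-contained sketch. The two small data frames are made-up stand-ins for CSV1.csv and CSV2.csv, and the equals call and the rows_with_differences variable are additions of mine to summarise the result.

```python
import pandas as pd

# Made-up stand-ins for CSV1.csv and CSV2.csv.
df1 = pd.DataFrame({"No": [1, 2, 3], "Film": ["A", "B", "C"], "Year": [2019, 2018, 2017]})
df2 = pd.DataFrame({"No": [1, 2, 3], "Film": ["A", "B", "C"], "Year": [2019, 2018, 2020]})

matches = df1.eq(df2)  # True where the cells agree, False where they differ
print(matches)

print(df1.equals(df2))  # a single True/False for the whole data frame

# Index labels of the rows that contain at least one difference.
rows_with_differences = matches.index[~matches.all(axis=1)].tolist()
print(rows_with_differences)
```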

Method 2 – Find and print the values only that are different

So in the first approach, we could see which lines are different, but not the values on those lines in the two files.

In the below code it will again look at the data frames but this time print the values from the CSV1 file that have different values in the CSV2 file.

a = df1[~df1.eq(df2).all(axis=1)]  # This compares the data frames, but only returns the rows from DF1 that have a different value in one of the columns on DF2

a.index += 1  # This shifts the index to start at 1, not 0, which helps with the output when trying to understand the differences.

print(a.to_string(index=False))

As a result, the output from this as expected is:

 No       Film  Year  Length (min)
  3   Parasite  2019           132
 13  Midsommar  2019           148
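If pandas is not available, a similar row-by-row check can be sketched with the standard csv module. This is an alternative to the method above, not the same technique. The inline text stands in for the two files, the rows are made up, and the sketch assumes both files have the same number of rows in the same order.

```python
import csv
import io

# Inline text stands in for CSV1.csv and CSV2.csv; the rows are illustrative.
csv1_text = "No,Film,Year\n1,Film A,2019\n2,Film B,2018\n"
csv2_text = "No,Film,Year\n1,Film A,2019\n2,Film B,2017\n"

rows1 = list(csv.DictReader(io.StringIO(csv1_text)))
rows2 = list(csv.DictReader(io.StringIO(csv2_text)))

differences = []
for line_no, (row1, row2) in enumerate(zip(rows1, rows2), start=1):
    for column in row1:
        if row1[column] != row2[column]:
            # Record the line, the column, and the value found in each file.
            differences.append((line_no, column, row1[column], row2[column]))

print(differences)
```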

Method 3 – Show your differences and the values that are different

The final way to look for any differences between CSV files is to use some of the above but show where the difference is.

In the below code, the first line compares the years between the two sets of data, and then writes “True” to a new column if they match, otherwise “False”.


df1['Year_check_to_DF2'] = np.where(df1['Year'] == df2['Year'], 'True', 'False')
df1.index += 1 #resets the index to start from one.

df2_year = df2['Year'] # We create this variable to store the DF2 year value.
df2_year = pd.Series(df2_year) #Series is a one-dimensional labeled array capable of holding data of any type.

df1 = df1.assign(df2_year=df2_year.values) # This adds the DF2 year value to the DF1 data frame
print(df1.to_string(index=False))

In this instance, the output shows each row of DF1 with the check column and the DF2 year alongside it. As can be seen, it allows us to complete a line-by-line visual check of what is different.
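A related option, assuming pandas 1.1 or later, is DataFrame.compare, which returns only the cells that differ with the two values side by side. This is a different call from the np.where approach above, and the data below is made up for illustration.

```python
import pandas as pd

# Made-up stand-ins for the two CSV files.
df1 = pd.DataFrame({"Film": ["A", "B", "C"], "Year": [2019, 2018, 2017]})
df2 = pd.DataFrame({"Film": ["A", "B", "C"], "Year": [2019, 2018, 2020]})

# compare() needs both data frames to have the same row and column labels.
diff = df1.compare(df2)
print(diff)
```

In the result, the "self" column holds the value from df1 and the "other" column holds the value from df2.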

So in summary we have completed a comparison of what is different between files.

There are a number of practical applications of this:

(A) If you are loading data to a database table, the uploaded data can be exported and compared to the original CSV file uploaded.

(B) Data Quality – Export key data to a CSV file and compare.

(C) Data Transformations – Compare two sets of data, and ensure transformations have worked as expected. In this instance, differences are expected, but as long as they are what you programmed for, then the validation has worked.


What is data profiling and its benefits?

Estimated reading time: 4 minutes

Data profiling is the process of creating statistics on a data set that will allow readers of the metrics to understand how good the data quality is for that data.

Usually this is one of the many functions of a data analyst.

Many organisations have data quality issues, and the ability to identify and fix them proactively helps with many customer and operational problems.

As a result, it can help to identify errors in data that may:

  • Feed into reports.
  • Reduce the effectiveness of machine learning outputs.
  • Have a regulatory impact on reports submitted and how their effectiveness is measured.
  • Irritate customers who receive communications that have incorrect data on them.
  • Cause batch processes to fail, reducing the effectiveness of automated tasks.

To understand how to implement an effective data profiling process, it is essential to identify where the data issues may originate:

  • Data entry by a human.
  • Imported data not cleansed.
  • Third-party systems are feeding you data that has errors in it.
  • Company takeovers, integrating data that has errors in it.

The amount of data that is now collected and stored in big data systems needs a process to manage and capture errors.

So what are the different ways to profile data?

To ensure a high level of data quality, you would look at some of the following techniques:

  • Completeness – Does the data available represent a complete picture of the data that should be present?
  • Conformity – Is the data conforming to the correct structure as would be expected when you observe it?
  • Consistency – If you have the same data in two different systems, are they the same values?
  • Accuracy – There will be a need to ensure that the data present is accurate. Inaccurate data could fundamentally make any decisions made on the back of it incorrect, which could have knock-on effects.
  • Uniqueness – If there are properties of data that are unique, does the data set show that?
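Three of the measures above can be sketched as simple ratios over a handful of records. Everything here is illustrative: the records are made up, the field names are hypothetical, and the email pattern is deliberately simple rather than a full validation rule.

```python
import re

# Made-up customer records for illustration only.
records = [
    {"id": 1, "dob": "1990-01-31", "email": "joe@example.com"},
    {"id": 2, "dob": "", "email": "francois@example"},
    {"id": 2, "dob": "1985-07-12", "email": "michael@example.com"},
]

total = len(records)

# Completeness: share of records with a date of birth populated.
completeness = sum(1 for r in records if r["dob"]) / total

# Conformity: share of emails matching a deliberately simple pattern.
email_pattern = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
conformity = sum(1 for r in records if email_pattern.match(r["email"])) / total

# Uniqueness: share of distinct ids among all ids.
uniqueness = len({r["id"] for r in records}) / total

print(f"completeness={completeness:.2f} conformity={conformity:.2f} uniqueness={uniqueness:.2f}")
```

Consistency and accuracy need a second system or a trusted reference to compare against, so they are left out of this sketch.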

When should data profiling take place?

This will depend on the organisation and the process that relies on it.

We will outline some different scenarios that may influence how to approach this.

1. Straight through processing – If you are looking to automate, there will be a need to ensure that no automated process fails.

As a result, there will be a need to check the data before it feeds a new system. Some steps that could be implemented include:

  • Scan the data source for known data issues.
  • Apply logic to fix any data issues found.
  • Feed the data to its destination once all corrections have been made.
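The three steps above can be sketched as three small functions. This is only an outline under assumptions of my own: the rows are made up, the rule (an empty city) and the fix (trimming spaces) are hypothetical, and rows with an outstanding issue are held back rather than fed onwards.

```python
# Made-up rows; the rule and the fix are hypothetical examples.
source_rows = [{"name": " Joe ", "city": "Dublin"}, {"name": "Paul", "city": ""}]

def scan(rows):
    """Return the positions of rows that break a known rule (empty city)."""
    return [i for i, row in enumerate(rows) if not row["city"]]

def fix(rows):
    """Apply known corrections: trim stray spaces around every value."""
    return [{key: value.strip() for key, value in row.items()} for row in rows]

def feed(rows, bad_positions):
    """Pass on only the rows that have no outstanding issue."""
    return [row for i, row in enumerate(rows) if i not in bad_positions]

issues = scan(source_rows)
destination_rows = feed(fix(source_rows), issues)
print(issues, destination_rows)
```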

Problems that may occur with this:

  • New errors: how do you handle them? Do you let them occur, fix them, and then update the logic so they are caught in the future?
  • This leads to fixes being required in the destination system, which leads to more downstream fixing of data.
  • You can’t control data with errors coming in; you need to report and validate updates that are required.

2. Batch processing – In this scenario, there is a delay in feeding the data, as the data has to be available to feed into the destination system.

As with the automated process, there is some level of automation, but there is more control around when the data is provided, and it can be paused or rerun. Some of the steps that can be implemented include:

  • Scan the data and provide a report on its quality. Fix the data if errors are found, then upload.
  • Allow the data to load, and then using a report, fix it in a downstream system.
  • Work with the providers of the data to improve the data quality of the data received.

Scenarios where data profiling can be applied

Measurement | Scenario Example | Impact
Completeness – Does the data available represent a complete picture of the data that should be present? | DOB populated | Can’t use it as part of security checks when talking to the customer, or values that depend on the DOB are miscalculated.
Conformity – Is the data conforming to the correct structure as would be expected when you observe it? | Email address incorrect | Emails to customers bounce back; needs follow up to correct, the customer does not get proper communication.
Consistency – If you have the same data in two different systems, are they the same values? | Data stored on different systems needs to be exactly the same. | The customer could be communicated different versions of the same data.
Accuracy – There will be a need to ensure that the data present is accurate. This could fundamentally make any decisions made on the back of it not correct, which could have a knock-on effect. | Inaccurate data means incorrect decisions | Sending out communications to the wrong set of customers who don’t expect or need the information.
Uniqueness – If there are properties of data that are unique, does the data set show that? | The same data is populated for different sets of independent customers. | No visibility to the customer and their actual correct data. Incorrect information processed for them. The financial and reputational risk could also be a problem.

What does a data analyst do?

Estimated reading time: 4 minutes

Livestream #2 – What does a data analyst do?

You are probably sitting there hearing about big data and databases, data analytics and machine learning, and wondering where a data analyst fits in?

Here we will look to break it down step by step.

Sometimes a data analyst can be confused with a business analyst; there are subtle differences:

  • Business Analyst: Their role is to document the user’s requirements in a document that is descriptive of what the user wants.
    • In this case, a document that all parties can agree to is created, and it can be used as part of the project sign-off.
  • Data Analyst: On the other hand, a data analyst will take the business requirements and translate them into data deliverables.
    • They use the document to ensure the project has the right data to meet the project objectives in the right place at the right time.

Data Mapping

In different data projects, there will be a need to reconcile the data between systems; a data analyst will help here.

In a data mapping exercise, the data analyst will be expected to look at one or more sources and map them to a destination system. This:

  • Ensures a match between the two datasets.
  • Results in the ability to reconcile the two systems.
  • Allows data to be used in multiple systems, knowing the consistency is in place.
  • Keeps the data types consistent between the systems.
  • Ensures that data validation errors are kept to a minimum.

Often a Data Analyst will build a traceability matrix, which tracks the data item from creation through to consumption.

Data Quality

In most companies, there will be teams (depending on their size) dedicated to this, and their input will be pivotal to existing and future data use.

Data quality is an important task that could impact internal and external reporting and a company’s ability to make decisions accurately.

Some of the areas that might be looked at include:

(A) Investigate duplicate data – There could be a number of reasons this has to be checked:

  • Data was manually entered multiple times.
  • An automated process ran multiple times.
  • A change to an IT system has unknowingly duplicated data.

(B) Finding errors – This could be completed in conjunction with the data reporting outlined below.

  • Normally companies will clearly have rules that pick up the data errors that are not expected.
  • A data analyst will analyse why these errors are occurring.

(C) Checking for missing data.

  • Data feeds have failed. A request to reload the data will be required.
  • Data was not requested as part of the business requirements. Confirm that this is the case.

(D) Enhancing the data with additional information – Is there additional information that can be added that can enrich the dataset?

(E) Checking data is in the correct format – There are scenarios where this can go wrong; an example is a date field populated with text.
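Two of the checks above, (A) duplicate data and (E) correct format, can be sketched with the standard library. The rows, the field names and the expected YYYY-MM-DD date format are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Made-up rows for illustration.
rows = [
    {"customer_id": 101, "start_date": "2021-03-01"},
    {"customer_id": 102, "start_date": "March"},
    {"customer_id": 101, "start_date": "2021-03-01"},
]

# (A) Duplicate data: ids that appear more than once.
counts = Counter(row["customer_id"] for row in rows)
duplicate_ids = [cid for cid, n in counts.items() if n > 1]

# (E) Correct format: dates that do not parse as YYYY-MM-DD.
def is_valid_date(text):
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False

bad_dates = [row["start_date"] for row in rows if not is_valid_date(row["start_date"])]
print(duplicate_ids, bad_dates)
```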

Data Reporting

In some of the areas above, we touched on the importance of the quality of data.

Ultimately there may be a need to track:

  • Data Quality – Build reports to capture the quality of data based on predefined business measurements.
  • Real-time Reporting – Number of new customers or customers who have left an organisation.
  • Track Targets – Is the target set by the business being met daily, weekly, or monthly?
  • Management Reporting – Build reports that provide input to management packs that provide an overview of how the business performs.

Data Testing

Organisations go through change projects where new data is being introduced or enhanced.

As a result, the data analyst will have a number of tasks to complete:

  • Write Test Scripts – Write all scripts for record counts, transformations and table-to-table comparisons.
  • Datatype Validation – Ensures all new data will be the same as the other data where it is stored.
  • No loss of data – Check all data is imported correctly with no data truncated.
  • Record count – Write an SQL script that would complete a source-to-destination reconciliation.
  • Data Transformation – Ensure any transformations are applied correctly.
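A record count reconciliation of the kind listed above might look like the following sketch, which uses an in-memory SQLite database so it can be run anywhere. The table names, column names and rows are hypothetical; a real project would point the same queries at its own source and destination tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE destination_customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO source_customers VALUES (?, ?)",
                 [(1, "Joe"), (2, "John"), (3, "Jim")])
conn.executemany("INSERT INTO destination_customers VALUES (?, ?)",
                 [(1, "Joe"), (2, "John")])

# Record count on each side of the load.
source_count, destination_count = conn.execute(
    "SELECT (SELECT COUNT(*) FROM source_customers),"
    "       (SELECT COUNT(*) FROM destination_customers)"
).fetchone()

# Rows present in the source but missing from the destination.
missing = conn.execute(
    "SELECT id, name FROM source_customers "
    "EXCEPT SELECT id, name FROM destination_customers"
).fetchall()

print(source_count, destination_count, missing)
conn.close()
```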

Supporting data projects

Ad hoc projects are common, and sometimes become a priority for businesses as they deal with requirements that result as part of an immediate business need.

Data Analysts will be called upon to support projects where there is a need to ensure the data required is of a standard that meets the project deliverables.

Some common areas where this might occur include:

  • Extract data where it has been found to have been corrupted.
  • Investigate data changes, to analyse where a data breach may have occurred.
  • An external regulatory body has requested information to back up some reports submitted.
  • A customer has requested all the company’s information on them; usually the case for a GDPR request.

Tkinter GUI tutorial python – how to clean excel data

Estimated reading time: 2 minutes

Tkinter is a package in Python’s standard library that allows users to create GUIs, or graphical user interfaces, to manage data in a more user-friendly way.

We are building our data analytics capability here, and looking to provide the user with the functionality they use in their work or college projects.

We have tested this code on 100,000 records stored on Microsoft OneDrive, and its speeds were quite good.

Over five tests, every run took under 100 seconds from start to finish.

In this Tkinter GUI tutorial for Python, you will be shown how to find the data errors, clean them and then export the final result to Excel.

We will take you through the following:

  • Creation of the Tkinter interface.
  • Methods/ functions to find errors.
  • Methods/functions to clean the data.
  • Exporting the clean data to an excel file.

 

To sum up:

The video walks through the creation of a Tkinter window using a canvas and a frame to store the data frame.

Then it looks at importing the data through pd.read_excel, to load the data into a pandas data frame.

Next, there is a function that extracts the errors through str.extract, and the results are loaded into separate columns.

Finally, I have exported the clean dataset using rawdata.to_excel and saved the file as a separate new spreadsheet.
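The extract-and-clean step can be sketched without the Tkinter window. The data, the column names and the rule (anything other than letters and spaces counts as an error) are assumptions for illustration, and the str.replace clean-up is my own choice and may differ from what the video does.

```python
import pandas as pd

# Made-up data; in the video the data frame is loaded with pd.read_excel instead.
rawdata = pd.DataFrame({"name": ["Joe", "Jo#hn", "Ji?m"]})

# Pull the first unexpected character (anything but letters and spaces) into its own column.
rawdata["error"] = rawdata["name"].str.extract(r"([^A-Za-z ])", expand=False)

# Remove those characters to produce the cleaned value.
rawdata["clean_name"] = rawdata["name"].str.replace(r"[^A-Za-z ]", "", regex=True)

print(rawdata)
# rawdata.to_excel("clean_data.xlsx", index=False)  # export step; needs an Excel writer such as openpyxl
```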

how to create an instance of a class

Estimated reading time: 1 minute

Following on from how to create a class in Python, here in how to create an instance of a class we further explore the instance of a class and how it can be used within a program to assign values to an object. This allows that object to pick up the values contained within the class, making it easier to have consistency regarding functionality and data.

This video covers:

(A) Creating an instance of a class

(B) Using the __init__ within the class

(C) Defining the constructor method __init__

(D) Creating an object that calls a class and uses the class to process some piece of data.

What are the benefits of this?

  • You only need to create one class that holds all the attributes required.
  • That class can be called from anywhere within a program, once an instance of it is created.
  • You can update the class, and once completed, those new values will become available to an instance of that class.
  • Makes for better management of objects and their properties, rather than multiple different versions contained within a program.
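As a small illustration of the points above, the sketch below defines one class with an __init__ method and creates two instances from it. The Customer class and its attributes are made up for this example and are not taken from the video.

```python
class Customer:
    """Illustrative class; the name and attributes are made up."""

    def __init__(self, name, city):
        # __init__ runs automatically each time an instance is created.
        self.name = name
        self.city = city

    def describe(self):
        return f"{self.name} lives in {self.city}"


# Two instances of the same class, each holding its own values.
joe = Customer("Joe", "Dublin")
paul = Customer("Paul", "Paris")
print(joe.describe())
print(paul.describe())
```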

 

 

How to create a class in Python

Estimated reading time: 1 minute

How to create a class in Python: in this video, the main topic is classes and how they are constructed, and we also explain how to create an instance of a class.

Classes are a core part of what is referred to as object-oriented programming.

Also, we look at what class attributes are and how they can be used to assign key data that can be called anywhere within a program.

The steps involve the following:

(A) Create a class

(B) Assign attributes to the class

(C) Create a method within the class ( similar to a function)

(D) Create an instance of a class to call its attributes and methods.
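The four steps can be sketched in a few lines. The Film class, its category attribute and the summary method are hypothetical names chosen for illustration.

```python
class Film:
    # Class attribute: shared by every instance of the class.
    category = "Cinema"

    def summary(self, title):
        # A method is a function that belongs to the class.
        return f"{title} ({self.category})"


film = Film()  # create an instance of the class
print(film.category)
print(film.summary("Parasite"))

Film.category = "Streaming"  # updating the class attribute...
print(film.category)         # ...is visible through the existing instance
```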

This video is a follow on from object oriented programming – Python Classes explained