Create an HTML Table From Python Using JavaScript

Estimated reading time: 5 minutes

So you are using Python and you have data stored in a data frame? You would like to present that data in an HTML table on a webpage, but are unsure how to achieve this. Well, read on to see two different methods that will help you.

The first method passes the data to JavaScript, which then supplies the HTML page with the data needed to create the table.

In both methods we are using Python Flask, with an app.py file and HTML files created to present the outcomes.

Step 1 – Read the data and create the data frame

For those familiar with Python Flask, we start with the imports that allow us to create the webpage interface to present the data.

Then we create a list called data, which stores the information we need. After this, we create the data frame “df”.

Finally, we convert the data frame to JSON and do some data cleansing.

from flask import Flask, render_template, json
import pandas as pd


app = Flask(__name__)

data = [['Joe','Dublin','100'],['Francois','Paris','100'],['Michael','Liverpool','100']]
df = pd.DataFrame(data, columns=['Name', 'City', 'Age'])


json_output = df.to_json()
json_output = json.loads(json_output.replace("\'", '"')) # Swaps any stray single quotes for double quotes so the string parses as valid JSON
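To make the JavaScript in step 3 easier to follow, it helps to see the shape to_json() produces: each column becomes an object keyed by row index. A quick check (output reformatted here for readability):

print(df.to_json())
# {"Name":{"0":"Joe","1":"Francois","2":"Michael"},
#  "City":{"0":"Dublin","1":"Paris","2":"Liverpool"},
#  "Age":{"0":"100","1":"100","2":"100"}}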

Step 2 – Create the output HTML files

Method 1 – All this is doing is creating the function that renders the webpage “index.html”. Note that name_json=json_output captures the data from step one, and this is what is passed over to the HTML page as JSON.

Method 2 – We are using to_html, which renders a DataFrame as an HTML table, pure and simple.

As can be seen, it saves the table to an HTML page stored in the templates folder.

#Method 1
@app.route('/')
def index():

    return render_template('index.html', name_json=json_output)

#This is the start of method 2
html = df.to_html()
#Write the HTML table to a file in the templates folder
with open("templates/index_method2.html", "w") as im2_file:
    im2_file.write(html)

@app.route('/index_method2')
def index_method2():
    return render_template('index_method2.html')


if __name__ == "__main__":
    app.run(debug=True)

Step 3 – Create the HTML tables through JavaScript

Steps 1 and 2 got the data ready to be viewed on the web pages, so how are the pages themselves built?

Let’s walk through the code; note that a good bit of it is the HTML used to present the data on the page.

The first thing to notice is the <style></style> tags. These are for method 1 and apply the boxes around the output.

<pre id="json"></pre> – This line shows the JSON data as follows:

JSON Output

The next few lines are the HTML that creates the table to hold the data from method 1:

<table id="json_table"> ===> Method 1 table
    <tr>
        <td> Name</td>
        <td> City</td>
        <td> Age</td>
  </tr>
</table>

The next section has the JavaScript that populates the table for method 1 and opens method 2; I will go through it now:

The first line creates a variable that references the method 1 table, which is where the JSON data will be loaded.

The second line converts the JSON into a readable format and displays it, as in the “JSON Output” screenshot above.

Lines 3 and 4 collect the row keys from the JSON and set up the loop that walks through them in the subsequent lines.

The remaining lines in the script, except for the very last one (which relates to method 2), take the data captured on each pass of the loop and feed it into the table that appears on index.html, as follows:

HTML table created from a Python data frame using JavaScript.

The final line:

window.open('http://127.0.0.1:5000/index_method2', '_blank');

relates to method 2: it takes the table rendered directly from the app.py file and opens it in a new tab at http://127.0.0.1:5000/index_method2

This is a very quick and easy way to create the HTML table and, as can be seen, involves less coding than method 1.

<script>
    var selectvalue = document.getElementById("json_table"), test = {{ name_json | tojson }}; // First line
    document.getElementById("json").textContent = JSON.stringify(test, undefined, 2); // Second line

    const keys = Object.keys(test["Name"]); // Line 3 - the row keys ("0", "1", "2")
    for (let i = 0; i < keys.length; i++) // Line 4

    {
        const key = keys[i];

        console.log(key, test["Name"][key]);

        a = JSON.stringify(test["Name"][key])
        b = JSON.stringify(test["City"][key])
        c = JSON.stringify(test["Age"][key])
        const tbl = document.getElementById("json_table");
        const row = tbl.insertRow();
        const cell1 = row.insertCell();
        const cell2 = row.insertCell();
        const cell3 = row.insertCell();
        cell1.innerHTML = a;
        cell2.innerHTML = b;
        cell3.innerHTML = c;

    }

    window.open('http://127.0.0.1:5000/index_method2', '_blank');

  </script>

The Full HTML code

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">

  <title>Document</title>
  <style>
    table, td {
      border: 1px solid;
    }
  </style>
</head>
<body>
<pre id="json"></pre>
<div>Method 1 pass the data via JSON to JavaScript</div>
<table id="json_table">
    <tr>
        <td> Name</td>
        <td> City</td>
        <td> Age</td>
    </tr>
</table>

  <script>
    var selectvalue = document.getElementById("json_table"), test = {{ name_json | tojson }}; // First line
    document.getElementById("json").textContent = JSON.stringify(test, undefined, 2); // Second line

    const keys = Object.keys(test["Name"]); // Line 3 - the row keys ("0", "1", "2")
    for (let i = 0; i < keys.length; i++) // Line 4
    {
        const key = keys[i];

        console.log(key, test["Name"][key]);

        a = JSON.stringify(test["Name"][key])
        b = JSON.stringify(test["City"][key])
        c = JSON.stringify(test["Age"][key])
        const tbl = document.getElementById("json_table");
        const row = tbl.insertRow();
        const cell1 = row.insertCell();
        const cell2 = row.insertCell();
        const cell3 = row.insertCell();
        cell1.innerHTML = a;
        cell2.innerHTML = b;
        cell3.innerHTML = c;
    }

    window.open('http://127.0.0.1:5000/index_method2', '_blank');
  </script>

</body>
</html>


The Final output

Method 1
Method 2

How To Check For Unwanted Characters Using Loops With Python

Estimated reading time: 3 minutes

On this website, we have posted about how to remove unwanted characters from your data and how to remove characters from an imported CSV file; both posts show different approaches.

In this blog post, we are going to approach the process by using loops within a function. Essentially, we will pass in a list and then loop through the strings to check the data against it.

Step 1 – Import the data and create a data frame

The first bit of work we need to complete is to load the data. Here we create a dictionary with its respective key-value pairs.

In order to prepare the data to be processed by the function in step 2, we then load it into a data frame.

import pandas as pd
#Create a dataset locally
data = {'Number':  ["00&00$000", '111$11111','2222€2222','3333333*33','444444444£','555/55555','666666@666666'],
        'Error' : ['0','0','0','0','0','0','0']}

#Create a dataframe and tell it where to get its values from
df = pd.DataFrame(data, columns=['Number','Error'])

Step 2 – Create the function that checks for invalid data

This is the main piece of logic that gives the output. As you can see, there is a list “l” that is fed to the function run.

One thing to note is that the function is declared as run(*l), so the list we pass in arrives as a single argument that the loop inside the function can index into; otherwise the program would not execute properly.

To start off, we create a second data frame, which uses a regular expression to extract the characters we don’t want to have.

Next, we drop a column that is generated with NaN values, as it is not required.

Then we update the original data frame with the values that we found.

Just in case there are any NaN values in this updated “Error” column, we remove those rows on the next line.

Next is the main loop, which creates a new column called “Fix”. This holds the cleansed values once the data we don’t want has been removed.

The characters we do not need are stripped out with str.replace.

#Function to loop through the dataset and see if any of the list values below are found and replace them
def run(*l):
    #This line extracts all the special characters into a new column
    #Using regular expressions it finds values that should not appear
    df2 = df['Number'].str.extract(r'(\D+|^(0)$)') # New dataframe to get extraction we need
    print(df2)
    df2 = df2.drop(1, axis=1) # Drops the column with NaN in it, not required

    df['Error'] = df2[0] # Updates original dataframe with values that need to be removed.
    #This line removes any row with a null value in the Error column
    df.dropna(subset=['Error'], inplace=True)
    #The loop below receives the list and strips each of its seven characters
    #out of the Number column in turn; regex=False treats characters like $ and * literally
    for i in l:
        df['Fix'] = df['Number'].str.replace(i[0], "", regex=False).str.replace(i[1], "", regex=False) \
            .str.replace(i[2], "", regex=False).str.replace(i[3], "", regex=False) \
            .str.replace(i[4], "", regex=False).str.replace(i[5], "", regex=False) \
            .str.replace(i[6], "", regex=False)
        print("Error list to check against")
        print(i[0])
        print(i[1])
        print(i[2])
        print(i[3])
        print(i[4])
        print(i[5])
        print(i[6])
    print(df)

#This is the list of errors you want to check for
l = ['$','*','€','&','£','/','@']
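As an aside, if you prefer to avoid the chained str.replace calls, here is a minimal sketch of an alternative (assuming the same df and l as above) that builds one regular expression character class and strips everything in a single pass:

import re

#Build a character class such as [\$\*€&£/@] from the escaped list values
pattern = "[" + "".join(re.escape(ch) for ch in l) + "]"
df['Fix'] = df['Number'].str.replace(pattern, "", regex=True)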

Step 3 – Run the program

To run this program, we just execute the code below. All it does is pass the list “l” to the function “run”, and then the output in step 4 is produced.

run(l)

Step 4 – Output

Error list to check against
$
*
€
&
£
/
@
          Number Error           Fix
0      00&00$000     &       0000000
1      111$11111     $      11111111
2      2222€2222     €      22222222
3     3333333*33     *     333333333
4     444444444£     £     444444444
5      555/55555     /      55555555
6  666666@666666     @  666666666666


How to Create Multiple XML Files From Excel With Python

So you have an Excel file that has hundreds or maybe thousands of records, and you want to convert it to XML, with each row as a separate file?

This can be achieved very easily using Python, in three easy steps.

First of all, what is XML?

XML stands for “Extensible Markup Language” and is used extensively with websites, web services, and anywhere companies want to transfer data between each other in a predefined format that follows a recognised standard.

To understand more about what XML is, follow the link. One thing to note is that it may reference the financial services industry, but XML is used widely across multiple industries too; as we digitise more of our processes, the transfer of data needs to follow a standard that everyone can work with seamlessly.

Step 1 – Let’s look at our raw data

Below is an Excel file with three columns and eight rows. The objective is to create eight separate XML files, one for each row.

The raw Excel data used to create the multiple XML files

Step 2 – Import the data and create a master XML file

Below is the logic I used in How to Create an XML File from Excel using Python; I have attached comments where necessary to explain what each line is doing.

The purpose at this point is to create one file, called output.xml, with all the data from Excel. We will then use this to create the separate files we need and then delete it.

The next steps will take you through that.

import pandas as pd
from lxml import etree as et
import os


# STEP 1 - This step creates a master XML file

raw_data = pd.read_excel(r'\Users\haugh\OneDrive\dataanalyticsireland\YOUTUBE\how_to_read_each_excel_row_into_a_separate_XML_file\excel_raw_data_city.xlsx')

root = et.Element('root')

for row in raw_data.iterrows(): #==> Loops through each record and populates the tags for each row.
    root_tags = et.SubElement(root, 'ExportData') #=== > Root name
# These are the tag names for each row (SECTION 1)
    Column_heading_1 = et.SubElement(root_tags, 'City')
    Column_heading_2 = et.SubElement(root_tags, 'Area')
    Column_heading_3 = et.SubElement(root_tags, 'Population')


###These are the values that will be populated for each row above
# The values inside the [] are the raw file column headings.(SECTION 2)
    Column_heading_1.text = str(row[1]['City'])
    Column_heading_2.text = str(row[1]['Area'])
    Column_heading_3.text = str(row[1]['Population'])

# This Section outputs the data to an xml file
# Unless you tell it otherwise it saves it to the same folder as the script.
tree = et.ElementTree(root) #==> The variable tree is to hold all the values of "root"
et.indent(tree, space="\t", level=0) #===> This just formats in a way that the XML is readable
tree.write('output.xml', encoding="utf-8") #==> The data is saved to an XML file
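Based on the code above, output.xml will have the following structure (the values shown here are illustrative, not the actual spreadsheet contents):

<root>
	<ExportData>
		<City>Dublin</City>
		<Area>115</Area>
		<Population>554554</Population>
	</ExportData>
	...
</root>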

Step 3 – Loop through each row and create the separate files

In the code below, we open the output.xml file and, using a loop, look for all the data between the “ExportData” tags.

Each block found is copied and written out as a new file.

Once one file is done, the loop moves to the next set of tags and repeats the process until it reaches the end of the output.xml file.

master_file = et.iterparse('output.xml', events=('end', )) #==> Fires an event each time a closing tag is read
index = 0
for i, j in master_file:
    if j.tag == 'ExportData': #==> Each ExportData block holds one row of the original Excel file
        index += 1
        filename = format(str(index) + ".xml") #==> Files are named 1.xml, 2.xml, and so on
        with open(filename, 'wb') as f:
            f.write(et.tostring(j))

Step 4 – Delete the output.xml file – purely optional

The objective here was to create separate XML files, based on the input of the original Excel file.

We have also created an additional file called output.xml to get us to the end goal.

If we have no need for this file, it can be removed with the code below; be sure that it sits at the very end of your logic.

The import statement for os.unlink is in step two.

os.unlink(r'\Users\haugh\OneDrive\dataanalyticsireland\YOUTUBE\how_to_read_each_excel_row_into_a_separate_XML_file\output.xml')

How To Compare CSV Files for Differences

Estimated reading time: 5 minutes

Often a lot of data analytics projects involve comparisons. Some could be to look at data quality problems; other times it could be to check that data you are loading has been saved to the database with no errors.

As a result, there is a need to quickly check that all your data is correct. But rather than do visual comparisons, wouldn’t it be nice to use Python to quickly resolve this?

Luckily for you, in this blog post we will take you through three ways to quickly get answers; they can be used together or on their own.

Let’s look at the data we want to compare

We have two CSV files, with four columns in them:

The objective here is to compare the two and show the differences in the output.

Import the files to a dataframe.

import pandas as pd
import numpy as np
df1 = pd.read_csv('CSV1.csv')
df2 = pd.read_csv('CSV2.csv')

The above logic is very straightforward: it looks for the files in the same folder as the Python script and imports the data from the CSV files into the respective data frames.

The purpose of this is that the following steps will use these data frames for comparison.

Method 1 – See if the two data frames are equal

In the output for this, differences are shown through a boolean value, in this instance “True” or “False”.


array1 = np.array(df1) # Storing the data in arrays will allow the comparison below to show the differences
array2 = np.array(df2)

df_CSV_1 = pd.DataFrame(array1, columns=['No','Film','Year','Length (min)'])
df_CSV_2 = pd.DataFrame(array2, columns=['No','Film','Year','Length (min)'])

df_CSV_1.index += 1 # Resets the index to start at 1 not 0, which helps when trying to understand the differences in the output
df_CSV_2.index += 1

print(df_CSV_1.eq(df_CSV_2).to_string(index=True)) # Shows the differences between the two data frames

Your output will look like this. As can be seen, lines 3 and 13 are False; these are the yellow values in the CSV2 file that differ from the CSV1 file values. All other data is the same, which is correct.

The obvious advantage of the below is that you can quickly see what is different and on what line, but not what values are different; we will explore that in the next methods.

        No  Film   Year  Length (min)
1   True  True   True          True
2   True  True   True          True
3   True  True  False          True
4   True  True   True          True
5   True  True   True          True
6   True  True   True          True
7   True  True   True          True
8   True  True   True          True
9   True  True   True          True
10  True  True   True          True
11  True  True   True          True
12  True  True   True          True
13  True  True  False          True
14  True  True   True          True
15  True  True   True          True
16  True  True   True          True
17  True  True   True          True
18  True  True   True          True
19  True  True   True          True
20  True  True   True          True

Method 2 – Find and print the values only that are different

In the first approach, we could see that there were differences, but not which values differed between the two files.

The code below again looks at the data frames, but this time prints the rows from the CSV1 file whose values differ in the CSV2 file.

a = df1[df1.eq(df2).all(axis=1) == False] # Compares the data frames, but only returns the rows from df1 that have a different value in one of the columns of df2

a.index += 1 # Resets the index to start at 1, not 0, which helps when trying to understand the differences in the output

print(a.to_string(index=False))

As a result, the output from this as expected is:

No        Film  Year  Length (min)
  3    Parasite  2019           132
 13  Midsommar   2019           148

Method 3 – Show the differences and the values that are different

The final way to look for any differences between the CSV files is to use some of the above but show where the difference is.

In the code below, the first line compares the years between the two sets of data and applies True to a new column if they match, otherwise False.


df1['Year_check_to_DF2'] = np.where(df1['Year'] == df2['Year'], 'True', 'False')
df1.index += 1 # Resets the index to start from one

df2_year = df2['Year'] # We create this to store the df2 year values
df2_year = pd.Series(df2_year) # A Series is a one-dimensional labelled array capable of holding data of any type

df1 = df1.assign(df2_year=df2_year.values) # Adds the df2 year values to the df1 data frame
print(df1.to_string(index=False))

In this instance, this returns the output below. As can be seen, it allows us to complete a line-by-line visual check of what is different.

So, in summary, we have completed a comparison of what is different between the two files.
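Worth noting: newer versions of pandas can do much of this in one call. A minimal sketch, assuming pandas 1.1 or later and freshly loaded data frames (method 3 above modified df1):

diff = df1.compare(df2) # Returns only the cells that differ, with the df1 ("self") and df2 ("other") values side by side
print(diff)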

There are a number of practical applications of this:

(A) If you are loading data to a database table, the uploaded data can be exported and compared to the original CSV file uploaded.

(B) Data Quality – Export key data to a CSV file and compare.

(C) Data Transformations – Compare two sets of data, and ensure transformations have worked as expected. In this instance, differences are expected, but as long as they are what you programmed for, then the validation has worked.

If you like this blog post, there are others that may interest you:

How to Compare Column Headers in CSV to a List in Python

How to count the no of rows and columns in a CSV file

How to Compare Column Headers in CSV to a List in Python

Estimated reading time: 3 minutes

So you have numerous different automation projects in Python. In order to ensure clean and smooth straight-through processing, checks need to be made to ensure what was received is in the right format.

Most, but not all, files used in an automated process will be in the CSV format. It is important, therefore, that the column headers in these files are correct so you can process the file correctly.

This ensures a rigorous process that keeps errors to a minimum.

How to compare the headers

The first step would be to load the data into a Pandas data frame:

import pandas as pd

df = pd.read_csv("csv_import.csv") #===> Include the headers
print(df)

The actual original file is as follows:

Next we need to make sure that we have a list that we can compare to:

header_list = ['Name','Address_1','Address_2','Address_3','Address_4','City','Country']

The next step will allow us to save the headers imported in the file to a variable:

import_headers = df.axes[1] #==> 1 is to identify columns
print(import_headers)

Note that the axis chosen was 1, which is what pandas recognises as the column axis.

Finally we will apply a loop as follows:

a = [i for i in import_headers if i not in header_list]
print(a)

In this loop, the variable “a” collects each value “i” from import_headers that is not found when checked against header_list.

It then prints out the values not found.
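The same idea works in reverse if you also want to know which expected headers are missing from the file; a minimal sketch:

missing = [h for h in header_list if h not in import_headers]
print(missing) # Expected headers that did not arrive in the CSV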

Pulling this all together gives:

import pandas as pd

df = pd.read_csv("csv_import.csv") #===> Include the headers
print(df)

#Expected values to receive in CSV file
header_list = ['Name','Address_1','Address_2','Address_3','Address_4','City','Country']

import_headers = df.axes[1] #==> 1 is to identify columns
print(import_headers)


a = [i for i in import_headers if i not in header_list]
print(a)

Resulting in the following output:

As can be seen, the address columns below were found not to be valid, as they were not contained within our check list “header_list”.

How to change the headers on a CSV file

Problem statement

You are working away on some data analytics projects and you receive files that have incorrect headings on them. The problem is, without opening the file, how can you change the headers on the columns within it?

To start off, let’s look at the file we want to change. Below is a screenshot of the data with its headers contained inside:

So, as you can see, we have names and addresses. But what if we want to change address1, address2, address3, address4 to something different?

This could be for a number of reasons:

(A) You are going to use those columns as part of an SQL statement to insert into a database table, so you need to change the headers so that SQL statement won’t fail.

(B) Some other part of your code is using that data, but requires the names to be corrected so that does not fail.

(C) Your organisation has a naming convention that requires all column names to be a particular structure.

(D) All data of a similar type has to be of the same format, so it can be easily identified.

What would be the way to implement this in Python, then?

Below you will see the code I have used for this, looking to keep it simple:

import pandas as pd
#df = pd.read_csv("csv_import.csv",skiprows=1) #==> use to skip first row (header if required)
df = pd.read_csv("csv_import.csv") #===> Include the headers
correct_df = df.copy()
correct_df.rename(columns={'Address1': 'Address_1', 'Address2': 'Address_2', 'Address3': 'Address_3', 'Address4': 'Address_4'}, inplace=True)
print(correct_df)
#Exporting to a CSV file
correct_df.to_csv(r'csv_export.csv', index=False, header=True)

As can be seen, there are eight rows in total. The steps are as follows:

1. Import the CSV file.

2. Make a copy of the data frame.

3. In the new data frame, use the rename function to change any of the column headers you require: Address1, Address2, Address3, Address4.

4. Once the updates are completed, re-export the file with the corrected headers to a folder of your choosing.

As a result of the above steps, the output will appear like this:

And there you go. If you had an automated process, you could incorporate this to ensure there were no failures on the loading of any data.
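If you would rather fix the headers at import time instead of renaming afterwards, here is a minimal sketch (it assumes you already know the column order in the file; the names are illustrative):

import pandas as pd

new_names = ['Name','Address_1','Address_2','Address_3','Address_4']
#skiprows=1 drops the original header row; names= supplies the corrected headers
df = pd.read_csv("csv_import.csv", skiprows=1, names=new_names)
print(df)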

Another article that may interest you? How to count the no of rows and columns in a CSV file

How to save data frame changes to a file

Estimated reading time: 2 minutes

Changing a file is the natural step; tracking those changes is just as important.
Change is part and parcel of life, but in the technology world, with the complexity and interdependency of systems, not being able to effectively track what goes on leads to:

  • Countless hours trying to figure out where it went wrong.
  • Not understanding what needs fixing.
  • Systems/processes that ultimately work seamlessly slowing down unnecessarily.

In data analysis, as the volumes can be quite large, a human cannot feasibly review a set of data and find the underlying problem. Well, they can, but it would take so long that nothing else would get done. This article by Forbes – predictions-about-data-in-2020-and-the-coming-decade – predicts the consumption of data will just keep getting bigger.

 

Let the script work, see the log of changes in the output.

Our post How do you remove characters from an imported CSV file looked at some data cleansing techniques, but there was no way of knowing what was changed other than a visual inspection. Here we introduce the data set into a data frame, change some of the values and show the output on the screen. More importantly, as we progress through these steps, we save the changes as we go along.
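To give a flavour of the idea (the data and file names below are illustrative, not the video’s exact code): record each change as it is made, then save both the data and the audit trail.

import pandas as pd

df = pd.DataFrame({'Number': ['100A', '200', '3B00']})
changes = []

for idx, value in df['Number'].items():
    cleaned = value.replace('A', '').replace('B', '') #The data cleansing step
    if cleaned != value:
        changes.append({'row': idx, 'before': value, 'after': cleaned}) #Log what changed
        df.at[idx, 'Number'] = cleaned

df.to_csv('cleaned_output.csv', index=False) #The corrected data
pd.DataFrame(changes).to_csv('change_log.csv', index=False) #The audit trail of changes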

The reality is that in large corporate settings, visual inspections would take up too much time and resources. An IT solution that surfaces the vital information required will reduce the data errors happening and allow a clearer view of how a company’s data has changed over time.

 

Where does the trail lead to next?

  • Changes made to data need a clear way of being tracked.
  • How you capture those changes on your systems needs to be addressed.
  • Implementing better systems will help you have confidence in your data changes.

Showing audit trail of changes

How to Hide a Column From a Data Frame

Estimated reading time: 2 minutes

They say there is nowhere to hide; we disagree!
As an addition to How to add a column to a dataframe, would you like to learn how to hide one?! This video has several steps in it; following each one will give you a good introduction.

To start, why would you want to hide a column?

  • You may not want to reveal its output, as it contains sensitive information.
  • The data in the column is not in the correct format, and you want to repurpose it so it is the way you want it.
  • The column could be a calculated column, hence it serves as an intermediary step before your data frame is output.

Finding the best way to hide unwanted data:

In this video, we introduce several concepts to help not show a column (see the sketch after this list):

  • Specify the actual columns you want to include in the data frame; by doing this, you exclude the column or columns you don’t want to see.
  • Use drop to explicitly tell the data frame not to show a particular column.
  • We also cover a scenario whereby you have a calculated column but do not want to show its output, based on one of the reasons outlined above.
  • Finally, the index can appear in the output, so we show through set_index how to control what is displayed.
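As a rough sketch of those approaches (the data and column names are illustrative, not the video’s exact code):

import pandas as pd

df = pd.DataFrame({'Name': ['Joe', 'Anna'], 'Salary': [50000, 60000], 'City': ['Dublin', 'Paris']})

print(df[['Name', 'City']]) #1. Select only the columns you want to show
print(df.drop(columns=['Salary'])) #2. Drop the sensitive column explicitly
df['Bonus'] = df['Salary'] * 0.1 #3. A calculated column used as an intermediary step...
print(df.drop(columns=['Salary', 'Bonus'])) #...hidden before the data frame is output
print(df.set_index('Name')) #4. Move a column into the index to change what is displayed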

This latest in the Python DataFrame series looks to build on the knowledge in the previous examples. We hope that as you learn Python online, it will increase your programming skills.

Thanks for watching and don’t forget to like and share through our social media buttons to the right of this page.

Data Analytics Ireland

Python Tutorial: Add a column to a data frame

Estimated reading time: 1 minute

You have learnt addition, now learn how to add a column to a data frame!

In our last post, on what are Python data frames, we introduced the concept, but are you now searching for how to add a column to a data frame?

To start, I was working away and wondering how I could accomplish this, as there were many posts about it.

Searching through the jungle of website articles, some topics of interest gave me ideas, such as:

  • Lambda
  • NumPy
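A minimal sketch of those two approaches (the data here is illustrative):

import pandas as pd
import numpy as np

df = pd.DataFrame({'Score': [45, 80, 62]})

df['Double'] = df['Score'].apply(lambda x: x * 2) #Lambda: derive a new column row by row
df['Result'] = np.where(df['Score'] >= 50, 'Pass', 'Fail') #NumPy: derive a new column from a condition
print(df)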

 

Having tested the waters to see how to approach it, I worked through the above to:

  • Figure out how to use them.
  • Write some code to see how it all comes together.

The best thing to do was to put the code into action!

Python for beginners or advanced programmers does not have to be hard!

Remember to subscribe to our channel if you like what we do!

Support is on the way:

TutorialsPoint is an excellent resource; if you are looking to understand some other examples, see this post: TutorialsPoint: Add Column

To see a related post on how to hide a column from a data frame look no further How to hide a column from a data frame

Data Analytics Ireland

 

YouTube channel lists – Python DataFrames

Estimated reading time: 1 minute

Welcome to this new blogging website! We are all about data analytics; to find out more, have a look at this page: About Data Analytics Ireland

To keep it simple, we have created some lists here and on our YouTube Channel

As we progress over the next while, the website will be updated as we go along, and while there may be a lot of video content, we will look to mix it up with different formats.

We have started with Python data frames:

We hope you enjoy and don’t forget if you like what we are doing subscribe to our channel!

Data Analytics Ireland