How to run Python directly from Javascript

Estimated reading time: 3 minutes

Are you developing a website and looking to execute some Python script directly from the website?

Here we take you through how to achieve this using Python Flask, the scenarios we demonstrate are as follows:

  1. Javascript on load event – execute Python script
  2. Javascript on click event – execute a Python script

In both scenarios above you will be presented with pop-up boxes like below, but the python code can be changed to whatever you like.

Page load message box
On button press, calls the python script to load a message box

The code you will need to run this is split into two. (A) The Index.HTML logic and (B) the app.py logic

The code for the INDEX.HTML page

This is the page that the user is presented with. As you can see the logic has two pieces of javascript that when run, go over to the python code and run some logic that it wishes the webpage to run.

In this instance, the app.py python script holds the commands that need to be executed.

Note the page load event goes to the route in app.py, namely ” @app.route(‘/’)”

Whereas the button click event is pointed directly at “@app.route(‘/test’ )” – essentially this will not run until the javascript asks it to, on the button click event.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Sample Home Page</title>
</head>
<body>
<! -- Calls a function on the page load. -- >
<element onload="myfunction_onload">
<button type="submit" onclick='myfunction_clickevent()'>Run my Python!</button>
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.5/jquery.min.js"></script>
<! -- This script runs the python script on the page load -->
<script>
    function myfunction_onload(){
        $.ajax({
            url: "app.py",
             context: document.body
            })
        }
    </script>
<! -- This script runs the python script when the button is clicked -->
<script>
    function myfunction_clickevent(){
        $.ajax({
            url:"/test",
            context: document.body});}
</script>


</body>
</html>
How to run Python directly from Javascript

The code for the APP.PY Python script

Here we are linking to the index.html page and taking commands from it and executing those requests.

In this scenario, the message boxes are rendered by using the win32api package.

The app is using the flask package to run the website.

The powerful thing here is you can start customising this logic to do what you like, examples include:

(A) Return graphs to the webpage.

(B) Process data received and return statistics on the data.

(C) Validate data received in Python and return a response. An example here could be a user logging into a database.

import win32api
from flask import Flask, render_template

app = Flask(__name__)


#Using the below, the popup message appears on the page load of index.html
#0x00001000 - This makes the popup appear over the browser window
@app.route('/')
def index():
    win32api.MessageBox(0, 'You have just run a python script on the page load!', 'Running a Python Script via Javascript', 0x00001000)
    return render_template('index.html')

#Using the below, the popup message appears when the button is clicked on the webpage.
#0x00001000 - This makes the popup appear over the browser window
@app.route('/test')
def test():
    win32api.MessageBox(0, 'You have just run a python script on the button press!', 'Running a Python Script via Javascript', 0x00001000)
    return render_template('index.html')

if __name__ == "__main__":
    app.run(debug=True)

ValueError: cannot convert float NaN to integer

Estimated reading time: 2 minutes

Sometimes in your data analytics project, you will be working with float data types and integers, but the value NaN may also appear, which will give you headaches if you don’t know how to fix the problem at hand.

A NaN is defined as “Not a Number” and represents missing values in the data. If you are familiar with SQL, it is a similar concept to NULLS.

So how does this error occur in Python?

Let’s look at some logic below:

NaN =float('NaN')
print(type(NaN))
print(NaN)

Result:
<class 'float'>
nan

As can be seen, we have a variable called ‘NaN’, and it is a data type ‘Float’

One of the characteristics of NaN is that it is a special floating-point value and cannot be converted to any other type than float; thus, when you look at the example below, it shows us this exactly and why you would get the error message we are trying to solve.

NaN =float('NaN')
print(type(NaN))
print(NaN)

a= int(NaN)

print(a)

Result:

Traceback (most recent call last):
  File "ValueError_cannot_convert_float_NaN_to_integer.py", line 5, in <module>
    a= int(NaN)
ValueError: cannot convert float NaN to integer

In the variable ‘a’ we are trying to make that an integer number from the NaN variable, which, as we know, is a floating-point value and cannot be converted to any other type than float.

How do we fix this problem?

The easiest way to fix this is to change the ‘NaN’ actual value to an integer as per the below:

NaN =float(1)
print(type(NaN))
print(NaN)

a= int(NaN)

print(a)
print(type(a))

Result:
<class 'float'>
1.0
1
<class 'int'>

So, in summary, if you come across this error:

  1. Check to see if you have any ‘Nan’ values in your data.
  2. If you do replace them with an integer value, or a value that you need for your project, that should solve your problem.

TypeError: List Indices Must Be Integers Or Slices, Not Tuple

Estimated reading time: 3 minutes

When working with Python lists in your data analytics projects, when you trying to read the data, a common problem occurs if you have a list of lists, and it is not properly formatted.

In this instance, Python will not be able to read one or more lists and as a result, will throw this error.

In order to understand how this problem occurs, we need to understand how to create a list of lists.

How to create a lists of lists

Let’s look at a simple list:

a = [1,2,3]
print(a)
print(type(a))

Result:
[1, 2, 3]
<class 'list'>

Let’s create a second list called b:

b = [4,5,6]
print(b)
print(type(b))

Result:
[4, 5, 6]
<class 'list'>

So if we want to join the lists together into one list ( hence a list of lists) then:

a = [1,2,3]
b = [4,5,6]

list_of_lists = []
list_of_lists.append(a)
list_of_lists.append(b)
print(list_of_lists)
print(type(list_of_lists))

Result:
[[1, 2, 3], [4, 5, 6]]
<class 'list'>

So as can be seen the two lists are contained within a master list called “list_of_lists”.

So why does the error list indices must be integers or slices, not tuple occur?

Reason 1 ( Missing commas between lists)

If you manually type them in and forget the comma between the lists this will cause your problem:

a=[[1,2,3][4,5,6]]
print(a)

Result (Error):
Traceback (most recent call last):
  line 10, in <module>
    a=[[1,2,3][4,5,6]]
TypeError: list indices must be integers or slices, not tuple

But if you put a comma between the two lists then it returns no error:

a=[[1,2,3],[4,5,6]]
print(a)

Result (no error):
[[1, 2, 3], [4, 5, 6]]
Process finished with exit code 0

Reason 2 ( Comma in the wrong place)

Sometimes you have a list, and you only want to bring back some elements of the list, but not others:

In the below, we are trying to bring back the first two elements of the list.

a= [1,2,3,4,5]
print(a[0:,2])

Result:
Traceback (most recent call last):
   line 14, in <module>
    print(a[0:,2])
TypeError: list indices must be integers or slices, not tuple

The reason that the same error happens is the additional comma in a[0:,2], causes the error to appear as Python does not know how to process it.

This is easily fixed by removing the additional comma as follows:

a= [1,2,3,4,5]
print(a[0:2])

Result:
[1, 2]
Process finished with exit code 0

So why is there a mention of tuples in the error output?

The final piece of the jigsaw needs to understand why there is a reference to a tuple in the error output?

If we return to a looking at a list of lists and look at their index values:

a=[[1,2,3],[4,5,6]]
z = [index for index, value in enumerate(a)]
print(z)

Result:
[0, 1]
Process finished with exit code 0

As can be seen, the index values are 0,1, which is correct.

As before removing the comma gives the error we are trying to solve for:

a=[[1,2,3][4,5,6]]
z = [index for index, value in enumerate(a)]
print(z)

Result:
Traceback (most recent call last):
  line 16, in <module>
    a=[[1,2,3][4,5,6]]
TypeError: list indices must be integers or slices, not tuple

BUT the reference to the tuple comes from the fact that when you pass two arguments (say an index value) a Tuple is created, but in this instance, as the comma is missing the tuple is not created and the error is called.

This stems from the __getitem__ for a list built-in class cannot deal with tuple arguments that are not integers ( i.e. 0,1 like we have returned here) , so the error is thrown, it is looking for the two index values to pass into a tuple.

How to group your data in Tableau

Estimated reading time: 3 minutes

Have you learnt how to connect to your data in Tableau and now want to understand how to group your Tableau data?

Here we go through a number of steps to help you understand better how to approach this, and what benefits it will bring to your data analytics project.

Why would I group in Tableau?

When you are working with large data sets , sometimes it easier to understand its meaning when the data is stored with simialr data items. Grouping the data has the following benefits:

(A) It allows a quick summary of data, and how large that data set is.

(B) Also groupings can alert to small subsets of data you may have not been aware of.

(C) Another benefit is that groups can be shown that have errors, and fixing them will put them in with the correct data.

(D) You can visually see groups, using Tableau will then you to keep them together.

Grouping by using a field in the data pane

The main way to group is when you are in the data pane, right click on the field you want to group by , then click create group.

For this example we can choose a number of values within channel, that we want to group by, here we pick all the items that have the value web.

You will notice that even before we click apply, it shows there are some data quality issues around the name that they are not consistent. You could use this to run metrics to catch these problems and count the no that occur.

When they are fixed then these should not appear anymore.

The output of this appears like this:

And on the screen , with the grouping now assigned, everything for Channel with web in it, is on one area:

Finally sometimes within your group, you may want an “other” category. The purpose of this is to catch items that dont fall into the group you have assigned, and sometimes they may come in later to the dataset as it expands.

You can achieve this as follows:

Giving in the output:

So in summary grouping can help you to identify a no of similar items to keep together, and also it is very useful to track data quality items as they arise and are fixed.

What does a data analyst do?

Estimated reading time: 4 minutes

Livestream #2 – What does a data analyst do?

You are probably sitting there hearing about big data and databases, data analytics and machine learning and wonder where a data analyst fits in?

Here we will look to break it down step by step.

Sometimes a data analyst can be confused with a business analyst; there are subtle differences:

  • Business Analyst: Their role is to document the user’s requirements in a document that is descriptive of what the user wants.
    • In this case, a document that all parties can agree to is created, and it can be used as part of the project sign off.
  • Data Analyst: On the other hand, a data analyst will take the business requirements and translate them into data deliverables.
    • They use the document to ensure the project has the right data to meet the project objectives in the right place at the right time.

Data Mapping

In different data projects there will be a need to reconcile the data between systems, a data analysis will help here.

In a data mapping exercise, the data analyst will be expected to look at one or more sources and map them to a destination system.

  • This ensures a match between the two datasets.
  • Which results in the ability to reconcile the two systems.
  • Allows the ability to use data in multiple systems, knowing the consistency is in place.
  • Consistency of the data types between the systems.
  • It ensures that data validation errors are kept to a minimum.

Often a Data Analyst will build a traceability matrix, which tracks the data item from creation through to consumption.

Data Quality

In most companies, there will be teams (depending on their size) dedicated to this, and their input will be pivotal to existing and future data use.

It is an important task that could impact internal and external reporting and a company’s ability to make decisions accurately.

Some of the areas that might be looked at include:

(A) Investigate duplicate data – There could be a number of reasons this has to be checked:

  • Data manually entered multiple times.
  • An automated process ran multiple times.
  • A change to an IT system has unknowingly duplicated data.

(B) Finding errors – This could be completed in conjunction with data reporting outlined below.

  • Normally companies will clearly have rules that pick up the data errors that are not expected.
  • A data analyst will analyse why these errors are occurring.

(C) Checking for missing data.

  • Data feeds have failed. A request to reload the data will be required.
  • Data that was not requested as part of the business requirements confirm that this is the case.

(D) Enhancing the data with additional information – Is there additional information that can be added that can enrich the dataset?

(E) Checking data is in the correct format – There are scenarios where this can go wrong, and example is a date field is populated with text.

Data Reporting

In some of the areas above, we touched on the importance of the quality of data.

Ultimately there may be a need to track:

  • Data Quality – Build reports to capture the quality of data based on predefined business measurements.
  • Real-time Reporting – No new customers or customers who have left an organisation.
  • Track Targets – Is the target set by the business been met daily, weekly, monthly?
  • Management Reporting – Build reports that provide input to management packs that provide an overview of how the business performs.

Data Testing

Organisations go through change projects where new data is being introduced or enhanced.

As a result the data analyst will have a number of tasks to complete:

  • Write Test Scripts – Write all scripts for record counts, transformations and table to table comparisons.
  • Datatype Validation – Ensures all new data will be the same as the other data where it is stored.
  • No loss of data – Check all data is imported correctly with no data truncated.
  • Record count – Write an SQL script that would complete a source to the destination reconciliation.
  • Data Transformation – Ensure any transformations are applied correctly.

Supporting data projects

Ad hoc projects are common , and sometimes become a priority for businses as they deal with requirements that result as part of an immediate business need.

Data Analysts will be called upon to support projects where there is a need to ensure the data required is of a standard that meets the project deliverables:

Some common areas where this might occur includes:

  • Extract data where it has been found to have been corrupted.
  • Investigate data changes, to analyse where a data breach may have occurred.
  • An external regulatory body has requested information to back up some reports submitted.
  • A customer has requested all the company’s information on them; usually the case for a GDPR request.

Python Tutorial: Pandas groupby columns ( video 2)

Pandas groupby using column values

In this second video how to groupby using pandas and as part of expanding the data analytics information of this website, we are looking to explain how you can use a groupby selection but only using the column values and not the column names.

Below we import our data into a dataframe, and then group as follows:

  • Aggregate function
  • Using the cut function and assigning values to bins.
  • Assigning labels to the data frame output based on the bin values.

Why would you want to use Pandas groupby and column values?

This video looks to help understand why going by values might be easier than column names:

  • Column names can change from project to project, using by values allows easy implementation of getting the output regardless of the names used.
  • You could apply this to any Python class, and as long as you can inherit will allow the code to run smoothly.
  • Implementing by value allows a clear understanding of the desired output as the values are clearly understood to generate what is required.
  • You need to understand how data within your data set falls within a particular cohort:
    • This use of values in different programs just needs to change, the underlying logic remains the same.
    • Using column names still means that to group them, the logic still needs to be written.

Regular expressions python

Estimated reading time: 3 minutes

Regular expressions explained

Regular expressions are a set of characters usually in a particular sequence that helps find a match/pattern for a specific piece of data in a dataset.

The purpose is to allow a uniform of set characters that can be reused multiple times, based on the requirements of the user, without having to build each time.

The patterns are similar to those that you would find in Perl.

How are regular expressions built?

To start, in regular expressions, there are metacharacters, which are characters that have a special meaning. Their values are as follows:

. ^ $ * + ? { } [ ] \ | ( )

.e = All occurrences which have one “e”, and value before that e. There can be multiple e, eg ..e means check two characters before e.

^ =Check if a string starts with a particular pattern.

*  = Match zero or more occurrences of a pattern, at least one of the characters can be found.

+ = Looks to match exact patterns, one or more times, and if they are not precisely equal, then nothing is returned.

? =Check if a string after ? exists in a pattern and returns it. If a value before the ? is directly beside the value after ? then returns both values.

—> e.g. t?e is the search pattern. “The” is the string. The result will return only the value e, but if the string is “te”, then it will return te, as the letters are directly beside each other.

da{2} = Check to see if a character has a set of other characters following it. E.g. sees if d has two “a” following it.

[abc] = These are the characters you are looking for in the data. Could also use [a-c] and will give you the same result. Change to uppercase to get only those with uppercase.

\ = Denoting a backslash used to escape all metacharacters, so if they need to be found in a string, they can be. Used to escape $ in a string so they can be found as a literal value.

| = This is used when you want an “or” operator in the logic, i.e. check for one or more values from a pattern, either or both can be present.

() = Looks to group pattern searches or a partial match, to see if they are together or not.

 

Special sequences, making it easier again

\a = Matches if the specified characters are at the start of the string been searched.

\b = Matches if the specified characters are at the beginning or the end of the string been searched.

\B = Matches if the specified characters are NOT at the beginning or the end of the string been searched.

\d = Matches any digits 0-9.

\D = Matches any character is not a digit.

\s = Matches where a string contains a whitespace character.

\S = Matches where a string contains a non-whitespace character.

\w = Matches if digits or character or _ found

\W = Matches if non-digits and or characters or _found

\z = matches if the specified characters are at the end of the string.

 

 

For further references and reading materials, please see the below websites, the last one is really useful in testing any regular expressions you would like to build:

See further reading material here: regular expression RE explained

Another complementary page to the link above regular expression REGEX explained

I found this link on the internet, and would thoroughly recommend you bookmark it. It will also allow you to play around with regular expressions and test them before you put into your code, a very recommended resource Testing regular expressions

 

What are the reserved keywords in Python

What are python reserved keywords?

When coding in the Python language there are particular python reserved words that the system uses, which cannot be accessed as a variable or a function as the computer program uses them to perform specific tasks.

When you try to use them, the system will block it and throws out an error. Running the below code in Python

import keyword
keywordlist = keyword.kwlist
print(keywordlist)

Produces the below keyword values
['False', 'None', 'True', 'and', 'as', 'assert', 'async', 'await', 'break', 'class', 'continue', 'def', 'del',
'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal',
'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']

When writing your code, it is important to follow the following guidelines:

(A) Research the keywords first for the language you are writing in.

(B) Ensure that your programming language highlights keywords when used, so you can fix the issue.

(C) Setup your computer program in debug mode to highlight keywords use.

With some programs running into thousands of lines of code, with additional functions and variables, it can become harder to spot the problem, so good rigour in the initial stages of coding will help down the road any issues that you may find that need to fixed.

This code was run in Python version 3.8

Python tutorial: Create an input box in Tkinter

Using an tkinter input box for your data projects

There may be an occasion as you are building out a data science or data analytics project, checks need to be performed on the dataset as follows:

  •  Big data sets and speed requirements in conjunction with
  • The need to reduce the volume of data returned which is impeding performance

and this is where input boxes and Tkinter can help!

In the below video, we are demonstrating an introduction to using an input box and validating the input.

We demonstrate how to validate the data entered into the tkinter input box and return a message, this will ensure the user gets the correct data.

Types of uses for a tkinter input box are varied, here are some thoughts:

  • Use an input box to return a set of data for a particular day.
  • Using them to filter down the results to a particular cohort of data.
  • Conduct a string search to find data quality issues to be fixed.

Python tutorial: How to create a graphical user interface in Tkinter

How would you like to present your data analytics work better?

When starting your data analytics projects, one of the critical considerations is how to present your results quickly and understandably?

Undoubtedly this is true if you are only going to look at the results yourself.

If the work you do is a repeatable process, a more robust longer-term solution needs to be applied, this is where Tkinter can help, which is a python graphical user interface.

When you are importing tkinter, some of the functionality that can be used include:

  • Use them to build calculators.
  • They can show graphs and bar charts.
  • Show graphics on a screen.
  • Validate user input, through building entry widgets.

Where this all fits in with data analytics?

While going through a set of data and getting some meaning to it can be challenging, using the python graphical user interface tutorial below can help build the screens that will allow a repeatable process to display in a meaningful way.

Using the tkinter widget could help achieve the following:

  • Build a screen that shows data analytics errors in a data set, e.g. The number of blank column values in a dataset.
  • Another application is to run your analytics to show the results on a screen that can be printed or exported.
  • Similarly, you could also have a screen where a user selects several parameters that are fed into the data analytics code and produces information for the user to analyse.

There are many more ways that you could do this, but one of the most important things is that data analytics can be built into a windows environment using Tkinter.

These GUI applications are what the user would be used to currently seeing. As a result, this could help to distribute a solution across an enterprise to lots of different users.

Also, another benefit is that they will work on many different operating systems.

The only thing that needs to happen is that the requirements the user needs are defined, and the developer then builds on those, with the data analytics code run in the background of this program with Tkinter and output into a user-friendly screen for review.