Python tutorial: Pandas groupby (Video 1)

In this first video about pandas groupby, part of expanding the data analytics content on this website, we explain how you can use groupby to sort your data into groups of similar records so they can be analysed more easily. In the video below, we import our data into a DataFrame and then group it as follows:

  • Directly naming the column
  • Through get_group
  • Using a loop
  • Utilising a lambda function
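As a rough illustration of the four approaches listed above, here is a minimal pandas sketch. The DataFrame, its column names and values are hypothetical, and the lambda shown is just one way a lambda can be used with groupby (pandas calls a function passed to groupby on each index label); the video's own code may differ.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    "country": ["Ireland", "France", "Ireland", "Spain", "France"],
    "sales": [10, 20, 30, 40, 50],
})

# 1. Directly naming the column: total sales per country
totals = df.groupby("country")["sales"].sum()

# 2. get_group: pull out only the rows that belong to one group
ireland = df.groupby("country").get_group("Ireland")

# 3. Loop: iterating over a GroupBy yields (group key, sub-DataFrame) pairs
rows_per_country = {}
for name, group in df.groupby("country"):
    rows_per_country[name] = len(group)

# 4. Lambda: a function passed to groupby is called on each index label
even_odd = df.groupby(lambda idx: "even" if idx % 2 == 0 else "odd")["sales"].sum()

print(totals)
print(ireland)
print(rows_per_country)  # {'France': 2, 'Ireland': 2, 'Spain': 1}
print(even_odd)
```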


Regular expressions python

Regular expressions explained

A regular expression is a sequence of characters that defines a search pattern, used to find matches for a specific piece of data in a dataset.

The purpose is to define a uniform set of characters (a pattern) that can be reused as many times as the user requires, without having to rebuild it each time.

The patterns are similar to those that you would find in Perl.

How are regular expressions built?

To start, regular expressions use metacharacters, which are characters that have a special meaning. They are as follows:

. ^ $ * + ? { } [ ] \ | ( )

. = Matches any single character except a newline. E.g. .e matches any character followed by “e”, and ..e matches any two characters followed by “e”.

^ = Checks if a string starts with a particular pattern.

* = Matches zero or more occurrences of the preceding character or group, so a match is still found if that character is absent. E.g. ma*n matches “mn”, “man” and “maan”.

+ = Matches one or more occurrences of the preceding character or group; if it does not appear at least once, nothing is returned. E.g. ma+n matches “man” and “maan” but not “mn”.

? = Matches zero or one occurrence of the preceding character or group, making it optional.

E.g. t?e is the search pattern. If the string is “The”, the result is only “e” (the uppercase “T” does not match the lowercase “t”), but if the string is “te”, the result is “te”, as the letters are directly beside each other.

da{2} = Braces specify exactly how many times the preceding character must repeat. E.g. da{2} checks whether “d” is followed by exactly two “a” characters, i.e. “daa”. A range such as a{2,4} allows two to four repetitions.

[abc] = A character set: matches any one of the characters inside the brackets. [a-c] is a range that gives the same result. Use uppercase, e.g. [A-C], to match only uppercase letters.

\ = The backslash escapes a metacharacter so it is matched as a literal value. E.g. \$ finds a literal dollar sign in a string. It also introduces the special sequences described below.

| = The “or” operator: matches either the pattern on its left or the pattern on its right. E.g. cat|dog matches “cat” or “dog”.

() = Groups part of a pattern so it is treated as a single unit (e.g. repeated with + or combined with |), and captures the text it matched. E.g. (ab)+ matches “ab”, “abab” and so on.
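To make the metacharacters above concrete, here is a small sketch using Python's built-in re module. The sample strings are made up for illustration; re.findall returns every non-overlapping match, and re.search returns the first match (or None).

```python
import re

dot_matches = re.findall(r".e", "The cake")          # any character, then "e"
starts_with_the = re.search(r"^The", "The cake") is not None
star = re.findall(r"ma*n", "mn man maan")            # zero or more "a"
plus = re.findall(r"ma+n", "mn man maan")            # one or more "a"
optional_upper = re.findall(r"t?e", "The")           # "T" is not "t", so only "e"
optional_lower = re.findall(r"t?e", "te")            # "t" present, so "te"
braces = re.findall(r"da{2}", "da daa")              # "d" then exactly two "a"
lower_set = re.findall(r"[a-c]", "ABC abc")          # one lowercase a, b or c
upper_set = re.findall(r"[A-C]", "ABC abc")          # uppercase only
dollars = re.findall(r"\$", "Cost $5, tax $1")       # escaped "$" is a literal
either = re.findall(r"cat|dog", "cat, dog, bird")    # "or" operator
grouped = re.search(r"(ab)+", "xxababyy").group()    # the group repeated as a unit

print(dot_matches)  # ['he', 'ke']
print(star, plus)   # ['mn', 'man', 'maan'] ['man', 'maan']
print(grouped)      # abab
```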


Special sequences, making it easier again

\A = Matches if the specified characters are at the start of the string being searched.

\b = Matches if the specified characters are at the beginning or the end of a word (a word boundary).

\B = Matches if the specified characters are NOT at the beginning or the end of a word.

\d = Matches any digit, e.g. 0-9.

\D = Matches any character that is not a digit.

\s = Matches any whitespace character (a space, tab, newline and so on).

\S = Matches any non-whitespace character.

\w = Matches any word character: a letter, a digit or an underscore (_).

\W = Matches any character that is not a letter, a digit or an underscore.

\Z = Matches if the specified characters are at the end of the string.
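Here is a short sketch of the special sequences in action, again with Python's re module and made-up sample strings. Note the raw strings (r"..."), which stop Python from treating the backslashes as its own escape characters.

```python
import re

text = "Order 66 shipped on 12 May"

digits = re.findall(r"\d+", text)                       # runs of digits
at_start = re.search(r"\AOrder", text) is not None       # start of string
at_end = re.search(r"May\Z", text) is not None           # end of string
whole_word = re.findall(r"\bcat\b", "cat concat cats")  # "cat" as a whole word only
inside_word = re.search(r"\Bcat", "cat concat cats").start()  # "cat" not at a word start
digits_only = re.sub(r"\D", "", "087-123 4567")         # strip every non-digit
whitespace = re.findall(r"\s", "a b\tc")                # space and tab
non_space = re.findall(r"\S+", "a b  c")                # runs of non-whitespace
words = re.findall(r"\w+", "tel_no: 087-123")           # letters, digits, underscore
non_word = re.findall(r"\W", "a-b c")                   # everything else

print(digits)       # ['66', '12']
print(digits_only)  # 0871234567
print(words)        # ['tel_no', '087', '123']
```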


For further reading, please see the websites below; the last one is especially useful for testing any regular expressions you would like to build:

See further reading material here: regular expression RE explained

A complementary page to the link above: regular expression REGEX explained

I found this link online and would thoroughly recommend you bookmark it. It lets you experiment with regular expressions and test them before you put them into your code, a highly recommended resource: Testing regular expressions


What are the reserved keywords in Python

What are python reserved keywords?

When coding in Python, there are reserved words that the language itself uses to perform specific tasks, so they cannot be used as the name of a variable or a function.

If you try to use one as a name, Python blocks it and raises a SyntaxError. Running the code below in Python

import keyword
keywordlist = keyword.kwlist
print(keywordlist)

produces the following list of keywords:
['False', 'None', 'True', 'and', 'as', 'assert', 'async', 'await', 'break', 'class', 'continue', 'def', 'del',
'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal',
'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']
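A quick way to check a name before using it is keyword.iskeyword() from the same standard-library module, combined with str.isidentifier(). The sketch below is one possible helper (is_valid_name is a hypothetical name, not part of the keyword module), plus a demonstration that assigning to a keyword fails at compile time.

```python
import keyword


def is_valid_name(name: str) -> bool:
    """Hypothetical helper: True if name can be used for a variable or function."""
    # Must be a syntactically valid identifier AND not a reserved keyword
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_valid_name("total_sales"))  # True
print(is_valid_name("class"))        # False

# Trying to assign to a keyword is rejected before the code even runs
try:
    compile("class = 5", "<example>", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)
```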

When writing your code, it is important to follow these guidelines:

(A) Research the keywords first for the language you are writing in.

(B) Use an editor that highlights keywords when they are used, so you can spot and fix the issue.

(C) Set up your development environment in debug mode to help highlight keyword misuse.

With some programs running into thousands of lines of code, with many functions and variables, it can become harder to spot the problem, so good rigour in the initial stages of coding will reduce the issues you need to fix further down the road.

This code was run in Python version 3.8

Python tutorial: Create an input box in Tkinter

Using a tkinter input box for your data projects

As you build out a data science or data analytics project, there may be occasions when checks need to be performed on the dataset because of:

  • Big datasets and speed requirements, in conjunction with
  • The need to reduce the volume of data returned, which is impeding performance

This is where input boxes and Tkinter can help!

In the video below, we give an introduction to using an input box and validating the input.

We demonstrate how to validate the data entered into the tkinter input box and return a message, which helps ensure the user gets the correct data.
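Below is a minimal sketch of input-box validation, assuming the box is used to enter a date for filtering a dataset. The YYYY-MM-DD format, the widget layout and the function names are hypothetical choices for illustration, not the video's code. The tkinter import sits inside build_window() so that validate_date() can be reused or tested without a display.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # hypothetical format expected in the input box


def validate_date(text: str) -> bool:
    """Return True if text is a real calendar date in DATE_FORMAT."""
    try:
        datetime.strptime(text.strip(), DATE_FORMAT)
        return True
    except ValueError:
        return False


def build_window():
    # Imported here so validate_date() can be used without a GUI
    import tkinter as tk
    from tkinter import messagebox

    root = tk.Tk()
    root.title("Input box example")
    tk.Label(root, text="Enter a date (YYYY-MM-DD):").pack(padx=10, pady=5)
    entry = tk.Entry(root, width=20)
    entry.pack(padx=10, pady=5)

    def on_submit():
        value = entry.get()
        if validate_date(value):
            messagebox.showinfo("Valid input", f"Filtering data for {value.strip()}")
        else:
            messagebox.showerror("Invalid input", "Please enter a date as YYYY-MM-DD")

    tk.Button(root, text="Submit", command=on_submit).pack(pady=10)
    return root


if __name__ == "__main__":
    build_window().mainloop()
```

Checking the value only when the button is pressed keeps the example simple; tkinter's Entry widget also supports validating as the user types, which may suit other projects better.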

A tkinter input box has many uses; here are some ideas:

  • Use an input box to return a set of data for a particular day.
  • Use it to filter down the results to a particular cohort of data.
  • Conduct a string search to find data quality issues to be fixed.

Python tutorial: How to create a graphical user interface in tkinter

How would you like to present your data analytics work better?

When starting your data analytics projects, one of the critical considerations is how to present your results quickly and clearly.

This is true even if you are only going to look at the results yourself.

If the work you do is a repeatable process, a more robust, longer-term solution is needed, and this is where Tkinter, a Python graphical user interface package, can help.

There are many applications for using Tkinter, such as:

  • Use them to build calculators.
  • They can show graphs and bar charts.
  • Show graphics on a screen.
  • Validate user input.

Where does this all fit in with data analytics?

Going through a set of data and getting meaning from it can be challenging; the Python graphical user interface tutorial below shows how to build the screens that allow a repeatable process to display its results in a meaningful way.

Ultimately, you could do the following:

  • Build a screen that shows data analytics errors in a dataset, e.g. the number of blank column values in a dataset.
  • Another application is to run your analytics to show the results on a screen that can be printed or exported.
  • Similarly, you could also have a screen where a user selects several parameters that are fed into the data analytics code and produces information for the user to analyse.

There are many more ways that you could do this, but one of the most important points is that Tkinter lets you build data analytics into a windowed environment that users are already used to seeing. As a result, it could help to distribute a solution across an enterprise to lots of different users.

All that needs to happen is for the user's requirements to be defined; the developer then builds on those, running the data analytics code in the background of the Tkinter program and outputting the results to a user-friendly screen for review.
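As a sketch of the first idea in the list above (a screen showing the number of blank column values), the code below keeps the analytics in a plain function and passes the results to a small Tkinter window. The sample records, column names and function names are hypothetical; in a real project the rows would come from your own data source.

```python
# Hypothetical sample records; real data would come from a file or database
SAMPLE_ROWS = [
    {"name": "Aoife", "phone": "0871234567", "dob": "1990-01-01"},
    {"name": "Sean", "phone": "", "dob": None},
    {"name": "", "phone": "0861112223"},  # "dob" missing entirely
]
COLUMNS = ["name", "phone", "dob"]


def count_blank_values(rows, columns):
    """Count empty, None or missing values per column in a list of dict records."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or str(value).strip() == "":
                counts[col] += 1
    return counts


def show_report(counts):
    # Imported here so the analytics function can run without a display
    import tkinter as tk

    root = tk.Tk()
    root.title("Blank values per column")
    for row_no, (col, n) in enumerate(counts.items()):
        tk.Label(root, text=col, anchor="w").grid(row=row_no, column=0, sticky="w", padx=10, pady=2)
        tk.Label(root, text=str(n)).grid(row=row_no, column=1, padx=10, pady=2)
    root.mainloop()


if __name__ == "__main__":
    show_report(count_blank_values(SAMPLE_ROWS, COLUMNS))
```

Keeping the calculation separate from the screen means the same analytics code can later feed a printed or exported report instead.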


How to create a combobox in tkinter

Using a Combobox in Tkinter

Here we have delivered a complementary video to How to create a graphical user interface in Tkinter, demonstrating how a Combobox can be used to select values and then validate the entry chosen.

Comboboxes have been in use in computer programming for some time. They are a useful way to select from a list of choices and can help in data analytics in many ways, as the following examples show:

  • Select a date to filter a data set down to values that are in the dataset at that point.
  • Using matplotlib to plot data points in charts, you could have dynamic values that change the diagram based on values chosen from the Combobox.
  • In data analytics reports that the user accesses, the Combobox could be used to change the data shown dynamically to allow comparisons.
  • When looking to fix data quality issues, use the Combobox to select values for a date that needs to be fixed, apply the fixes on screen and then save back to the database.

Developing a Tkinter GUI and the possibilities it brings

In this video, we use ttk (themed Tk), which separates the code implementing a widget's behaviour from the code implementing its appearance; you can see plenty more on it here: ttk information. You will find this a handy piece of functionality, as styling an object will not interfere with how it works.

We also have a function that helps with the validation as follows:

from tkinter import messagebox

def checkifireland():
    # combolist is the ttk.Combobox created elsewhere in the program
    x = combolist.get()  # assigns the value inside the combobox to x so it can be processed
    if x == "Ireland":
        messagebox.showinfo("Correct answer", "You will love it in Ireland")
    else:
        messagebox.showinfo("Incorrect answer", "You should visit Ireland first!")

This is especially handy, as it helps to ensure that the value passed from the Combobox to the function is correct, as the video below shows.
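For readers who want a self-contained starting point, here is one possible way to wire checkifireland() to a ttk.Combobox. The country list, window layout and the validation_message() helper are hypothetical additions for illustration; the helper returns the same titles and messages as the function above, which keeps the check testable without opening a window.

```python
COUNTRIES = ["France", "Ireland", "Spain"]  # hypothetical choices for the Combobox


def validation_message(choice):
    """Return the (title, message) pair shown for the selected country."""
    if choice == "Ireland":
        return ("Correct answer", "You will love it in Ireland")
    return ("Incorrect answer", "You should visit Ireland first!")


def build_window():
    # Imported here so validation_message() works without a display
    import tkinter as tk
    from tkinter import ttk, messagebox

    root = tk.Tk()
    root.title("Combobox example")
    # state="readonly" stops users typing a value that is not in the list
    combolist = ttk.Combobox(root, values=COUNTRIES, state="readonly")
    combolist.pack(padx=10, pady=10)

    def checkifireland():
        title, message = validation_message(combolist.get())
        messagebox.showinfo(title, message)

    ttk.Button(root, text="Check", command=checkifireland).pack(pady=10)
    return root


if __name__ == "__main__":
    build_window().mainloop()
```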

The next steps

There are many informative Python – working with Excel videos on our YouTube channel.

We are looking to bring them into a graphical user interface tutorial.

If you subscribe to the channel, you will get to see those videos as they are uploaded.

How to remove unwanted characters from your data

Using R programming functions to cleanse data

In our recent post, Python Tutorial: How do I remove unwanted characters, we walked through the concepts behind data cleansing.

We demonstrated several different approaches to data cleansing, and the use of regular expressions was also shown.

Here we look at that approach using RStudio and the following functions:

How to approach data cleansing

When removing unwanted characters, you want to ensure that you have a defined list of the characters that should not appear and will cause you errors. It is also essential to understand the types of data errors that can occur, as follows:

  • Data entry errors
  • Data columns have the incorrect format, e.g. telephone numbers which have non-numerical characters in them
  • Missing data that is required, e.g. null values
  • Data that does not make sense, e.g. a date of birth that is beyond the range of what you would typically expect to see
  • Duplicate values for the same piece of data. The problem here is that this can inflate the number of data errors and not give a true count of the actual errors.

In the video below, we use R code with the above functions, but also use an if statement to check whether an unwanted character is in the dataset before removing it and returning the cleansed data.
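The video works in R; as a rough Python equivalent of the same "check first, then remove" idea, here is a short sketch using the re module. The set of unwanted characters and the cleanse() name are hypothetical choices for illustration, not the R functions used in the video. re.escape() makes sure characters such as $ and ^ are treated as literals inside the character set.

```python
import re

UNWANTED = "!@#$%^&*"  # hypothetical list of characters that should not appear


def cleanse(value: str, unwanted: str = UNWANTED) -> str:
    """Remove unwanted characters, but only if at least one is present."""
    pattern = "[" + re.escape(unwanted) + "]"
    # The if statement mirrors the video's approach: check before removing
    if re.search(pattern, value):
        return re.sub(pattern, "", value)
    return value


phone_numbers = ["087-123#4567!", "0861112223", "$01 234 5678"]
print([cleanse(p) for p in phone_numbers])
# ['087-1234567', '0861112223', '01 234 5678']
```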

Some of the strategies to help counteract data errors could include:

  • Eliminate manual inputs
  • Controls at point of entry for data, e.g. for dates only allow date formats in the field.
  • Reduce duplication of data across multiple systems; this reduces the number of places where data differences can occur.
  • If integrating different systems holding the same data into one network, perform a data cleanse beforehand; this reduces the work needed afterwards to clean up the problems the integration brings.


tkinter python tutorial

Let’s make the introductions 🙂
Tkinter is a package that allows a programmer to build a GUI (graphical user interface), which can then be opened on a computer screen by a user. There are many different types of GUI apps; examples include a calculator or a text editor that opens when you click it.

Tkinter is probably the most commonly used GUI package in Python, due to its simplicity and because it is part of the standard library, but PySimpleGUI, PyQt and PySide are alternatives. Research these before using them to make sure they are suitable for your needs.

Why use Tkinter?

  • Relatively simple and easy to learn, upskilling is quick.
  • A great introduction to the concepts and ideas for building GUI apps, you will get a good grounding in the techniques and approaches needed.
  • Very well documented, so a programmer should be able to find the answer to anything specific they need to understand.


Now that we are introduced, let’s see how to use it:

Install Python as usual, and make sure that tkinter is working and you have the correct version. Note that import tkinter is for Python 3.x; in Python 2.x, use import Tkinter instead.

When saving your Python script, DO NOT call it tkinter.py as I did, or the import statement will not work, because Python imports your own script instead of the tkinter package. Call it something like tkinter_test.py; see the red arrow below.


At the start of the video below, the code looks like this:

In the following video, we add to this code:

  • Button – which will open our YouTube channel
  • An image
  • A clickable link – which will bring you to our Home Page
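As a rough sketch of how those three additions could be built (not the code from the video), the example below uses the standard-library webbrowser module to open links. The URLs and the logo.png file name are hypothetical placeholders to replace with your own; the image is only added if that file exists. make_link_handler() accepts any extra event argument, so the same callback works for a Button command and for a Label bound to a mouse click.

```python
import os
import webbrowser

# Hypothetical placeholders: substitute your own channel and home page URLs
YOUTUBE_URL = "https://www.youtube.com/"
HOME_URL = "https://www.example.com/"


def make_link_handler(url, opener=webbrowser.open):
    """Return a callback that opens url; the opener is injectable for testing."""
    def handler(*_event):  # Button commands pass no args, bound events pass one
        opener(url)
    return handler


def build_window(image_path="logo.png"):
    import tkinter as tk  # imported here so the handlers work without a display

    root = tk.Tk()
    root.title("Tkinter tutorial")

    # Button that opens the YouTube channel
    tk.Button(root, text="Open our YouTube channel",
              command=make_link_handler(YOUTUBE_URL)).pack(pady=5)

    # Image, only if the (hypothetical) file exists
    if os.path.exists(image_path):
        photo = tk.PhotoImage(file=image_path)
        image_label = tk.Label(root, image=photo)
        image_label.image = photo  # keep a reference so the image is not garbage-collected
        image_label.pack(pady=5)

    # Clickable text link to the home page
    link = tk.Label(root, text="Visit our Home Page", fg="blue", cursor="hand2")
    link.pack(pady=5)
    link.bind("<Button-1>", make_link_handler(HOME_URL))
    return root


if __name__ == "__main__":
    build_window().mainloop()
```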

A screenshot of the final output is as follows:

See the Python documentation here: Tkinter on python.org

Recursion

What is recursion?
To start, recursion is a way of solving a problem by breaking it down into smaller chunks of the same problem, which contribute to the final answer.

A function that calls itself is commonly known as a recursive function.

Breaking down the problem at hand can then lead to a quicker understanding of it and a quicker solution.

A simple example, using the factorial, might help here:

4! = 4*3*2*1 = 24

From studying maths at school, the simple example above breaks down as follows:

  • The result 24 is the product of all the values from 4 down to one.
  • As each value that makes it up is known, it can then be seen how the result is made up.
  • The above can be viewed as recursive because 4! = 4 * 3!, 3! = 3 * 2! and so on: it keeps repeating on itself until it reaches one, the base.
  • The recursion knows there are three steps before it reaches one; this is where the breakdown into smaller chunks comes in.

So it could be broken down as follows:

Step 1
4! = 4 * 3! ---> 3! is not yet known, so the function calls itself with 3.
Step 2
3! = 3 * 2! ---> 2! is not yet known, so the function calls itself with 2.
Step 3
2! = 2 * 1! ---> 1! is the base value 1, so the recursion stops and the results are multiplied back up: 2 * 1 = 2, 3 * 2 = 6, 4 * 6 = 24, which is output.

Steps 1 to 3 are the steps within the function it takes until it reaches the base value, where the function stops and outputs the final value calculated.
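The same steps can be sketched as a recursive Python function. This is a minimal illustration; the check for negative numbers is an addition so the function cannot recurse forever on invalid input.

```python
def factorial(n: int) -> int:
    """Return n! computed recursively."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    if n <= 1:  # base case: 1! (and 0!) equal 1, so the recursion stops here
        return 1
    return n * factorial(n - 1)  # recursive case: n! = n * (n - 1)!


print(factorial(4))  # 24
```

Calling factorial(4) makes the calls factorial(3), factorial(2) and factorial(1), matching Steps 1 to 3 above, and the products are multiplied together as each call returns.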

Attributes of recursion

  • A function that calls itself, repeating the steps above until it reaches the base of 1, so that it doesn’t loop infinitely.
  • It must be possible to break the problem down into smaller parts.
  • As the problem gets broken down, it must eventually become simple enough to solve without further recursive calls.
  • Once a smaller part has been calculated, it becomes part of the answer to the overall problem.

Things to watch out for when using recursion

  • Ensure you always have a base value; without it, the function keeps calling itself indefinitely (in Python, this ends in a RecursionError once the recursion limit is exceeded).
  • If the problem cannot be broken up into smaller steps, recursion cannot calculate the final answer.
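To see why the base value matters, here is a small sketch contrasting a function with no base case against one that has it. Both function names are hypothetical. In CPython, the runaway version does not loop forever; it stops with a RecursionError once the interpreter's recursion limit (see sys.getrecursionlimit()) is exceeded.

```python
import sys


def countdown_without_base(n):
    # No base case: this keeps calling itself with smaller n until Python stops it
    return countdown_without_base(n - 1)


def countdown(n):
    if n == 0:  # base case stops the recursion
        return "done"
    return countdown(n - 1)


try:
    countdown_without_base(5)
except RecursionError as err:
    print("RecursionError:", err)

print(countdown(5))              # done
print(sys.getrecursionlimit())   # the current limit on recursion depth
```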

Why use recursion?

  • It helps to break up a complicated task into smaller bits.
  • It helps a programmer see which steps have already been coded for, so they can be solved with other functions already written.
  • Where there are multiple recursive functions, it shows whether their steps are similar, so those steps only need to be programmed once.

A good source for further knowledge can be found here: Wikipedia – Recursion

Data analytics Ireland

R Tutorial: How to pass data between functions

Using functions in R, quite simple!

When I started looking at functions in R, having already tested them in Python and JavaScript, it quickly became apparent how similar programming languages are.

Apart from the syntax you use in each, the programming is quite similar.

The purpose of this video is to:

  • Start on using functions from the ground up.
  • Don’t over-complicate the example; keep it easy enough to follow.

How to write the code to pass data between functions

As this is a short video, the code that went into making it is pretty straightforward:

# create two functions; an R function returns the value of its last expression
function.a <- function(){
  newvarb <- 2
}

function.b <- function(){
  newvarb <- function.a()*2 # this takes in the value of function a and multiplies it by two
}
print(function.b()) # prints the value of function b: [1] 4

The video below takes you through each line and shows the output that we are looking to achieve.

How can we use this in our projects

No matter what programming language you use or choose to learn, the concept of functions will appear in some shape or form. A function runs a repeatable process and returns a value, and it can be called from anywhere in a program, so it lets the programmer cut coding time and avoid rewriting code that only needs to be written once.

This video has an equivalent in Python, and you can see it here: Python Functions – passing data between them
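For comparison, here is a minimal Python sketch of the same R example (not necessarily the code used in that video). One difference worth noting: Python does not return the last expression automatically, so each function needs an explicit return. The double() function is a hypothetical extra showing the other common pattern, passing a value in as an argument.

```python
def function_a():
    newvarb = 2
    return newvarb  # without this return, the function would return None


def function_b():
    newvarb = function_a() * 2  # uses the value returned by function_a: 2 * 2
    return newvarb


def double(value):
    # Alternative: pass the data in as an argument instead of calling inside
    return value * 2


print(function_b())          # 4
print(double(function_a()))  # 4
```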
