Python Tutorial: Pandas groupby columns ( video 2)

Pandas groupby using column values

In this second video how to groupby using pandas and as part of expanding the data analytics information of this website, we are looking to explain how you can use a groupby selection but only using the column values and not the column names.

Below we import our data into a dataframe, and then group as follows:

  • Aggregate function
  • Using the cut function and assigning values to bins.
  • Assigning labels to the data frame output based on the bin values.

 

 

Why would you want to use Pandas groupby and column values?

This video looks to help understand the why going by values might be easier than column names:

  • Column names can change from project to project, using by values allows easy implementation of getting the output regardless of the names used.
  • You could apply this to any Python class, and as long as you can inherit will allow the code to run smoothly.
  • Implementing by value allows a clear understanding of the desired output as the values are clearly understood to generate what is required.
  • You need to understand how data within your data set falls within a particular cohort:
    • This use of values in different programs just needs to change, the underlying logic remains the same.
    • Using column names still means that to group them, the logic still needs to be written.

Python tutorial: Pandas groupby ( Video 1)

In this first video about pandas groupby and as part of expanding the data analytics information of this website, we are looking to explain how you can use a groupby selection to sort your data into similar datasets better so they can be better analysed. In the video below, we import our data into a dataframe, and then group as follows:

  • Directly naming the column
  • Through get_group
  • Using a loop
  • Utilising a lambda function

 

 

Python tutorial: Create an input box in Tkinter

Using an tkinter input box for your data projects

There may be an occasion as you are building out a data science or data analytics project, checks need to be performed on the dataset as follows:

  •  Big data sets and speed requirements in conjunction with
  • The need to reduce the volume of data returned which is impeding performance

and this is where input boxes and Tkinter can help!

In the below video, we are demonstrating an introduction to using an input box and validating the input.

We demonstrate how to validate the data entered into the tkinter input box and return a message, this will ensure the user gets the correct data.

Types of uses for a tkinter input box are varied, here are some thoughts:

  • Use an input box to return a set of data for a particular day.
  • Using them to filter down the results to a particular cohort of data.
  • Conduct a string search to find data quality issues to be fixed.

Python tutorial: How to create a graphical user interface in Tkinter

How would you like to present your data analytics work better?

When starting your data analytics projects, one of the critical considerations is how to present your results quickly and understandably?

Undoubtedly this is true if you are only going to look at the results yourself.

If the work you do is a repeatable process, a more robust longer-term solution needs to be applied, this is where Tkinter can help, which is a python graphical user interface.

There are many applications for using Tkinter, such as:

  • Use them to build calculators.
  • They can show graphs and bar charts.
  • Show graphics on a screen.
  • Validate user input.

Where this all fits in with data analytics?

While going through a set of data and getting some meaning to it can be challenging, using the python graphical user interface tutorial below can help build the screens that will allow a repeatable process to display in a meaningful way.

Ultimately, you could do the following:

  • Build a screen that shows data analytics errors in a data set, e.g. The number of blank column values in a dataset.
  • Another application is to run your analytics to show the results on a screen that can be printed or exported.
  • Similarly, you could also have a screen where a user selects several parameters that are fed into the data analytics code and produces information for the user to analyse.

There are many more ways that you could do this, but one of the most important things is that data analytics can be built into a windows environment using Tkinter that the user would be used to seeing. As a result, this could help to distribute a solution across an enterprise to lots of different users.

The only thing that needs to happen is that the requirements the user needs are defined, and the developer then builds on those, with the data analytics code run in the background of this program with Tkinter and output into a user-friendly screen for review.

How to create a combobox in tkinter

Estimated reading time: 2 minutes

Here we have delivered a complimentary video on How to create a graphical user interface in Tkinter .

It will demonstrate how a Combobox can be used to select values and then validate the entry chosen.

Using a Combobox in the computer programming world has been around for some time.

It is a useful way to select from a choice and could in many ways in data analytics help as the following examples show:

  • Select a date to filter a data set down to values that are in the dataset.
  • Using matplotlib to plot data points in charts, you could have dynamic values that change the diagram based on values chosen from the Combobox.
  • Utilizing data analytics reports that the user accesses, the Combobox could be used to change the data shown dynamically to allow comparisons.
  • When looking to fix data quality issues, use the Combobox to select values for a date that needs to be fixed, apply the fixes on screen, and then save back to the database.

Developing a Tkinter GUI and the possibilities it brings

In this video, we use ttk, written to help split the behavior of code from the code implementing its appearance.

You can see plenty more on it here ttk information. This is a handy piece of functionality as styling an object can interfere with how it works.

We also have a function that helps with the validation. In the below, it accomplishes the following:

  • Allows the combobox value selected to be retrieved.
  • Validates the entry chosen in the combobox using an if statement.
def checkifireland ():
    x = combolist.get() # asssigns the value inside the combobox to x so it can be processed
    if x == "Ireland":
        messagebox.showinfo("Correct answer", "You will love it in Ireland")
    else:
        messagebox.showinfo("Incorrect answer", "You should visit Ireland first!")

The effectiveness is especially handy as it helps to ensure that the code returned from the Combobox to the function is correct

The below video will take through this step by step and explain the concepts discussed above.

The next steps

There are many informative Python – working with excel videos that are on our YouTube channel.

We are looking to bring them in and show them on a graphical user interface tutorial.

To see how to load a set of values into a combobox from an SQLite database table, have a look at how to load values into a combobox

How to data cleanse a database table

In Data Analytics, that is a very relevant question, and something I look to implement in most projects, sometimes it is too easy to click the shortcut icon to your spreadsheet application!

Here we are looking to bring a little automation into these videos. Building on How to import data from files and Removing characters from an imported CSV file this video connects to a Microsoft Azure cloud database table, brings in the data with errors on it, fixes the errors and displays the correct output on the screen.

What can this do for organisations?

There are several benefits to automating this step:

  • Less manual intervention if there is a need to fix data issues.
  • Better productivity.
  • Better data flows with no errors and quicker reporting.

 

Moving away from files

The process of moving away from files and into automation has several steps:

  • Be clear on your data needs.
  • Understand what you are trying to achieve.
  • Build a process that is repeatable but can be updated easily.
  • Ensure that you build in data quality checks, helps deliver the better output to the users.

Thanks for stopping by!

Data Analytics Ireland

 

How to save data frame changes to a file

Estimated reading time: 2 minutes

Changing a file is the natural step; tracking those changes are just as important.
Change is part and parcel of life, but in the technology world with the complexity  and interdependency of systems, not effectively been able to track what goes on leads to:

  • Countless hours are trying to figure out where it went wrong.
  • You do not understand what needs fixing.
  • Systems/processes that ultimately work seamlessly, slow down unnecessarily.

In data analysis, as the volumes can be quite large, the human cannot feasibly review a set of data and find out where the underlying problem is. Well, they can, but it would take so long, nothing else would get done. This article by Forbes – predictions-about-data-in-2020-and-the-coming-decade predicts the consumption of data will just be getting bigger.

 

Let the script work, see the log of changes in the output.

How do you remove characters from an imported CSV file, looked at some data cleansing techniques, but there was no way of knowing what was changed other than a visual inspection. Here we introduce the data set into a data frame, change some of the values and show the output on the screen. But more importantly, as we progress through these steps, we are saving the changes as we go along.

The reality is that in large corporate settings, visual inspections would take up too much time and resources. An IT solution to help with giving the vital information required will reduce the data errors happening and allow for a more unobstructed view of how the companies data has changed over time.

 

Where does the trail lead to next?

  • Changes made to data needs a clear way of being able to be tracked.
  • How you captured those changes on your systems, needs to be addressed.
  • Implementing better systems will help you have confidence in your data changes.

Showing audit trail of changes

Python Tutorial: How to validate data using tuples

Do you want to validate with Tuples, that is easy, making changes not easy.

In our recent video Python – how do I remove unwanted characters lists were used as a lookup to validate data that we need to be check for invalid data items. The most apparent difference between the two is that tuples are immutable, hence changing their values is not possible, making using them in real-time code a bit hazardous.

So why would you use Tuples?

That is a good question and sometimes not too obvious when you try to put examples down on paper, but here are some cases:

  • You want a set of values that will never change, no matter what.
  •  Use as a lookup that the program can check against, these could be called anywhere in your code.
  •  Make sure that you only process what is in the tuple; any additional data can be reported as erroneous, a form of error control.

Getting around the change limitations (well kind of)

This video looks at a simple few steps to take in a set of data, validate the id column aginst a tuple set of values and then show the differences on a separate output.

The code is then rerun after we add the original tuple to the error values found, to give a new tuple. As a result, the new output will show up with no errors.

To sum it all up

In a nutshell, Tuples are limited in what they can do, probably the best thing for them is:

  • Use your code as a reference for re-occurring values that need to be validated.
  •  Don’t use in your code to have updated tuples, use lists instead as you can update them in real-time.

Python Tutorial: How to create charts in Excel

Estimated reading time: 2 minutes

Taking it further with a python excel chart.

We have created some video content here using Python with Excel that illustrates the different ways you can leverage Python to

  • Data cleanse
  • Find unwanted characters
  • And see if the file is empty before import!

What this video is giving as output

Here we are looking to introduce charts in Excel, and how how to use Python to easily work with your data and export to an excel sheet.

Below is the final output of our two charts, this is for illustrative purposes, and is taken from the Irish Governments website as at 1st May 2020, importing the cell ranges associated with them.

A barchart of covid cases bullt in Python a line chart of covid cases built in Python

How we went about it

In the below video on creating a python excel chart, we have approached this as follows:

 

  • Created four separate data frames, they are the four regions that will feed into the creation of the graphs.
  • Separate to this, we merged the four data frames into one, to use with the bar chart.

And to finish off

If you like what this video has explained, please click to see our YouTube channel for more informative videos.

Data Analytics Ireland

 

how to validate cell values in excel

Estimated reading time: 2 minutes

Validating cells in Excel quickly – how to do it easily!
Are you working with large spreadsheets and looking to quickly at data validation exercise to save you time?

The aim would be to run your code and test it against some predefined rules you or your data analyst would have written to make sure it brings back the expected checks.

If you look at the below, this is the final output of this video, highlighting two cells that are over budget based on the companies predefined budget.

data validation example

The structure of this code can be broken down into the following steps:

  • Read in the excel file, see a previous example here How to import data into excel
  •  Run the first function,  checks if the spreadsheet cell value is over or under budget.
  •  Run the second function that takes the value from the first function and applies the colour red to the cell if it is over budget.

 

Finally

You can expand this code to incorporate more functionality, such as:

  • Change the colour of the cells, to have multiple colours returned.
  •  Update the two functions to include more business rules.
  •  You could check if the file is empty before processing as shown here How to check if a file is empty

Please subscribe to our YouTube channel, the button is the right-hand side of the page if you would like to see more like these.

Data Analytics Ireland