Tkinter is Python's standard library for building graphical user interfaces (GUIs), which let users manage data in a more user-friendly way.
We tested this code on 100,000 records stored on the Microsoft OneDrive network, and its speed was quite good: over five tests, every run took under 100 seconds from start to finish.
In this Python Tkinter GUI tutorial, you will be shown how to find data errors, clean them and then export the final result to Excel.
We will take you through the following:
Creation of the Tkinter interface.
Methods/functions to find errors.
Methods/functions to clean the data.
Exporting the clean data to an Excel file.
The video walks through the creation of a Tkinter window, using a canvas and a frame to display the data frame.
Then it imports the data with pd.read_excel, loading it into a pandas DataFrame.
Next, a function extracts the errors with str.extract, and the results are loaded into separate columns.
Finally, I exported the clean dataset using rawdata.to_excel and saved it as a separate new spreadsheet.
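The find, clean and export steps above might be sketched as follows. This is a minimal illustration, not the code from the video. The column names, the error rules and the helper functions find_errors, clean and export_clean are all hypothetical. A small in-memory DataFrame replaces the pd.read_excel call so that the sketch runs on its own.

```python
import pandas as pd

# Hypothetical data standing in for the spreadsheet that would normally be
# loaded with pd.read_excel("your_file.xlsx")
rawdata = pd.DataFrame(
    {
        "Name": ["Ann", "B0b", "Cara!", "Dan"],
        "Phone": ["0851234567", "085-123-4567", "085 123 456x", "0861112222"],
    }
)

# Assumed error rules: a Name should contain only letters, a Phone only digits.
# Each pattern has one capture group, as str.extract requires.
NAME_ERRORS = r"([^A-Za-z]+)"
PHONE_ERRORS = r"([^0-9]+)"


def find_errors(df: pd.DataFrame) -> pd.DataFrame:
    """Copy the first run of unwanted characters in each column into a new column."""
    out = df.copy()
    out["Name_errors"] = out["Name"].str.extract(NAME_ERRORS, expand=False)
    out["Phone_errors"] = out["Phone"].str.extract(PHONE_ERRORS, expand=False)
    return out  # rows with no error get NaN in the new columns


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Remove every unwanted character from the Name and Phone columns."""
    out = df.copy()
    out["Name"] = out["Name"].str.replace(NAME_ERRORS, "", regex=True)
    out["Phone"] = out["Phone"].str.replace(PHONE_ERRORS, "", regex=True)
    return out


def export_clean(df: pd.DataFrame, path: str) -> None:
    """Save the cleaned data as a new spreadsheet (needs an Excel engine such as openpyxl)."""
    df.to_excel(path, index=False)


print(find_errors(rawdata))
print(clean(rawdata))
```

To write the result to a new file, you would call export_clean(clean(rawdata), "clean_data.xlsx"), where the file name is only an example. pandas needs an Excel writer such as openpyxl installed for to_excel to work.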
Following on from how to create a class in Python, this post on how to create an instance of a class explores class instances further. It shows how they can be used within a program to assign values to an object. This allows that object to inherit the values contained within the class, making it easier to keep functionality and data consistent.
This video covers:
(A) Creating an instance of a class
(B) Using __init__ within the class
(C) Defining the constructor method __init__
(D) Creating an object that calls a class and uses the class to process some piece of data.
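Here is a minimal sketch of points (A) to (D), using a hypothetical Customer class. Its __init__ method assigns the values passed in to each new object, and a method then uses that object's own data.

```python
class Customer:
    """A simple class whose __init__ assigns values to each new instance."""

    def __init__(self, name: str, country: str) -> None:
        # __init__ runs automatically every time an instance is created
        self.name = name
        self.country = country

    def describe(self) -> str:
        # A method that processes the data stored on this particular instance
        return f"{self.name} lives in {self.country}"


# Creating two instances (objects) of the same class
first = Customer("Mary", "Ireland")
second = Customer("Tom", "France")
print(first.describe())   # Mary lives in Ireland
print(second.describe())  # Tom lives in France
```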
What are the benefits of this?
You only need to create one class that holds all the attributes required.
That class can be called from anywhere within a program, once an instance of it is created.
You can update the class, and once completed, those new values will become available to an instance of that class.
It makes for better management of objects and their properties, rather than multiple different versions scattered across a program.
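Consider the point about updating the class. A hypothetical Report class with a class attribute shows how a change made on the class is immediately seen by existing instances. This works because Python looks the attribute up on the class whenever the instance has no value of its own.

```python
class Report:
    # A class attribute: shared by every instance unless an instance overrides it
    currency = "EUR"

    def __init__(self, total: float) -> None:
        self.total = total  # an instance attribute, unique to each object

    def label(self) -> str:
        return f"{self.total:.2f} {self.currency}"


q1 = Report(100)
q2 = Report(250.5)
print(q1.label())  # 100.00 EUR

# Updating the class attribute changes what every existing instance sees
Report.currency = "USD"
print(q1.label(), q2.label())  # 100.00 USD 250.50 USD
```

An instance can assign its own value, for example q1.currency = "GBP". That instance attribute then shadows the class attribute, so later class-level updates no longer affect that object.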
In this first video about pandas groupby, part of expanding the data analytics content on this website, we explain how you can use groupby to sort your data into groups of similar records so they can be analysed better. In the video below, we import our data into a dataframe, and then group as follows:
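The specific groupings used in the video are not listed in this text. The block below is therefore only a general illustration of pandas groupby. A hypothetical sales table (columns Region, Product and Units) stands in for the imported data.

```python
import pandas as pd

# Hypothetical sales data; the column names are assumptions for illustration
sales = pd.DataFrame(
    {
        "Region": ["North", "South", "North", "South", "West"],
        "Product": ["A", "A", "B", "B", "A"],
        "Units": [10, 5, 7, 3, 8],
    }
)

# Group rows that share a Region and total the units in each group
units_by_region = sales.groupby("Region")["Units"].sum()
print(units_by_region)

# Group on two columns and apply several aggregations at once
summary = sales.groupby(["Region", "Product"])["Units"].agg(["sum", "count"])
print(summary)
```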
A regular expression is a sequence of characters that defines a search pattern. It is used to find a match for a specific piece of data in a dataset.
The purpose is to provide a uniform set of characters that can be reused multiple times, based on the requirements of the user, without having to build the pattern each time.
The patterns are similar to those that you would find in Perl.
How are regular expressions built?
To start, in regular expressions, there are metacharacters, which are characters that have a special meaning. Their values are as follows:
. ^ $ * + ? { } [ ] \ | ( )
. = Matches any single character except a newline. E.g. .e matches any character followed by an “e”, and ..e matches any two characters followed by an “e”.
^ = Checks if a string starts with a particular pattern.
$ = Checks if a string ends with a particular pattern.
* = Matches zero or more occurrences of the preceding character or group. E.g. ab* matches “a”, “ab” and “abbb”.
+ = Matches one or more occurrences of the preceding character or group. If there is no occurrence at all, nothing is returned. E.g. ab+ matches “ab” and “abbb”, but not “a”.
? = Matches zero or one occurrence of the preceding character, making it optional. E.g. with the search pattern t?e and the string “The”, only “e” is returned, because the match is case-sensitive and the uppercase “T” is not treated as the t. With the string “te”, “te” is returned, as the t sits directly before the e.
da{2} = Braces set how many times the preceding character must repeat. E.g. da{2} checks whether a “d” is followed by exactly two “a” characters, i.e. “daa”.
[abc] = A character set that matches any one of the characters listed. [a-c] gives the same result as a range. Change to uppercase, e.g. [A-C], to match only uppercase characters.
\ = A backslash escapes a metacharacter so that it is matched as a literal value. E.g. \$ finds a literal dollar sign in a string.
| = An “or” operator between alternatives. E.g. cat|dog matches either “cat” or “dog”, and the string may contain either or both.
() = Groups part of a pattern so it is treated as one unit, e.g. (ab)+ matches “ab” and “abab”. It also captures the matched text so it can be extracted separately.
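You can try out the metacharacters above with Python's built-in re module. Each line below exercises one entry from the list. The sample strings are made up for illustration.

```python
import re

# . matches any single character (except a newline)
print(re.findall(r".e", "the tree"))                # ['he', 're']
# ^ and $ anchor a pattern to the start and end of the string
print(bool(re.search(r"^Data", "Data analytics")))  # True
print(bool(re.search(r"tics$", "Data analytics")))  # True
# * allows zero or more repeats, + requires at least one
print(bool(re.fullmatch(r"ab*", "a")))              # True
print(bool(re.fullmatch(r"ab+", "a")))              # False
# ? makes the preceding character optional (and matching is case-sensitive)
print(re.search(r"t?e", "The").group())             # e
print(re.search(r"t?e", "te").group())              # te
# {2} demands exactly two repetitions of the preceding character
print(re.findall(r"da{2}", "da daa"))               # ['daa']
# [] character sets and ranges are case-sensitive
print(re.findall(r"[a-c]", "Cab"))                  # ['a', 'b']
print(re.findall(r"[A-C]", "Cab"))                  # ['C']
# \ escapes a metacharacter so it is matched literally
print(re.findall(r"\$\d+", "Cost: $45"))            # ['$45']
# | means "or"
print(re.findall(r"cat|dog", "cat and dog"))        # ['cat', 'dog']
# () groups part of a pattern and captures what it matched
print(re.search(r"(\d+)-(\d+)", "10-20").groups())  # ('10', '20')
```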
Special sequences, making it easier again
\A = Matches if the specified characters are at the start of the string being searched.
\b = Matches if the specified characters are at the beginning or the end of a word (a word boundary).
\B = Matches if the specified characters are NOT at the beginning or the end of a word.
\d = Matches any digit 0-9.
\D = Matches any character that is not a digit.
\s = Matches where a string contains a whitespace character.
\S = Matches where a string contains a non-whitespace character.
\w = Matches any word character: a letter, a digit or the underscore _.
\W = Matches any character that is not a letter, a digit or the underscore _.
\Z = Matches if the specified characters are at the end of the string.
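You can check the special sequences the same way. The helper positions (a hypothetical name) returns where each match starts. This makes the differences between \A, \b, \B and \Z easier to see.

```python
import re


def positions(pattern: str, text: str) -> list[int]:
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]


text = "cat concat cats"

print(positions(r"\Acat", text))    # [0]      only at the start of the string
print(positions(r"\bcat\b", text))  # [0]      "cat" as a whole word
print(positions(r"\bcat", text))    # [0, 11]  "cat" at the start of a word
print(positions(r"\Bcat", text))    # [7]      "cat" inside "concat"
print(positions(r"cats\Z", text))   # [11]     only at the end of the string

print(re.findall(r"\d", "Room 12B"))       # ['1', '2']
print(re.findall(r"\D", "a1b2"))           # ['a', 'b']
print(re.findall(r"\s", "a b\tc"))         # [' ', '\t']
print(re.findall(r"\S+", "a b\tc"))        # ['a', 'b', 'c']
print(re.findall(r"\w+", "user_1, id-2"))  # ['user_1', 'id', '2']
print(re.findall(r"\W", "a-b c"))          # ['-', ' ']
```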
For further references and reading material, please see the websites below. The last one is really useful for testing any regular expressions you would like to build:
I found this link on the internet and would thoroughly recommend you bookmark it. It lets you experiment with regular expressions and test them before you put them into your code, a highly recommended resource: Testing regular expressions
When coding in Python, there are particular reserved words (keywords) that the language itself uses to perform specific tasks. Because of this, they cannot be used as the name of a variable or a function.
If you try to use one, Python blocks it and raises a SyntaxError. Running the below code in Python
When writing your code, it is important to follow these guidelines:
(A) Research the keywords for the language you are writing in first.
(B) Ensure that your code editor highlights keywords when used, so you can spot and fix the issue.
(C) Set up your computer program in debug mode to highlight keyword use.
Some programs run to thousands of lines of code, with many functions and variables, and the problem can become harder to spot. Good rigour in the initial stages of coding will reduce the issues you need to fix down the road.
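For guideline (A), Python's standard keyword module lists the reserved words. Compiling a small assignment shows the error that using one of them produces. The helpers is_reserved and assignment_is_valid are hypothetical names for this sketch.

```python
import keyword


def is_reserved(name: str) -> bool:
    """Return True if name is a Python keyword and so cannot be used as an identifier."""
    return keyword.iskeyword(name)


def assignment_is_valid(name: str) -> bool:
    """Try to compile `name = 1`; using a keyword as a variable raises SyntaxError."""
    try:
        compile(f"{name} = 1", "<check>", "exec")
    except SyntaxError:
        return False
    return True


print(keyword.kwlist)  # the full list of reserved words for this Python version

for candidate in ["total", "class", "for", "print"]:
    print(candidate, is_reserved(candidate), assignment_is_valid(candidate))
```

In Python 3, print is an ordinary built-in function rather than a keyword. Assigning to it therefore compiles, although doing so hides the built-in.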
How would you like to present your data analytics work better?
When starting your data analytics projects, one of the critical considerations is how to present your results quickly and understandably.
Undoubtedly this is true if you are only going to look at the results yourself.
If the work you do is a repeatable process, a more robust, longer-term solution is needed. This is where Tkinter, Python's standard graphical user interface library, can help.
There are many applications for using Tkinter, such as:
Use them to build calculators.
They can show graphs and bar charts.
Show graphics on a screen.
Validate user input.
Where does this all fit in with data analytics?
Going through a set of data and getting meaning from it can be challenging. The Python graphical user interface tutorial below can help you build screens that display the results of a repeatable process in a meaningful way.
Ultimately, you could do the following:
Build a screen that shows data analytics errors in a dataset, e.g. the number of blank column values in a dataset.
Another application is to run your analytics to show the results on a screen that can be printed or exported.
Similarly, you could also have a screen where a user selects several parameters that are fed into the data analytics code and produces information for the user to analyse.
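Take the first idea and assume the data is already in a pandas DataFrame (the columns here are hypothetical). You can count the blank values per column with isna().sum(). The resulting Series could then be shown on a Tkinter screen, for example as the text of a label.

```python
import pandas as pd

# Hypothetical dataset with some blank (None / NaN) values
data = pd.DataFrame(
    {
        "Name": ["Ann", None, "Cara", "Dan"],
        "Email": ["a@x.com", "b@x.com", None, None],
        "Age": [34, 28, None, 45],
    }
)


def blank_counts(df: pd.DataFrame) -> pd.Series:
    """Count the blank (missing) values in each column."""
    return df.isna().sum()


counts = blank_counts(data)
print(counts)
```

isna() treats None and NaN as blank, but it does not treat empty strings as blank. If empty strings count as blanks in your data, convert them to missing values first.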
There are many more ways you could do this, but one of the most important is that Tkinter lets you build data analytics into a windowed environment that the user is used to seeing. As a result, this could help to distribute a solution across an enterprise to lots of different users.
All that needs to happen is that the user's requirements are defined, and the developer then builds on those. The data analytics code runs in the background of the Tkinter program, and its output is shown on a user-friendly screen for review.
This tutorial demonstrates how a Combobox can be used to select values and then validate the chosen entry.
The Combobox has been around in the computer programming world for some time.
It is a useful way to select from a list of choices and could help with data analytics in many ways, as the following examples show:
Select a date to filter a dataset down to the values for that date.
When using matplotlib to plot data points in charts, the chart could change dynamically based on the value chosen from the Combobox.
In data analytics reports that users access, the Combobox could be used to change the data shown dynamically, allowing comparisons.
When looking to fix data quality issues, use the Combobox to select values for a date that needs to be fixed, apply the fixes on screen, and then save back to the database.
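The first example, filtering by a chosen date, comes down to a boolean filter on a DataFrame. This sketch leaves out the GUI. The list choices is what you might pass to a Combobox's values option. filter_by_selection, a hypothetical helper, is what a selection handler could call with the Combobox's get() result. The Date and Sales columns are assumptions.

```python
import pandas as pd

# Hypothetical dataset; "Date" and "Sales" are assumed column names
df = pd.DataFrame(
    {
        "Date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-03"],
        "Sales": [100, 150, 80, 120],
    }
)

# The distinct dates, sorted, as you might load them into a Combobox
choices = sorted(df["Date"].unique())


def filter_by_selection(data: pd.DataFrame, column: str, selected: str) -> pd.DataFrame:
    """Return only the rows whose column equals the value chosen in the Combobox."""
    return data[data[column] == selected]


subset = filter_by_selection(df, "Date", "2021-01-01")
print(subset)
```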
Developing a Tkinter GUI and the possibilities it brings
In this video, we use ttk (themed Tk), a module designed to separate the code implementing a widget's behaviour from the code implementing its appearance.
You can see plenty more on it here: ttk information. This is a handy piece of functionality, as styling a widget can otherwise interfere with how it works.
We also have a function that helps with the validation. The function below accomplishes the following:
Retrieves the value selected in the combobox.
Validates the entry chosen in the combobox using an if statement.
from tkinter import messagebox

def checkifireland():
    # combolist is the ttk.Combobox widget created elsewhere in the program
    x = combolist.get()  # assigns the value inside the combobox to x so it can be processed
    if x == "Ireland":
        messagebox.showinfo("Correct answer", "You will love it in Ireland")
    else:
        messagebox.showinfo("Incorrect answer", "You should visit Ireland first!")
This validation is especially handy, as it helps to ensure that the value passed from the Combobox to the function is correct.
The video below takes you through this step by step and explains the concepts discussed above.
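The checkifireland function assumes a combolist widget already exists. A minimal sketch of how it might be created and wired up follows. The country list, the window title and the helper validation_message are assumptions, not the video's code. The check is kept in a plain function so that it can be tested without opening a window. For the same reason, tkinter is imported inside main().

```python
def validation_message(choice: str) -> tuple[str, str]:
    """Return the (title, message) pair to show for the chosen country."""
    if choice == "Ireland":
        return "Correct answer", "You will love it in Ireland"
    return "Incorrect answer", "You should visit Ireland first!"


def main() -> None:
    """Build the window; tkinter is imported here so the logic above works without a display."""
    import tkinter as tk
    from tkinter import messagebox, ttk

    root = tk.Tk()
    root.title("Combobox validation")

    # A read-only Combobox, so the user can only pick a value from the list
    combolist = ttk.Combobox(root, values=["France", "Ireland", "Spain"], state="readonly")
    combolist.current(0)  # preselect the first entry
    combolist.pack(padx=10, pady=10)

    def checkifireland() -> None:
        # Read the current selection and show the matching message box
        title, message = validation_message(combolist.get())
        messagebox.showinfo(title, message)

    ttk.Button(root, text="Check", command=checkifireland).pack(pady=(0, 10))
    root.mainloop()
```

Calling main() opens the window. Choosing a country and clicking Check shows the matching message box.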
When removing unwanted characters, you want to ensure that you have a defined list of the characters that should not appear and will cause errors. It is also essential to understand the types of data errors that can occur, as follows:
Data entry errors
Data columns with the incorrect format, e.g. telephone numbers that have non-numerical characters in them
Missing data that is required – e.g. null values
Data that does not make sense, e.g. a date of birth beyond the range of what you would typically expect to see
Duplicate values for the same piece of data. The problem here is that this can inflate the number of data errors and not give a true count of the actual errors.
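Each of the error types above can be detected with a line or two of pandas. The sample records, the column names and the 1900 cut-off for a plausible date of birth are assumptions for illustration.

```python
import pandas as pd

# Hypothetical records illustrating each type of error listed above
records = pd.DataFrame(
    {
        "Phone": ["0851234567", "085-123-4567", None, "0851234567"],
        "DOB": ["1985-04-12", "1850-01-01", "1990-07-30", "1985-04-12"],
    }
)

# Incorrect format: phone numbers containing any non-digit character
bad_format = records["Phone"].str.contains(r"\D", na=False)

# Missing data that is required: null values
missing = records["Phone"].isna()

# Data that does not make sense: dates of birth before 1900 (an assumed cut-off)
dob = pd.to_datetime(records["DOB"])
out_of_range = dob < pd.Timestamp("1900-01-01")

# Duplicate rows, which could inflate the error count if not handled first
duplicates = records.duplicated()

print(records[bad_format | missing | out_of_range | duplicates])
```

Running records.drop_duplicates() before counting the other errors avoids the inflated totals described above.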
In the video below, we use R programming code with the above functions. We also use an if statement to check whether an unwanted character is in the dataset before removing it and returning the cleansed data.
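The video uses R. A rough Python equivalent of the same idea, checking whether any unwanted character is present before removing it, might look like this. The set of unwanted characters and the name remove_unwanted are hypothetical.

```python
UNWANTED = "!@#$%*"  # hypothetical characters that should never appear in the data


def remove_unwanted(value: str, unwanted: str = UNWANTED) -> str:
    """Remove unwanted characters, but only after checking that one is present."""
    if any(ch in unwanted for ch in value):
        # The third argument to str.maketrans lists the characters to delete
        return value.translate(str.maketrans("", "", unwanted))
    return value  # nothing to remove, so the value is returned unchanged


cleansed = [remove_unwanted(v) for v in ["Dublin!", "Cork", "Gal#way"]]
print(cleansed)  # ['Dublin', 'Cork', 'Galway']
```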
Some of the strategies to help counteract data errors could include:
Eliminate manual inputs
Apply controls at the point of entry for data, e.g. only allow date formats in date fields.
Reduce duplication of data across multiple systems, which reduces the number of places where data differences can occur.
If integrating different systems with the same data into one network, perform a data cleanse beforehand. This reduces the work needed afterwards to clean up the problems that integration brings.
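As an example of a control at the point of entry, a date field could reject anything that does not parse in the expected format. This sketch assumes a day/month/year format and uses datetime.strptime from the standard library. The name is_valid_date is hypothetical.

```python
from datetime import datetime


def is_valid_date(text: str, fmt: str = "%d/%m/%Y") -> bool:
    """Accept the input only if it parses as a real date in the expected format."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False  # wrong format, or an impossible date such as 31 February
    return True


for entry in ["25/12/2021", "31/02/2021", "2021-12-25", "tomorrow"]:
    print(entry, is_valid_date(entry))
```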