Regular expressions in Python

Estimated reading time: 3 minutes

Regular expressions explained

A regular expression is a sequence of characters that defines a search pattern, which helps you find matching pieces of text in a dataset.

The purpose is to define a pattern once and reuse it as many times as the user requires, without having to rebuild the search logic each time.

The patterns are similar to those that you would find in Perl.

How are regular expressions built?

To start, in regular expressions, there are metacharacters, which are characters that have a special meaning. Their values are as follows:

. ^ $ * + ? { } [ ] \ | ( )

. = Matches any single character except a newline. For example, .e matches any one character followed by “e”, and ..e matches any two characters followed by “e”.

^ = Matches at the start of the string, so you can check if a string starts with a particular pattern.

$ = Matches at the end of the string, so you can check if a string ends with a particular pattern.

* = Matches zero or more repetitions of the preceding character or group, so the pattern still matches when that character is absent.

+ = Matches one or more repetitions of the preceding character or group. If it does not appear at least once, nothing is returned.

? = Makes the preceding character or group optional: it matches zero or one occurrence. If the optional character is directly beside what follows it, both values are returned.

e.g. t?e is the search pattern. If the string is “The”, the result will be only “e”, because there is no “t” directly before the “e”. If the string is “te”, then it will return “te”, as the letters are directly beside each other.

da{2} = Matches “d” followed by exactly two “a” characters. The number inside { } is how many times the preceding character must repeat.

[abc] = A set of characters; it matches any one of the characters listed. You could also use the range [a-c] and get the same result. Use [A-C] to match only the uppercase letters.

\ = The backslash escapes a metacharacter, so that it can be found in a string as a literal value. For example, \$ matches an actual “$”.

| = The “or” operator: A|B matches either pattern A or pattern B.

() = Groups part of a pattern, so that it can be repeated or combined as a unit and the matched part can be captured on its own.
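The metacharacters above can be tried out with Python's built-in re module. The following is a minimal sketch; the sample strings and variable names are made up for illustration only.

```python
import re

# . matches any single character except a newline
dot = re.findall(r".e", "The red hen")            # ['he', 're', 'he']

# ^ anchors at the start of the string, $ at the end
starts = bool(re.search(r"^The", "The red hen"))  # True
ends = bool(re.search(r"hen$", "The red hen"))    # True

# * = zero or more of the preceding character, + = one or more
star = re.findall(r"da*", "d da daa")             # ['d', 'da', 'daa']
plus = re.findall(r"da+", "d da daa")             # ['da', 'daa']

# ? makes the preceding character optional
optional_1 = re.findall(r"t?e", "The")            # ['e']
optional_2 = re.findall(r"t?e", "te")             # ['te']

# {2} = exactly two repetitions of the preceding character
exact = re.findall(r"da{2}", "da daa daaa")       # ['daa', 'daa']

# [a-c] matches any one character in the range a to c (lowercase only)
char_set = re.findall(r"[a-c]", "a big Cab")      # ['a', 'b', 'a', 'b']

# \ escapes a metacharacter so it is treated literally
literal = re.findall(r"\$\d+", "cost: $25")       # ['$25']

# | means "or"
either = re.findall(r"cat|dog", "cat and dog")    # ['cat', 'dog']

# ( ) groups part of the pattern and captures what it matched
groups = re.search(r"(\d+)-(\d+)", "range 10-20").groups()  # ('10', '20')

print(dot, star, plus, optional_1, optional_2, exact, groups)
```

Using raw strings (the r prefix) for patterns means Python does not interpret the backslashes before the re module sees them.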

 

Special sequences, making it easier again

\A = Matches if the specified characters are at the start of the string being searched. Note the uppercase: in Python, a lowercase \a is the bell character.

\b = Matches at a word boundary, i.e. if the specified characters are at the beginning or the end of a word.

\B = Matches if the specified characters are NOT at the beginning or the end of a word.

\d = Matches any digit 0-9.

\D = Matches any character that is not a digit.

\s = Matches a whitespace character, such as a space, tab or newline.

\S = Matches a non-whitespace character.

\w = Matches a word character: a letter, a digit or _.

\W = Matches any character that is not a letter, a digit or _.

\Z = Matches if the specified characters are at the end of the string. In Python's re module this is written with an uppercase \Z.
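As a rough illustration of the special sequences, here is a short sketch using the re module. The sample text and variable names are invented for the example.

```python
import re

text = "Order 66 shipped to cat_lady on 4 May"

# \A and \Z anchor at the very start and very end of the string
at_start = bool(re.search(r"\AOrder", text))                # True
at_end = bool(re.search(r"May\Z", text))                    # True

# \b matches at a word boundary; _ counts as a word character
whole_word = re.findall(r"\bcat\b", "cat concat cat_lady")  # ['cat']

# \B matches only where there is NO word boundary
inside_word = re.findall(r"\w*\Bcat", "cat concat")         # ['concat']

# \d = digit, \D = anything that is not a digit
digits = re.findall(r"\d+", text)                           # ['66', '4']
digits_only = re.sub(r"\D", "", text)                       # '664'

# \s = whitespace, \S = non-whitespace
split_on_space = re.split(r"\s+", "a  b\tc")                # ['a', 'b', 'c']
non_space = re.findall(r"\S+", "a  b\tc")                   # ['a', 'b', 'c']

# \w = letter, digit or underscore; \W = everything else
words = re.findall(r"\w+", "cat_lady on 4 May")             # ['cat_lady', 'on', '4', 'May']
non_words = re.findall(r"\W", "a-b c!")                     # ['-', ' ', '!']

print(whole_word, inside_word, digits, words, non_words)
```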

 

 

For further reading, please see the websites below. The last one is really useful for testing any regular expressions you would like to build:

Further reading material: regular expression RE explained

A complementary page to the link above: regular expression REGEX explained

I found this link on the internet, and would thoroughly recommend you bookmark it. It allows you to play around with regular expressions and test them before you put them into your code: Testing regular expressions

 

What are the reserved keywords in Python

What are python reserved keywords?

When coding in Python, there are particular reserved words that the language uses to perform specific tasks, so they cannot be used as the name of a variable or a function.

When you try to use them, Python will block it and raise an error. Running the below code in Python

import keyword
keywordlist = keyword.kwlist
print(keywordlist)

Produces the below keyword values
['False', 'None', 'True', 'and', 'as', 'assert', 'async', 'await', 'break', 'class', 'continue', 'def', 'del',
'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal',
'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield']
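To check a single name rather than read the whole list, the keyword module also provides keyword.iskeyword(). The sketch below wraps it in a helper called is_safe_name, which is a made-up name for this example, and uses compile() to show the error without stopping the script.

```python
import keyword


def is_safe_name(name):
    """Return True if name is a valid identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_safe_name("total"))   # True
print(is_safe_name("class"))   # False, "class" is reserved

# Assigning to a keyword is a SyntaxError. compile() lets us trigger
# the error inside a try block instead of crashing the whole script.
try:
    compile("class = 5", "<example>", "exec")
    blocked = False
except SyntaxError:
    blocked = True

print(blocked)                 # True
```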

When writing your code, it is important to follow these guidelines:

(A) Research the keywords first for the language you are writing in.

(B) Ensure that your code editor highlights keywords when used, so you can fix the issue.

(C) Set up your computer program in debug mode to highlight keyword use.

With some programs running into thousands of lines of code, with additional functions and variables, it can become harder to spot the problem, so good rigour in the initial stages of coding will reduce the number of issues that need to be fixed down the road.

This code was run in Python version 3.8

Python tutorial: Create an input box in Tkinter

Using a Tkinter input box for your data projects

As you build out a data science or data analytics project, there may be occasions when you face challenges with the dataset such as:

  • Big data sets and speed requirements, in conjunction with
  • The need to reduce the volume of data returned, which is impeding performance

and this is where input boxes and Tkinter can help!

In the video below, we give an introduction to using an input box and validating the input.

We demonstrate how to validate the data entered into the Tkinter input box and return a message, which ensures the user gets the correct data.

The uses for a Tkinter input box are varied. Here are some thoughts:

  • Use an input box to return a set of data for a particular day.
  • Use one to filter down the results to a particular cohort of data.
  • Conduct a string search to find data quality issues to be fixed.
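As a rough sketch of the idea (not the code from the video), the example below validates a date typed into a Tkinter Entry widget and shows a message. The function names validate_date and build_input_box, and the YYYY-MM-DD format, are assumptions made for this illustration.

```python
from datetime import datetime


def validate_date(text):
    """Return (ok, message) for a date typed by the user as YYYY-MM-DD."""
    cleaned = text.strip()
    try:
        datetime.strptime(cleaned, "%Y-%m-%d")
    except ValueError:
        return False, "Please enter a date as YYYY-MM-DD."
    return True, f"Showing data for {cleaned}."


def build_input_box():
    """Open a small window with an input box, a button and a message."""
    # Imported here so validate_date can be used without a display
    import tkinter as tk

    root = tk.Tk()
    root.title("Filter by date")

    entry = tk.Entry(root, width=20)
    entry.pack(padx=10, pady=5)

    message = tk.Label(root, text="")
    message.pack(padx=10, pady=5)

    def on_submit():
        # Validate whatever is in the input box and report back to the user
        ok, text = validate_date(entry.get())
        message.config(text=text, fg="green" if ok else "red")

    tk.Button(root, text="Submit", command=on_submit).pack(pady=5)
    root.mainloop()
```

Calling build_input_box() opens the window. Keeping the validation in its own function means it can be tested, or reused to filter a dataset, without any window being open.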

YouTube channel lists – Tuples

Estimated reading time: 1 minute

Python tuples are very similar to Python lists, which are also covered on our YouTube channel. Like lists, tuples can store data. The significant difference is that you cannot change tuples at all; they are “immutable”. This matters because it could impact how you structure your code, especially if you are trying to pass values within it.

Some things to consider in order to understand tuples:

(A) They are enclosed within (), whereas lists use [].

(B) There is no ability to change them.

(C) As they cannot be changed and can be considered static data holders, you could use them as a lookup. An example could be something that can only have a fixed set of values associated with it.
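The three points above can be sketched in a few lines. The weekdays tuple and the is_weekday helper are made-up names for illustration.

```python
# (A) Tuples use round brackets, lists use square brackets
weekdays = ("Mon", "Tue", "Wed", "Thu", "Fri")
weekday_list = ["Mon", "Tue", "Wed", "Thu", "Fri"]

# (B) Trying to change a tuple raises a TypeError
try:
    weekdays[0] = "Sun"
    changed = True
except TypeError:
    changed = False

# The list version can be changed without any error
weekday_list[0] = "Sun"


# (C) A tuple works well as a fixed lookup of allowed values
def is_weekday(day):
    """Return True if day is one of the fixed weekday values."""
    return day in weekdays


print(changed)            # False
print(is_weekday("Tue"))  # True
print(is_weekday("Sat"))  # False
```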

Below is our list for Tuples from our YouTube channel. If you like what we do, please subscribe! We will be updating all our lists regularly, so subscribing is an excellent way to keep in touch 🙂

Thanks!

Data Analytics Ireland

How to pass data between functions

Estimated reading time: 2 minutes

In this Python program, we are learning how to pass data between functions.

You will see and use functions in many programming languages and data analytics projects.

As a result, the ability to understand them has become important.

Functions offer a number of benefits:

  • You can pass a number of arguments to them to be processed.
  • They reduce repetition, as a function can be called from many places within a program.
  • They are easily identified by the def keyword in your code.
  • A return statement can give you the output of the function to show on the screen or pass to another function.
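A minimal sketch of passing data between functions is shown below. It is not the code from the video; the function names clean_values, average and report are invented for the example.

```python
def clean_values(raw_values):
    """Convert strings to numbers, skipping blank entries."""
    return [float(value) for value in raw_values if value.strip()]


def average(values):
    """Return the mean of a list of numbers."""
    return sum(values) / len(values)


def report(raw_values):
    """Pass the output of one function into the next and build a message."""
    cleaned = clean_values(raw_values)   # returned data is stored...
    mean = average(cleaned)              # ...and passed to another function
    return f"Average of {len(cleaned)} values: {mean:.2f}"


print(report(["10", "20", " ", "30"]))   # Average of 3 values: 20.00
```

Each function does one job and hands its result on through return, so any of them can be reused elsewhere in a program.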

This post complements r-tutorial-how-to-pass-data-between-functions/, as this is a handy bit of functionality used widely across many different programming languages.

Below is a video that will help to give an understanding of how to pass data between functions when trying to learn Python:

 

In many of the Data Analytics Ireland YouTube channel videos, there is an emphasis on creating content that eliminates duplication within the code.

We have also started incorporating classes, and you can see a tutorial on how to create one here: How to create a class in Python.

Classes by their nature have methods, which are called on the objects created from the class and can alter their state, whereas a function will run and just return a value.

It is important to understand the distinction: while the two may well achieve the same outcome, it is the ability to change an object's state that differentiates them.
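To make the distinction concrete, here is a small sketch comparing a plain function with a method on a class. The names add_to_total and RunningTotal are hypothetical and chosen only for this example.

```python
def add_to_total(total, amount):
    """A function: takes values in, returns a new value, keeps no state."""
    return total + amount


class RunningTotal:
    """A class: its objects remember their state between method calls."""

    def __init__(self):
        self.total = 0

    def add(self, amount):
        # The method changes the state stored on the object itself
        self.total += amount
        return self.total


start = 10
result = add_to_total(start, 5)
print(start, result)    # 10 15, the original value is untouched

counter = RunningTotal()
counter.add(5)
counter.add(7)
print(counter.total)    # 12, the object kept track between calls
```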

YouTube channel lists – Python Lists

Estimated reading time: 2 minutes

Python lists are used extensively in projects, so it is important to understand their structure.

Some of the things they can be used for:

  1. Lookup values for comparisons.
  2. Storing data passed to them so it can be referenced elsewhere.
  3. As part of a loop, storing values that have been found through the loop logic.

What methods are associated with lists?

  1. Append – Adds a value to the end of the list.
  2. Extend – Adds values from an iterable object to the end of the list.
  3. Insert – Inserts an item at a certain position in the list.
  4. Remove – Removes the first item in the list that matches the value you asked for.
  5. Pop – Removes the item at a certain position and returns it. If no position is specified, it removes the last item and returns it.
  6. Clear – Removes all items from the list.
  7. Index – Returns the index of the first item that matches the value searched for.
  8. Count – Returns the number of times the value searched for was found in the list.
  9. Sort – Sorts the items in the list.
  10. Reverse – Reverses the order of the items in the list.
  11. Copy – Makes a copy of the list.
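The list methods above can be tried one after another on a small list. This is a minimal sketch with made-up values; the comments show the list after each step.

```python
values = [3, 1, 2]

values.append(4)             # [3, 1, 2, 4]
values.extend([1, 5])        # [3, 1, 2, 4, 1, 5]
values.insert(0, 9)          # [9, 3, 1, 2, 4, 1, 5]
values.remove(1)             # [9, 3, 2, 4, 1, 5], only the first 1 is removed

last = values.pop()          # returns 5, list is now [9, 3, 2, 4, 1]
first = values.pop(0)        # returns 9, list is now [3, 2, 4, 1]

position = values.index(4)   # 2
ones = values.count(1)       # 1

backup = values.copy()       # a separate copy: [3, 2, 4, 1]

values.sort()                # [1, 2, 3, 4]
values.reverse()             # [4, 3, 2, 1]
reversed_snapshot = values.copy()

values.clear()               # []

print(last, first, position, ones, backup, reversed_snapshot, values)
```

Note that sort, reverse and clear change the list in place and return None, which is why the copy taken earlier still holds the old order.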

What are the properties of a list?

The data type has the following attributes, which make it really useful for a vast array of scenarios:

  • They are ordered – The order of the items is a characteristic of the list, so changing the order makes it a different list.
  • You can use their index to access the value.
  • They are mutable, meaning you can change them in place using the methods above.
  • They can contain strings, integers etc. There is no restriction on what can be in the list.
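These properties can be checked directly in Python. The sketch below uses invented sample values.

```python
# Ordered: the same items in a different order make a different list
a = [1, 2, 3]
b = [3, 2, 1]
same = a == b               # False

# Mixed contents: strings, numbers, None and even other lists
mixed = ["Dublin", 42, 3.14, None, [1, 2]]

# Indexed: positions start at 0, negative indexes count from the end
second = mixed[1]           # 42
last = mixed[-1]            # [1, 2]

# Mutable: an item can be replaced in place
mixed[0] = "Cork"

print(same, second, last, mixed[0])
```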

Check out the video playlist below from our YouTube channel; it will help explain more about lists:

On this website you can also read about how to compare two lists in Python or how to sort lists using rstudio in addition to this blog post.

We hope you enjoy it!

Data Analytics Ireland

YouTube channel lists – Python DataFrames

Estimated reading time: 1 minute

Welcome to this new blogging website! We are all about data analytics, so have a look at this page: About Data Analytics Ireland

To keep it simple, we have created some lists here and on our YouTube Channel.

As we progress over the next while, the website will be updated, and while there may be a lot of video content, we will look to mix it up with different formats.

We have started with Python DataFrames:

We hope you enjoy it, and don’t forget: if you like what we are doing, subscribe to our channel!

Data Analytics Ireland