How to remove unwanted characters

Removing the unwanted that is holding you up.
You have information with erroneous data inside it. What do you do?

Data issues are a common scenario faced by many data analytics professionals and by the industry as a whole. Data quality has become more critical, especially as more processes move online and the digital landscape grows.

Most data is transferred between systems at some point, either to be used there or to feed reports that rely on its accuracy. If the data in the source system has quality issues and they are not addressed before the data moves on, the problems spread further through the organisation, expanding like a spider's web.

The next step: planning how to fix the problem

To combat this, data professionals need to come up with a plan for how to tackle it. The options are:

  • Fix the data at source.
  • Take the data in and investigate the problems before moving it on.
  • Reject the file, or part thereof.

All three options come with costs and implications, and depending on your industry, you need to pick the most appropriate way to handle the problem. As an example, in the banking industry payment files can sometimes contain bad data, and the file may be rejected entirely or in part. The bank may decide to discard only the records with the wrong data and process everything else.
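
As a rough illustration of the "reject in part" option, here is a minimal Python sketch. The records, the field names and the validation rule are all made up for this example; a real payment file would have its own format and rules.

```python
# Hypothetical payment records; field names and values are assumptions
# made purely for illustration.
records = [
    {"id": 1, "payee": "Acme Ltd", "amount": "100.50"},
    {"id": 2, "payee": "Widgets#Inc", "amount": "75.00"},
    {"id": 3, "payee": "Smith & Co", "amount": "12x.00"},
]


def is_valid(record):
    """Return True when the amount is numeric and the payee has no '#'."""
    try:
        float(record["amount"])
    except ValueError:
        return False
    return "#" not in record["payee"]


# Keep the good records for processing and set the bad ones aside.
accepted = [r for r in records if is_valid(r)]
rejected = [r for r in records if not is_valid(r)]

print("Processed:", [r["id"] for r in accepted])
print("Rejected:", [r["id"] for r in rejected])
```

Splitting the file into two lists means the good records can carry on while the rejected ones are sent back or investigated.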

How to go about it and how regular expressions can help

In this video, we go through an example of how to cleanse a data set:

(A) We use a list to hold the problem characters we need to find.

(B) We again use functions to work through the data, find the problems and extract them.

(C) We use regular expressions to find the special characters in the data set.
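
The three steps above could be sketched in Python along the following lines. This is not the exact code from the video; the list of unwanted characters, the sample data and the function names are assumptions chosen for illustration.

```python
import re

# (A) Hypothetical list of characters we treat as unwanted.
unwanted = ["#", "$", "%", "*", "!"]

# Hypothetical sample data.
data = ["Dublin", "Co#rk", "Gal$way!", "Limerick"]


def find_unwanted(value, chars):
    """(B) Return the unwanted characters that occur in value."""
    return [c for c in chars if c in value]


# (C) Build a regular expression character class from the list.
# re.escape makes sure characters such as '$' and '*' are matched literally.
pattern = re.compile("[" + re.escape("".join(unwanted)) + "]")


def cleanse(value):
    """Remove every unwanted character from value."""
    return pattern.sub("", value)


# Report which values have problems, then produce a cleansed copy.
problems = {v: find_unwanted(v, unwanted) for v in data if find_unwanted(v, unwanted)}
cleaned = [cleanse(v) for v in data]

print(problems)
print(cleaned)
```

Keeping the unwanted characters in one list means the check and the clean-up both change together when a new problem character turns up.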

Regular expressions are used extensively across several programming languages; they are a good way to test data and find erroneous values. If you are thinking about machine learning, it is quite important to get a more thorough knowledge of how they work. If you need more information, here is a good link for further reading: Regular Expression how to
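
As a small example of testing data with a regular expression, the sketch below flags any character that is not a letter, digit or space. That definition of "special character" is an assumption for this example, as are the sample values; your own data may need a different rule.

```python
import re

# Assumption: anything other than letters, digits and spaces is "special".
special = re.compile(r"[^A-Za-z0-9 ]")

# Hypothetical sample values.
values = ["Order 1001", "Order 10#02", "Ord@er 1003"]

# Map each problem value to the special characters found in it.
flagged = {v: special.findall(v) for v in values if special.search(v)}

print(flagged)
```

This approach describes what is allowed rather than listing what is not, which helps when you cannot predict every bad character in advance.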

Thanks for watching and if you like, please share and subscribe through the buttons on this page!

Data Analytics Ireland

Hide a column from a data frame

They say there is nowhere to hide. We disagree!
As a follow-up to How to add a column to a dataframe, would you like to learn how to hide one? This video has several steps; following each one will give you a good introduction.

To start, why would you want to hide a column?

  • You may not want to reveal its output, as it contains sensitive information.
  • The data in the column is not in the correct format, and you want to repurpose it so it is the way you want it.
  • The column is a calculated column that serves as an intermediate step before your data frame is output.

Finding the best way to hide unwanted data:

In this video, we introduce several ways to avoid showing a column:

  • Specify the actual columns you want to include in the data frame. By doing this, you exclude the column or columns you don’t want to see.
  • Use drop to explicitly tell the data frame not to show a particular column.
  • Work with a calculated column whose output you do not want to show, for one of the reasons outlined above.
  • Finally, the data frame's index can appear in the output, so we show how to use set_index to hide it from what is displayed.
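
The four approaches above might look something like this in pandas. This is a sketch rather than the code from the video; the data frame, its column names and the calculations are all made up for illustration.

```python
import pandas as pd

# Hypothetical data frame; column names and values are assumptions.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara"],
    "salary": [50000, 42000, 61000],
    "bonus_rate": [0.10, 0.05, 0.08],
})

# 1. Select only the columns you want; anything not listed is left out.
selected = df[["name", "bonus_rate"]]

# 2. Use drop to name the column you do not want to show.
dropped = df.drop(columns=["salary"])

# 3. Use a calculated column as an intermediate step, then drop it
#    before the data frame is output.
working = df.assign(bonus=df["salary"] * df["bonus_rate"])
working["total_pay"] = working["salary"] + working["bonus"]
final = working.drop(columns=["bonus"])

# 4. Use set_index so an existing column becomes the index, which stops
#    the default 0, 1, 2 index from being displayed.
indexed = dropped.set_index("name")

print(indexed)
```

Note that drop and set_index return new data frames by default, so the original df is left unchanged and still holds every column.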

This latest video in the Python Dataframe series builds on the knowledge from the previous examples. We hope that as you learn Python online, it will increase your programming skills.

Thanks for watching and don’t forget to like and share through our social media buttons to the right of this page.

Data Analytics Ireland