Removing the unwanted data that is holding you up.
You have been given information that contains erroneous data. What do you do?
Data issues are a common scenario for data analytics professionals and for the industry as a whole. Data quality has become more critical, especially as more processes move online and the digital landscape grows.
Most data is transferred between systems at some point, either to be used directly or to feed reports that rely on its accuracy. If the data in the source system has quality issues and they are not addressed before the data moves on, the problems spread further through the organisation, expanding like a spider's web.
The next step: planning how to fix the problem
To combat this problem, professionals need a plan for how to tackle it. The options are:
- Fix at source
- Take the data in and investigate the problems before moving it on
- Reject the file, or part of it
All three options carry costs and implications, and the most appropriate one depends on your industry. As an example, in the banking industry a payment file that contains bad data can be rejected entirely or in part.
The bank may decide to discard only the records with the wrong data and process everything else.
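The partial rejection described above can be sketched in Python. This is only an illustration, not how any real bank validates payments: the record layout, the `split_records` function and the rule that an account number is exactly eight digits are all assumptions made for the example.

```python
import re

# Hypothetical rule for this sketch: an account number is exactly 8 digits.
ACCOUNT_PATTERN = re.compile(r"[0-9]{8}")


def split_records(records):
    """Split records into (accepted, rejected) instead of rejecting the whole file."""
    accepted, rejected = [], []
    for record in records:
        account = str(record.get("account", ""))
        amount = record.get("amount")
        # A record is valid only if the account matches the pattern
        # and the amount is a positive number.
        valid = (
            ACCOUNT_PATTERN.fullmatch(account) is not None
            and isinstance(amount, (int, float))
            and amount > 0
        )
        if valid:
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected


# Made-up sample data, for illustration only.
payments = [
    {"account": "12345678", "amount": 100.0},
    {"account": "12A45678", "amount": 50.0},   # letter in the account number
    {"account": "87654321", "amount": -5},     # negative amount
]

accepted, rejected = split_records(payments)
print(len(accepted), "accepted,", len(rejected), "rejected")
```

Keeping the rejected records in their own list, rather than dropping them silently, means they can be reported back to whoever supplied the file.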
How to go about it and how regular expressions can help
In this video, we go through an example of how to cleanse a data set:
(A) We use a list to define the problems we need to find.
(B) We use functions to process the data, find the problems and extract them.
(C) We use regular expressions to find the special characters in the data set.
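As a rough sketch of steps (A) to (C), and not the exact code from the video, the approach might look like this in Python. The list of problem characters, the sample names and the function names `find_problems` and `cleanse` are assumptions chosen for the example.

```python
import re

# (A) Hypothetical list of characters we treat as problems.
PROBLEM_CHARACTERS = ["#", "$", "%", "@", "!"]

# (C) Build a regular expression character class from the list.
# re.escape makes sure characters such as "$" are matched literally.
PROBLEM_PATTERN = re.compile(
    "[" + "".join(re.escape(c) for c in PROBLEM_CHARACTERS) + "]"
)


def find_problems(values):
    """(B) Return (index, value, characters found) for each value with a problem."""
    problems = []
    for index, value in enumerate(values):
        found = PROBLEM_PATTERN.findall(value)
        if found:
            problems.append((index, value, found))
    return problems


def cleanse(values):
    """Return a copy of the values with the problem characters removed."""
    return [PROBLEM_PATTERN.sub("", value) for value in values]


# Made-up sample data, for illustration only.
names = ["Anne", "J#hn", "Mary$%", "Peter"]

problems = find_problems(names)
cleaned = cleanse(names)

print(problems)
print(cleaned)
```

Driving the pattern from a list means a new problem character can be added in one place without touching the functions.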
Regular expressions are used extensively across many programming languages; they are a good way to test data and find erroneous values. If you are thinking about machine learning, it is worth gaining a more thorough knowledge of how they work. Here is a good link for further reading if you need more information: Regular Expression how to
Thanks for watching, and if you like the video, please share and subscribe through the buttons on this page!
Data Analytics Ireland