Data Analytics Ireland

How to remove unwanted characters from your data

Posted on June 2, 2020 (updated December 29, 2020) by admin

Using R programming functions to cleanse data
In our recent post, Python Tutorial: How do I remove unwanted characters, we walked through the concepts behind data cleansing.

That post demonstrated several approaches to data cleansing, including the use of regular expressions.

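Here we apply the same approach in R, using RStudio and the following tools:

  • stringr, a string-manipulation package
  • stringi, the package that stringr is built on
  • grepl(), the base R function that tests whether a pattern matches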
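As a minimal sketch of the detect-then-remove idea, the base R code below uses grepl() to flag values that contain an unwanted character and gsub() to strip those characters out. The sample names and the character class `[#@!]` are assumptions made up for illustration. In stringr, str_detect() and str_remove_all() do roughly the same jobs, as do stri_detect_regex() and stri_replace_all_regex() in stringi; base R is used here so the example needs no extra packages.

```r
# Hypothetical sample data: names with stray characters mixed in
raw_names <- c("Joe#", "Mary", "T@m", "Ann!")

# Assumed list of unwanted characters, written as a regex character class
unwanted <- "[#@!]"

# grepl() returns TRUE for each element that contains at least one match
has_unwanted <- grepl(unwanted, raw_names)
print(has_unwanted)   # c(TRUE, FALSE, TRUE, TRUE)

# gsub() replaces every match with "", which removes the characters
clean_names <- gsub(unwanted, "", raw_names)
print(clean_names)    # c("Joe", "Mary", "Tm", "Ann")
```

Keeping the unwanted characters in one pattern means the "defined list" lives in a single place and can be extended without touching the rest of the code.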

How to approach data cleansing

When removing unwanted characters, make sure you have a defined list of the characters that should not appear and that would cause errors. It is also essential to understand the types of data error that can occur:

  • Data entry errors
  • Data columns with an incorrect format, e.g. telephone numbers that contain non-numeric characters
  • Missing data that is required, e.g. null values
  • Data that does not make sense, e.g. a date of birth outside the range you would typically expect to see
  • Duplicate values for the same piece of data. These can inflate the number of data errors, so the count no longer reflects the actual errors.
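Each of the error types above can be flagged with a line or two of base R. In this sketch the data frame `customers`, its columns and values, and the accepted date-of-birth range (1900 up to today) are all hypothetical; the last two lines show how a duplicated row inflates the count of missing phone numbers.

```r
# Hypothetical customer records; every name and value here is made up
customers <- data.frame(
  id    = c(1, 2, 3, 3),
  phone = c("0871234567", "087-123 4567", NA, NA),
  dob   = as.Date(c("1985-04-12", "1890-01-01", "1990-07-30", "1990-07-30")),
  stringsAsFactors = FALSE
)

# Incorrect format: phone numbers containing any non-digit character
bad_phone <- !is.na(customers$phone) & grepl("[^0-9]", customers$phone)

# Missing required data
missing_phone <- is.na(customers$phone)

# Data that does not make sense: dates of birth outside an assumed range
dob_out_of_range <- customers$dob < as.Date("1900-01-01") |
  customers$dob > Sys.Date()

# Duplicates: duplicated() is TRUE for each row that repeats an earlier row
dup_row <- duplicated(customers)

print(data.frame(id = customers$id, bad_phone, missing_phone,
                 dob_out_of_range, dup_row))

# The duplicated row counts the same missing phone number twice
sum(missing_phone)              # 2
sum(missing_phone & !dup_row)   # 1
```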

In the video below, we use R code built on the functions above, together with an if statement that first checks whether an unwanted character is present in the data set, and only then removes it and returns the cleansed data.
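The check-before-removing pattern might look like the sketch below. `remove_if_present()` is a hypothetical helper written for this illustration, not the code used in the video, and `[!$]` is an assumed list of unwanted characters.

```r
# Hypothetical helper: strip unwanted characters only if at least one is present
remove_if_present <- function(x, pattern) {
  if (any(grepl(pattern, x))) {
    message("Unwanted characters found; removing them.")
    x <- gsub(pattern, "", x)
  } else {
    message("No unwanted characters found; data returned unchanged.")
  }
  x  # return the (possibly) cleansed data
}

cleansed <- remove_if_present(c("Dub!lin", "Cork", "Ga$lway"), "[!$]")
print(cleansed)   # c("Dublin", "Cork", "Galway")
```

Returning `x` unchanged when nothing matches keeps the helper safe to call on data that is already clean.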

Strategies that can help counteract data errors include:

  • Eliminate manual inputs
  • Controls at the point of data entry, e.g. only allowing valid date formats in a date field.
  • Reduce duplication of data across multiple systems; this reduces the number of places where data differences can occur.
  • If integrating different systems that hold the same data into one network, cleanse the data beforehand; this reduces the work needed afterwards to clean up the problems the integration brings.
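A point-of-entry control for dates can be as simple as rejecting any value that does not parse in the expected format. In this sketch `is_valid_date()` is a hypothetical helper and YYYY-MM-DD is an assumed house format; as.Date() returns NA when a string does not match the format it is given, and the helper turns that NA into FALSE.

```r
# Hypothetical entry check: accept a value only if it parses as a date
is_valid_date <- function(x, format = "%Y-%m-%d") {
  # as.Date() gives NA for strings that do not match `format`
  parsed <- as.Date(x, format = format)
  !is.na(parsed)
}

entries <- c("2020-06-02", "02/06/2020", "2020-13-45", "not a date")
print(is_valid_date(entries))   # c(TRUE, FALSE, FALSE, FALSE)
```

One caveat: R's date parsing ignores trailing characters, so a value such as "2020-06-02xyz" would still pass. A stricter control could first check the whole value with grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x).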

 

R Programming | Tags: Cleansed data, Data, Data Analysis, Data Analytics, Data Cleansing, Data Science, Data Validation, R programming language, Read data, rstudio

Copyright © 2023 Data Analytics Ireland.
