What does a data analyst do?

Estimated reading time: 4 minutes

Livestream #2 – What does a data analyst do?

You are probably sitting there hearing about big data and databases, data analytics and machine learning and wonder where a data analyst fits in?

Here we will look to break it down step by step.

Sometimes a data analyst can be confused with a business analyst; there are subtle differences:

  • Business Analyst: Their role is to document the user’s requirements in a document that is descriptive of what the user wants.
    • In this case, a document that all parties can agree to is created, and it can be used as part of the project sign off.
  • Data Analyst: On the other hand, a data analyst will take the business requirements and translate them into data deliverables.
    • They use the document to ensure the project has the right data to meet the project objectives in the right place at the right time.

Data Mapping

In different data projects there will be a need to reconcile the data between systems, a data analysis will help here.

In a data mapping exercise, the data analyst will be expected to look at one or more sources and map them to a destination system.

  • This ensures a match between the two datasets.
  • Which results in the ability to reconcile the two systems.
  • Allows the ability to use data in multiple systems, knowing the consistency is in place.
  • Consistency of the data types between the systems.
  • It ensures that data validation errors are kept to a minimum.

Often a Data Analyst will build a traceability matrix, which tracks the data item from creation through to consumption.

Data Quality

In most companies, there will be teams (depending on their size) dedicated to this, and their input will be pivotal to existing and future data use.

It is an important task that could impact internal and external reporting and a company’s ability to make decisions accurately.

Some of the areas that might be looked at include:

(A) Investigate duplicate data – There could be a number of reasons this has to be checked:

  • Data manually entered multiple times.
  • An automated process ran multiple times.
  • A change to an IT system has unknowingly duplicated data.

(B) Finding errors – This could be completed in conjunction with data reporting outlined below.

  • Normally companies will clearly have rules that pick up the data errors that are not expected.
  • A data analyst will analyse why these errors are occurring.

(C) Checking for missing data.

  • Data feeds have failed. A request to reload the data will be required.
  • Data that was not requested as part of the business requirements confirm that this is the case.

(D) Enhancing the data with additional information – Is there additional information that can be added that can enrich the dataset?

(E) Checking data is in the correct format – There are scenarios where this can go wrong, and example is a date field is populated with text.

Data Reporting

In some of the areas above, we touched on the importance of the quality of data.

Ultimately there may be a need to track:

  • Data Quality – Build reports to capture the quality of data based on predefined business measurements.
  • Real-time Reporting – No new customers or customers who have left an organisation.
  • Track Targets – Is the target set by the business been met daily, weekly, monthly?
  • Management Reporting – Build reports that provide input to management packs that provide an overview of how the business performs.

Data Testing

Organisations go through change projects where new data is being introduced or enhanced.

As a result the data analyst will have a number of tasks to complete:

  • Write Test Scripts – Write all scripts for record counts, transformations and table to table comparisons.
  • Datatype Validation – Ensures all new data will be the same as the other data where it is stored.
  • No loss of data – Check all data is imported correctly with no data truncated.
  • Record count – Write an SQL script that would complete a source to the destination reconciliation.
  • Data Transformation – Ensure any transformations are applied correctly.

Supporting data projects

Ad hoc projects are common , and sometimes become a priority for businses as they deal with requirements that result as part of an immediate business need.

Data Analysts will be called upon to support projects where there is a need to ensure the data required is of a standard that meets the project deliverables:

Some common areas where this might occur includes:

  • Extract data where it has been found to have been corrupted.
  • Investigate data changes, to analyse where a data breach may have occurred.
  • An external regulatory body has requested information to back up some reports submitted.
  • A customer has requested all the company’s information on them; usually the case for a GDPR request.

how to remove unwanted characters from your data

Using r programming functions to cleanse data
In our recent post  Python Tutorial: How do I remove unwanted characters  we walked through the concepts behind data cleansing.

We demonstrated several different approaches to data cleansing, and the use of regular expressions was also shown.

Here we look at that approach using rstudio and the following functions:

How to approach data cleansing

In removing unwanted characters, you want to ensure that you have a defined list of what should not appear and will cause you errors. It is also essential to understand the type of data errors that can occur as follows:

  • Data entry errors
  • Data columns have the incorrect format, e.g. Telphone numbers which have non-numerical characters in them
  • Missing data that is required – e.g. null values
  • Data that does not make sense, e.g. date of birth that is beyond the range of what you would typically expect to see
  • Duplicate values for the same piece of data. The problem here is that this can inflate the no of data errors, and not give a true count of the actual errors.

In the below video we utilise r programming code using the above functions, but also use an if statement to check if an unwanted character is in the data set first before proceeding to remove it and return the cleansed data.

Some of the strategies to help counteract data errors could include:

  • Eliminate manual inputs
  • Controls at point of entry for data, e.g. for dates only allow date formats in the field.
  • Reduce duplication of the data across multiple systems, reduces the no of places that data differences can occur.
  • If integrating different systems with the same data into one network, perform a data cleanse beforehand, reduces the work needed afterwards to clean up the problems that brings.

 

how to validate cell values in excel

Estimated reading time: 2 minutes

Validating cells in Excel quickly – how to do it easily!
Are you working with large spreadsheets and looking to quickly at data validation exercise to save you time?

The aim would be to run your code and test it against some predefined rules you or your data analyst would have written to make sure it brings back the expected checks.

If you look at the below, this is the final output of this video, highlighting two cells that are over budget based on the companies predefined budget.

data validation example

The structure of this code can be broken down into the following steps:

  • Read in the excel file, see a previous example here How to import data into excel
  •  Run the first function,  checks if the spreadsheet cell value is over or under budget.
  •  Run the second function that takes the value from the first function and applies the colour red to the cell if it is over budget.

 

Finally

You can expand this code to incorporate more functionality, such as:

  • Change the colour of the cells, to have multiple colours returned.
  •  Update the two functions to include more business rules.
  •  You could check if the file is empty before processing as shown here How to check if a file is empty

Please subscribe to our YouTube channel, the button is the right-hand side of the page if you would like to see more like these.

Data Analytics Ireland


Notice: ob_end_flush(): failed to send buffer of zlib output compression (0) in /home/dataana1/public_html/wp-includes/functions.php on line 4979