Are you on a limited budget but looking for free ways to extract data from files without using expensive online tools or companies that you will have to pay? Join us here for an overview of some tools and techniques that you most likely have access to already.
You are probably sitting there hearing about big data and databases, data analytics and machine learning and wonder where a data analyst fits in?
Here we will look to break it down step by step.
Sometimes a data analyst can be confused with a business analyst; there are subtle differences:
Business Analyst: Their role is to document the user’s requirements in a document that is descriptive of what the user wants.
In this case, a document that all parties can agree to is created, and it can be used as part of the project sign off.
Data Analyst: On the other hand, a data analyst will take the business requirements and translate them into data deliverables.
They use the document to ensure the project has the right data to meet the project objectives in the right place at the right time.
Data Mapping
In different data projects there will be a need to reconcile the data between systems, a data analysis will help here.
In a data mapping exercise, the data analyst will be expected to look at one or more sources and map them to a destination system.
This ensures a match between the two datasets.
Which results in the ability to reconcile the two systems.
Allows the ability to use data in multiple systems, knowing the consistency is in place.
Consistency of the data types between the systems.
It ensures that data validation errors are kept to a minimum.
Often a Data Analyst will build a traceability matrix, which tracks the data item from creation through to consumption.
Data Quality
In most companies, there will be teams (depending on their size) dedicated to this, and their input will be pivotal to existing and future data use.
It is an important task that could impact internal and external reporting and a company’s ability to make decisions accurately.
Some of the areas that might be looked at include:
(A) Investigate duplicate data – There could be a number of reasons this has to be checked:
Data manually entered multiple times.
An automated process ran multiple times.
A change to an IT system has unknowingly duplicated data.
(B) Finding errors – This could be completed in conjunction with data reporting outlined below.
Normally companies will clearly have rules that pick up the data errors that are not expected.
A data analyst will analyse why these errors are occurring.
(C) Checking for missing data.
Data feeds have failed. A request to reload the data will be required.
Data that was not requested as part of the business requirements confirm that this is the case.
(D) Enhancing the data with additional information – Is there additional information that can be added that can enrich the dataset?
(E) Checking data is in the correct format – There are scenarios where this can go wrong, and example is a date field is populated with text.
Data Reporting
In some of the areas above, we touched on the importance of the quality of data.
Ultimately there may be a need to track:
Data Quality – Build reports to capture the quality of data based on predefined business measurements.
Real-time Reporting – No new customers or customers who have left an organisation.
Track Targets – Is the target set by the business been met daily, weekly, monthly?
Management Reporting – Build reports that provide input to management packs that provide an overview of how the business performs.
Data Testing
Organisations go through change projects where new data is being introduced or enhanced.
As a result the data analyst will have a number of tasks to complete:
Write Test Scripts – Write all scripts for record counts, transformations and table to table comparisons.
Datatype Validation – Ensures all new data will be the same as the other data where it is stored.
No loss of data – Check all data is imported correctly with no data truncated.
Record count – Write an SQL script that would complete a source to the destination reconciliation.
Data Transformation – Ensure any transformations are applied correctly.
Supporting data projects
Ad hoc projects are common , and sometimes become a priority for businses as they deal with requirements that result as part of an immediate business need.
Data Analysts will be called upon to support projects where there is a need to ensure the data required is of a standard that meets the project deliverables:
Some common areas where this might occur includes:
Extract data where it has been found to have been corrupted.
Investigate data changes, to analyse where a data breach may have occurred.
An external regulatory body has requested information to back up some reports submitted.
A customer has requested all the company’s information on them; usually the case for a GDPR request.
When working with databases in your data analytics projects, most likely you will deal with more than one table. As a result in certain scenarios there will be a link between the tables
This will ensure that you can use data from the two or more tables that can be joined together. But how would you do that?
In order to join two tables, you will need columns in both tables that have values that are unique and of similar values.
In other words you need to create what is called aprimary key, its characteristics are as follows:
Its values are unique.
They cannot be null.
So where does a foreign key come in? Say we have two tables Table A and Table B.
Table “Customer Table” will have a column that has a primary key call Customer_No
Table “Sales Table” also has a column with a primary key called Sales_No.
Now we cant join the two tables, because the primary keys are different, and will have different values
But in the Sales Table, we will have an additional column called Customer_No, and it will hold some of the unique values that are held in the Customer Table, Customer_No column.
This Customer_No in the Sales Table is called the foreign key. It is unique, has null values and can be connected to a primary key column Customer_No on the Customer Table.
In this way tables can be connected where values in columns are not always the primary key of a table.
So let’s look at tables in SQLite and see how this works in practice.
Below you have a Customer Table and the Sales Table
In both tables, we have a primary key, though they are not the same primary key. As a result, you would not be able to join the tables as the primary keys contained in them do not have the same values, and this is how you would normally join.
In saying that, the two tables can be joined as the foreign key of Sales is related to the primary key of Customer. If invoice_no was not on Sales , then you could make the customer_no in sales the primary key.
So lets look at the tables below with data in them.
Normally when you try to join two tables, it is on the primary key. On the below invoice_no is the primary key of the table Sales, and customer_no is the primary key of the customer table.
While their values in each column are unique, they are not the same , so a join would fail as per the below:
But if you change on the join b.invoice_no TO b.customer_no, it will now bring back all the values asked for. The reason?
The primary key on one is linked to the foreign key on the other, with similar unique values.
As a result, you can still have a primary key column, not in another table and join to that table as long as there is a foreign key column with the same values.
This helps to maintain the table structure, without having to remove primary keys.
If you are working on data analytics , and have access to databases that you can load data into and manipulate, then read on.
There maybe a reason for you have your own partitioned table with data in it, but you need to know how to create a database.
The creation of a table can be achieved as follows;
Decide on the table name.
Within the table you will have columns, they need to be named.
Next for each column name a datatype needs to be decided upon.
It is important to note that the data type you decide is very important before you create the table.
This is because once you start loading data into the table if you decide to change the data type for a column, it could change the data contained within that column.
As a result a review of the data to be loaded should be completed, looking at the type of data, its length and any attributes of the data that need careful consideration.
Lets create our table and talk through the different aspects of the SQL used.
So in the below we have the create table statement
(C) Technology improvements have meant that storage and big data processing make the process of delivering services easier.
(D) Career opportunities for professionals with good skills have increased.
(E) A wide range of open source and paid tools are now easily available that help to process and report on the data.
(F) Entry requirements are easy, and this coupled with an ability to quickly acquire knowledge and skills helps entry whether you want to be full time or part-time.
(G)Knowledge and skills have improved as access to online learning has improved significantly.
As a result of all this:
(A) Large data volumes need to be analysed.
(B) Consumers’ habits about how they use a service or the information they look for now has a digital footprint.
(C) Now once a consumer has used a service ( whether purchased or not), the ability to understand their habits can be captured to deliver as follows:
The services they want.
The products they want.
Quicker turnaround time.
How can this help with all the data that is captured and stored?
So in this article, we have already outlined what the background is as to how the industry has evolved to where data analytics is now.
As outlined, all information traditionally would not have been stored in a format that was easily accessible.
Step 1 – Data Capture
To understand what you want to analyse, and help draw conclusions accurately, a data analyst will work with their technical colleagues to ensure that the correct data is captured.
Data capture of raw data, can happen in a number of ways:
User input.
Interaction with a website or application.
Consuming a service.
Requesting a service be completed.
Social media interaction.
In a lot of cases now this happening in real time.
Once the completeness and accuracy are fulfilled, your data quality will become less of an issue.
Step 2 – Analysing
(A) You create visual charts of it; this allows the viewer of the information to get an initial view of the information without looking at the underlying data. Sometimes this will show patterns in data or clusters or the types of data you capture.
(B) Using data science statistics to see if they can explain the data. This could show information such as how data is correlated or otherwise. Also, probabilities could be calculated to show what outcomes might happen in the future.
(C) Data analysts might also need to understand how to build a machine learning model to use complex algorithms, to explain the data better, sometimes patterns that are not immediately understood can be unearthed and investigated further.
An emphatic NO is an answer, but this is where some of the visualisation tools come in!
Visually presenting data points, very quickly allows a viewer of the information to come to a decision quickly, and the tools that are outlined below will help with that process.
These are a handful that will allow the data to be sliced and diced, there are many more out there, but they all allow data to be drilled down into and get to a real understanding what is going on.
Some of the tools include Tableau, Power BI, and Python( it has libraries that do a nice job)
Step 4 – Decision Making
So after all this analysing, there needs to be decisions made:
(A) Do you have the data in order and in the correct format?
(B) In a place that it can be accessed and reviewed.
(C) Relevant to when the decision needs to be made.
From the outset, as part of the work of performing the data analytics, an assessment needs to be made as to how often a decision will need to be made, with what data, and when.
At this point, the decision-makers should have a set of data ready for them to look over and reliably make a decision
Based on the data returned, if they can’t make a decision then possibly, steps 1-3 should be reviewed and revisited.
Often what happens, is what information that was required to make a change needs to be updated, or improved upon.
It is the job of the teams that manage the data sets to source that data and or change how they present it, to now reflect the decision that needs to be made.
Here in how to create an instance of a class, as described herein, how to create a class in Python, we will further explore the instance of class and how this can be used within a program to assign values to an object. This allows that object to inherit those values contained within the class, making it easier to have consistency regards functionality and data.
This video covers off
(a) creating an instance of a class
(B) Using the __init__ within the class
(C) define the constructor method __init__
(D) Creating an object that calls a class and uses the class to process some piece of data.
What are the benefits of this?
You only need to create one class that holds all the attributes required.
That class can be called from anywhere within a program, once an instance of it is created.
You can update the class, and once completed, those new values will become available to an instance of that class.
Makes for better management of objects and their properties, not multiple different versions contained within a program
This website uses cookies to improve your experience. We'll assume you're ok with this, but you can opt-out if you wish. Cookie settingsACCEPT
Privacy & Cookies Policy
Privacy Overview
This website uses cookies to improve your experience while you navigate through the website. Out of these cookies, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may have an effect on your browsing experience.
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.