What does a data analyst do?

Estimated reading time: 4 minutes

Livestream #2 – What does a data analyst do?

You have probably heard about big data and databases, data analytics and machine learning, and wondered where a data analyst fits in.

Here we will look to break it down step by step.

Sometimes a data analyst can be confused with a business analyst; there are subtle differences:

  • Business Analyst: Their role is to capture the user’s requirements in a document that describes what the user wants.
    • In this case, a document that all parties can agree to is created, and it can be used as part of the project sign off.
  • Data Analyst: On the other hand, a data analyst will take the business requirements and translate them into data deliverables.
    • They use the document to ensure the project has the right data to meet the project objectives in the right place at the right time.

Data Mapping

In many data projects there will be a need to reconcile the data between systems; a data analyst will help here.

In a data mapping exercise, the data analyst will be expected to look at one or more sources and map them to a destination system.

  • It ensures a match between the two datasets.
  • It makes it possible to reconcile the two systems.
  • It allows data to be used in multiple systems, knowing that consistency is in place.
  • It keeps the data types consistent between the systems.
  • It keeps data validation errors to a minimum.
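As a minimal sketch of such a mapping exercise, the snippet below renames and casts source fields to a destination layout. The field names and the `FIELD_MAP` structure are hypothetical and only illustrate the idea; a real project would take them from the agreed mapping document.

```python
# Hypothetical mapping: source column -> (destination column, type caster).
FIELD_MAP = {
    "cust_id": ("customer_id", int),
    "cust_name": ("customer_name", str),
    "joined": ("join_date", str),
}


def map_record(source_row):
    """Rename and cast each mapped source field; collect fields that fail."""
    mapped, errors = {}, []
    for source_field, (dest_field, caster) in FIELD_MAP.items():
        try:
            mapped[dest_field] = caster(source_row[source_field])
        except (KeyError, ValueError, TypeError):
            # The field is missing or cannot be cast to the destination type.
            errors.append(source_field)
    return mapped, errors


row = {"cust_id": "42", "cust_name": "Ada", "joined": "2021-03-01"}
mapped_row, mapping_errors = map_record(row)
print(mapped_row)
print(mapping_errors)
```

Collecting the failing fields, rather than stopping at the first one, gives the analyst a full list of validation problems per record.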

Often a Data Analyst will build a traceability matrix, which tracks the data item from creation through to consumption.

Data Quality

In most companies, there will be teams (depending on their size) dedicated to this, and their input will be pivotal to existing and future data use.

It is an important task that could impact internal and external reporting and a company’s ability to make decisions accurately.

Some of the areas that might be looked at include:

(A) Investigate duplicate data – There could be a number of reasons this has to be checked:

  • Data manually entered multiple times.
  • An automated process ran multiple times.
  • A change to an IT system has unknowingly duplicated data.
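A duplicate check along these lines can be sketched in a few lines of Python. The sample records and the choice of key fields are assumptions for illustration; which fields define a duplicate is a business decision.

```python
from collections import Counter

# Hypothetical customer records; the third row repeats the first.
records = [
    {"customer_id": 1, "email": "ada@example.com"},
    {"customer_id": 2, "email": "bob@example.com"},
    {"customer_id": 1, "email": "ada@example.com"},
]


def find_duplicates(rows, key_fields):
    """Return each key that appears more than once, with its count."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


duplicates = find_duplicates(records, ["customer_id", "email"])
print(duplicates)
```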

(B) Finding errors – This could be completed in conjunction with data reporting outlined below.

  • Companies will normally have clear rules that pick up unexpected data errors.
  • A data analyst will analyse why these errors are occurring.

(C) Checking for missing data.

  • Data feeds have failed. A request to reload the data will be required.
  • Data was not requested as part of the business requirements. Confirm that this is the case.
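One way to spot missing data is to check each row against a list of fields that must be populated. This is only a sketch: `REQUIRED_FIELDS` and the sample rows are hypothetical, and treating `None` and empty strings as missing is an assumption.

```python
# Hypothetical list of fields the business requirements say must be populated.
REQUIRED_FIELDS = ["customer_id", "email", "join_date"]


def missing_fields(row):
    """Return the required fields that are absent, None or blank in a row."""
    return [f for f in REQUIRED_FIELDS if row.get(f) in (None, "")]


rows = [
    {"customer_id": 1, "email": "ada@example.com", "join_date": "2021-03-01"},
    {"customer_id": 2, "email": "", "join_date": None},
]
# Map each customer to the list of required fields it is missing.
report = {row["customer_id"]: missing_fields(row) for row in rows}
print(report)
```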

(D) Enhancing the data with additional information – Is there additional information that can be added that can enrich the dataset?

(E) Checking data is in the correct format – There are scenarios where this can go wrong; an example is a date field populated with text.
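A format check for the date example could look like the sketch below. The `is_valid_date` helper and the `%Y-%m-%d` format are assumptions; the expected format would come from the system's data dictionary.

```python
from datetime import datetime


def is_valid_date(value, fmt="%Y-%m-%d"):
    """Return True if value is a string that parses as a date in fmt."""
    try:
        datetime.strptime(value, fmt)
        return True
    except (ValueError, TypeError):
        # ValueError: text or an impossible date; TypeError: not a string.
        return False


values = ["2021-03-01", "not a date", "2021-13-45"]
results = [is_valid_date(v) for v in values]
print(results)
```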

Data Reporting

In some of the areas above, we touched on the importance of the quality of data.

Ultimately there may be a need to track:

  • Data Quality – Build reports to capture the quality of data based on predefined business measurements.
  • Real-time Reporting – Number of new customers, or customers who have left an organisation.
  • Track Targets – Is the target set by the business being met daily, weekly, monthly?
  • Management Reporting – Build reports that provide input to management packs that provide an overview of how the business performs.

Data Testing

Organisations go through change projects where new data is being introduced or enhanced.

As a result the data analyst will have a number of tasks to complete:

  • Write Test Scripts – Write all scripts for record counts, transformations and table to table comparisons.
  • Datatype Validation – Ensure all new data has the same data type as the existing data where it is stored.
  • No loss of data – Check all data is imported correctly with no data truncated.
  • Record count – Write an SQL script that completes a source-to-destination reconciliation.
  • Data Transformation – Ensure any transformations are applied correctly.
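A record count reconciliation can be sketched with Python's built-in `sqlite3` module and an in-memory database. The table names `source_customers` and `dest_customers` and their contents are made up for illustration; in practice the same SQL would run against the real source and destination tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_customers (id INTEGER, name TEXT);
    CREATE TABLE dest_customers (id INTEGER, name TEXT);
    INSERT INTO source_customers VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO dest_customers VALUES (1, 'Ada'), (2, 'Bob');
""")

# Record counts on each side of the load.
source_count = conn.execute("SELECT COUNT(*) FROM source_customers").fetchone()[0]
dest_count = conn.execute("SELECT COUNT(*) FROM dest_customers").fetchone()[0]

# Rows present in the source but missing from the destination.
missing_ids = [
    row[0]
    for row in conn.execute(
        "SELECT s.id FROM source_customers s "
        "LEFT JOIN dest_customers d ON s.id = d.id "
        "WHERE d.id IS NULL ORDER BY s.id"
    )
]
conn.close()
print(source_count, dest_count, missing_ids)
```

Comparing counts shows that something was lost; the `LEFT JOIN` then identifies which records did not arrive.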

Supporting data projects

Ad hoc projects are common, and sometimes become a priority for businesses as they deal with requirements that arise from an immediate business need.

Data Analysts will be called upon to support projects where there is a need to ensure the data required is of a standard that meets the project deliverables.

Some common areas where this might occur include:

  • Extract data where it has been found to have been corrupted.
  • Investigate data changes, to analyse where a data breach may have occurred.
  • An external regulatory body has requested information to back up some reports submitted.
  • A customer has requested all the company’s information on them; usually the case for a GDPR request.

Tableau Desktop versus Tableau Server

Estimated reading time: 5 minutes

Are you working on a data analytics project, and seeking a way to present your data visually?

Data Visualisation achieves this, and there are many products in the marketplace that you could use.

In this blog post, we are going to discuss Tableau, one of the leading data visualisation tools in the marketplace.

So before we dive into this tool and start looking at it, we need to ask the question, why pursue data visualisation?

In the pursuit of a better understanding of the data a company holds, a data analyst will need access to multiple tables and records.

First of all, they will need to get the right data, usually through SQL SELECT statements.

The challenge then is how do they take all this data and present it in a meaningful way? Data visualisation looks to fix this problem by:

  • Aggregating data into meaningful groups.
  • Removing the need to trawl through rows and rows of data.
  • Allowing users to drill down deeper, to see what makes up a set of data.
  • Creating visually appealing pages that quickly give the viewer an understanding of what is going on with the data, and help them spot visual patterns.
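To illustrate the first point, aggregating rows into meaningful groups before charting them might look like the sketch below. The `sales` rows are hypothetical and stand in for the result of a SQL query; a visualisation tool performs this kind of grouping for you.

```python
from collections import defaultdict

# Hypothetical sales rows, as might be returned by a SQL SELECT statement.
sales = [
    {"region": "North", "amount": 120.0},
    {"region": "South", "amount": 80.0},
    {"region": "North", "amount": 30.0},
]

# Sum the amounts per region so each group becomes one bar or data point.
totals = defaultdict(float)
for row in sales:
    totals[row["region"]] += row["amount"]

totals = dict(totals)
print(totals)
```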

So what is Tableau Desktop?

Tableau Desktop is used by end users with the following functionality in mind:

  • Interactive dashboards
  • The ability to connect to data on-premises or in the cloud.
  • Exceptional analytics demand more than a pretty dashboard.
  • Quickly build powerful functionality:
    • That allows calculations from existing data.
    • Enables drag and drop of data.
    • Provides statistical output.
  • Make your point with trend analyses, regressions and correlations for tried-and-true statistical understanding.
  • It also allows you to:
    • Ask new questions, and look at the data in a way you had not thought of before.
    • Spot trends – See visually how data is moving in a direction. Can you benefit from this insight?
    • Identify opportunities and make data-driven decisions with confidence.

As a result you can share/visualise the underlying data securely using Tableau Server.

So what is Tableau Server?

Tableau Server on the other hand is used as follows:

  • It is an enterprise solution, so you can let the whole organisation leverage the power of its functionality.
  • In light of this, it empowers your business with the freedom to explore data in a trusted environment, and it doesn’t limit them to pre-defined questions, wizards or chart types.
  • For the purpose of understanding your data better, it has the functionality for you to ask questions, and these use sophisticated algorithms.
  • Also, it has artificial intelligence capabilities that allow the software to find insights you may not have been aware of.

You can also harness the following capabilities:

  • Connect to Cloudera Hadoop, Oracle, AWS Redshift, cubes, Teradata, Microsoft SQL Server, for your enterprise needs.
  • Similarly, it has Governance capabilities so that you can centrally manage all of your metadata and security rules.
  • The Tableau platform is easy to deploy, scale and monitor.

If security is something that is important:

  • Whether you use Active Directory, Kerberos, OAuth or another standard, Tableau seamlessly integrates with your existing security protocols.
  • Easily track and manage content, users, licences and performance.
  • Quickly manage permissions for data sources and content and monitor usage visually.
    • Tableau Data Management helps you better manage the data within your analytics environment, ensuring that trusted and up-to-date data is always used to drive decisions.

So you have seen both. What areas should you look at before deciding which to use?

| Criteria | Tableau Server | Tableau Desktop |
| --- | --- | --- |
| Licensing | Centrally managed; licences can be easily redistributed if a person leaves the organisation. | Managed on a case-by-case basis; if a person leaves, the licence needs to be transferred, which can cause additional administration. |
| Security | Permissions for access and the ability to update the dashboard are managed centrally. It can leverage the power of Active Directory, Kerberos and OAuth. | Permissions are managed locally, based on the corporate network setup, which is usually a username and password. |
| Connectivity | Works well with popular enterprise data sources like Cloudera Hadoop, Oracle, AWS Redshift, cubes, Teradata, Microsoft SQL Server. | You can connect to data on-premises or in the cloud, whether it’s big data, a SQL database, a spreadsheet or cloud apps like Google Analytics and Salesforce. |
| Artificial Intelligence | As this is usually installed on the corporate network, complex calculations have more power available on dedicated servers. | No capability on the desktop version. |
| Scalability | Highly scalable; it allows more users and data to be added, and becomes easier to manage. | As data is stored locally, you are subject to the capabilities and size of the computer you are on. The larger the volumes, the longer the processing takes, and you could run out of space locally. |
| Development | It would depend on the user’s rights; it might not be possible/ideal if the server is just used for hosting and managing the dashboards. | It can be achieved with the desktop, but sharing then has its limitations, per Sharing below. On the other hand, when development is complete it could be loaded to the server for sharing with other users. |
| Data Management | You can manage all your data centrally, so if you have multiple dashboards, they will work off the one set of data, ensuring consistency. | If you have a number of people developing similar dashboards, they need to ensure that where they pull their data from is consistent, so that the outputs do not give conflicting messages. |
| Enterprise Capability | Has the ability to manage a large corporate network of users. | None available; used more for local development. |
| Sharing | As long as the user can log in to the server, they can see the dashboards they have access to. | Can be shared locally; the recipient must have Tableau installed to see the visualisation. |
| Editing | The dashboard can only be edited on the server, with appropriate permissions. | Local versions could be edited by users, so you could end up with multiple versions of the same dashboard. |

So in summary:

In order to make a decision on which route to take, the following questions should be asked:

  • What is the size of your organisation?
  • Does your workforce need development capabilities?
  • How important is it to be able to manage your data? Does it need to be controlled centrally?
  • Can you benefit from scaling for:
    • Licensing costs.
    • Data management.
    • User management.
    • Dashboard distribution.
    • Scalability – Will your existing data be extensive and grow over time?

As a result, a lot of the decision will come down to cost, the size of your data, the distribution of dashboards and the number of users.

TypeError: object of type 'int' has no len()

I have seen this data type error come up numerous times while working on my data analytics projects, and recently decided to investigate further. On initial inspection, it can seem a bit of a funny one, but in actual fact, it is quite straightforward.

Let's break it down and see what is going on

In the code below, there are a number of things to note:

On line 1 we have a variable that is an integer. If we think about this logically, a single number cannot have a length.

An integer by design is purely a count, such as a number of apples or a number of people. It cannot be viewed as having a length, as it only describes the number of occurrences of an object. Calling len() on it is what raises the error.

data = 100
print(type(data))
print(len(data))  # raises TypeError: an int has no length

Output Error:
<class 'int'>
Traceback (most recent call last):
  File "object of type int.py", line 3, in <module>
TypeError: object of type 'int' has no len()

For a length to be calculated, the object needs to be a data type that supports len(), such as:

  • List
  • String
  • Tuple
  • Dictionary

Unlike an integer, these data types are made up of items that can be counted (characters, elements or key-value pairs), so a length can be calculated for them.

data = "100"
print(type(data))
print("Length of string is: ", len(data))

data = [100,200,300]
print(type(data))
print("Length of list is: ", len(data))

data = (100,200,300)
print(type(data))
print("Length of tuple is: ", len(data))

data = {"Age": 1, "Name": 2}
print(type(data))
print("Length of dictionary is: ", len(data))

And the output is:
<class 'str'>
Length of string is:  3
<class 'list'>
Length of list is:  3
<class 'tuple'>
Length of tuple is:  3
<class 'dict'>
Length of dictionary is:  2

In summary, to understand this error and fix it:

An integer describes how many of something there are. It is a single value, not a collection of items, so it has no length.

Anything that len() can be applied to is made up of items that can be counted. In each of the four examples above, you can count the characters, elements or key-value pairs one by one. To fix the error, make sure the value passed to len() is one of these types, for example by converting the integer to a string with str() if what you want is the number of digits.

Hopefully this explanation clears up the matter. If you have any questions, leave a comment and I will answer!
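One possible way to guard against the error is sketched below. The `safe_len` helper is hypothetical, and it assumes that for an integer the number of digits is what was wanted; if that is not the case, the better fix is to find out why an integer reached len() in the first place.

```python
def safe_len(value):
    """Return len(value); for an int, count its digits instead."""
    if isinstance(value, int):
        # Assumption: the digit count is the length wanted for an integer.
        return len(str(abs(value)))
    return len(value)


print(safe_len(100))
print(safe_len("100"))
print(safe_len([100, 200, 300]))
```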