how to join tables in SQL

Estimated reading time: 4 minutes

When running an sql select query, most likely we may have a need to bring in data from other tables.

The reason for this , is that tables have a specific purpose for the data they store.

One of the most important features in a data architecture, is to avoid duplication.

This has the impact of only storing data on a table that is required.

As a result joining tables is necessary if you want to get all the information you need .

As an illustration on a customer purchase , the purchase details would be on the purchase table.

BUT the customer details would not be stored there. They would be stored on the customer table.

To put it another way a customer may make many purchases, but you only need their name and address etc. stored once.

The object therefore of joining tables is to use this structure , and avoid duplication over many tables .

For example in the above table, if you had the customer name and address on the purchases table and customer table,

they would always have to be in sync, and it would bring up a maintenance headache for the database administrator.

So what was developed was primary and foreign keys, which helps join tables, based on unique similar values in a column in each table.

What are the types of joins that can be used?

Inner Join: An inner join is where you join two tables but only return the rows that match in both tables.

Left Join: A left join is where you return all the records from the left table, and any matched records from the right table.

Right Join: A right join is where you return all the records from the right table, and any matched records from the left table.

Full Join: A full join uses the left join and right join, and returns all the rows from both tables in the output.

Equally important, where there is no value found a NULL value will be returned.

Lets take some examples to explain the above concepts.

So we have two tables below:

Sales Database, Customer Table

Sales Database, Sales Table

Inner Join

If we run this code, we will return the below. For this reason the purpose of an Inner Join is to only return matched records common to both.

select a.customer_no, a.customer_type, b.INVOICE_NO 
from dbo.CUSTOMER a -- left table
inner join sales.dbo.sales b -- right table
on a.CUSTOMER_NO = b.CUSTOMER_NO

Left Join

With a left join of the below code will return four rows. All the records from the left table and any matched records from the right table.

select a.customer_no, a.customer_type, b.INVOICE_NO 
from dbo.CUSTOMER a --left table
left join sales.dbo.sales b --right table
on a.CUSTOMER_NO = b.CUSTOMER_NO

Right Join

With a right join of the below code will return three rows. All the records from the right table and any matched records from the left table.

select a.customer_no, a.customer_type, b.INVOICE_NO 
from dbo.CUSTOMER a --left table
right join sales.dbo.sales b -- right table
on a.CUSTOMER_NO = b.CUSTOMER_NO

Full Join

Finally in order to run a full join, the result of the below code would be as per the below. The objective is to return all rows matching and unmatching.

In the output, there is a Null Value, meaning that that row has no value for INVOICE_NO which is a primary key.

In essence primary keys are not allowed to have null values.

select a.customer_no, a.customer_type, b.INVOICE_NO from dbo.sales b
full join sales.dbo.customer a
on a.CUSTOMER_NO = b.CUSTOMER_NO

What is a Primary Key and Foreign Key

When working with databases in your data analytics projects, most likely you will deal with more than one table. As a result in certain scenarios there will be a link between the tables

This will ensure that you can use data from the two or more tables that can be joined together. But how would you do that?

In order to join two tables, you will need columns in both tables that have values that are unique and of similar values.

In other words you need to create what is called a primary key, its characteristics are as follows:

  • Its values are unique.
  • They cannot be null.

So where does a foreign key come in? Say we have two tables Table A and Table B.

Table “Customer Table” will have a column that has a primary key call Customer_No

Table “Sales Table” also has a column with a primary key called Sales_No.

  • Now we cant join the two tables, because the primary keys are different, and will have different values
  • But in the Sales Table, we will have an additional column called Customer_No, and it will hold some of the unique values that are held in the Customer Table, Customer_No column.
  • This Customer_No in the Sales Table is called the foreign key. It is unique, has null values and can be connected to a primary key column Customer_No on the Customer Table.

In this way tables can be connected where values in columns are not always the primary key of a table.

So let’s look at tables in SQLite and see how this works in practice.

Below you have a Customer Table and the Sales Table

In both tables, we have a primary key, though they are not the same primary key. As a result, you would not be able to join the tables as the primary keys contained in them do not have the same values, and this is how you would normally join.

In saying that, the two tables can be joined as the foreign key of Sales is related to the primary key of Customer. If invoice_no was not on Sales , then you could make the customer_no in sales the primary key.

So lets look at the tables below with data in them.

Normally when you try to join two tables, it is on the primary key. On the below invoice_no is the primary key of the table Sales, and customer_no is the primary key of the customer table.

While their values in each column are unique, they are not the same , so a join would fail as per the below:

But if you change on the join b.invoice_no TO b.customer_no, it will now bring back all the values asked for. The reason?

The primary key on one is linked to the foreign key on the other, with similar unique values.

As a result, you can still have a primary key column, not in another table and join to that table as long as there is a foreign key column with the same values.

This helps to maintain the table structure, without having to remove primary keys.

how to create and drop a table in SQL

If you are working on data analytics , and have access to databases that you can load data into and manipulate, then read on.

There maybe a reason for you have your own partitioned table with data in it, but you need to know how to create a database.

The creation of a table can be achieved as follows;

  • Decide on the table name.
  • Within the table you will have columns, they need to be named.
  • Next for each column name a datatype needs to be decided upon.

It is important to note that the data type you decide is very important before you create the table.

This is because once you start loading data into the table if you decide to change the data type for a column, it could change the data contained within that column.

As a result a review of the data to be loaded should be completed, looking at the type of data, its length and any attributes of the data that need careful consideration.

Lets create our table and talk through the different aspects of the SQL used.

So in the below we have the create table statement

The SQL is broken down as follows:

  1. The first line creates the table within the schema. It has its own unique area in there.
  2. Lines 2,3,4 create the columns with their datatypes.

When this is run , the schema is updated, one thing to note it is empty, and requires an insert statement to populate the columns.

And if we do a select all from the table, it returns empty:

How to drop a table that you have created?

Sometimes you may have a need to drop tables from your schema.

Before proceeding there are a couple of things to consider:

  • Have you backed up the data contained within it?
  • You understand the deletion is permanent ( unless you can rebuild from backup)
  • Any batch runs that run off the table will fail once it is deleted.
  • Any tables that have queries that join to the table been removed will fail.
  • This might be part of a wider database cleanup and may improve performance.

Below is the one line of code ( yes you are reading that correct), that will delete the table.

In most roles the ability to drop tables will be restricted by the database administrator.

After the deletion it will look like this. As you can see the table no longer exists.

In conclusion the deletion of a database table is very easy to do , but comes with caveats as outlined above, so proceed with caution.

how to use case statements in SQL

In recent posts on SQL here on this website, we have covered off a number of ways to extract your data from wildcards to filters.

Here we are going to build on that learning, but bring in a new dimension in the use of case statements.

These are very handy to use when you may want to apply specific data values to a piece of data.

We have several posts on machine learning; in effect, this could be used to add data needed to run the model.

Lets start looking at the SQLite database we have been using for other posts.

The table we will work off is shown below, we have used in other posts.

What we are going to look at, is the column name, and assign a binary number, 1 or 0 if the string in this column begins with the letter “A”.

Case statements work on the basis when a value meets certain criteria.

They normally are constructed as follows:

CASE

… criteria you want to meet

End

The power of this is that you are starting to add additional attributes about the string in a new column.

An example could be if a column value is the incorrect length, can be used as a data quality check. See my example below.

Below in the output when we apply a value of 1 to a new column if it starts with “A”.

Important to remember that the new column country_check only shows in the output.

On the above we have just one case statements, but what if we want more than one in the query?

If we want to apply multiple case statements to a query, just add a new when line, with your criteria.

The wild card values below will return any value between 1-5, they can be any combination of 1-5.

What if we want to have multiple case statements on different columns?

As can be seen below we have selected two columns to apply our case statement to.

We just have added a check on the column iso_code.

As before the new column will not be saved to the database.

The video to show how this is applied can be found below:

In conclusion we have shown examples of how to apply case statements in SQL, with the following pointers:

  • Multiple case statements can be created.
  • Wildcards can be used to return different variations of your data.
  • Outputs are populated into a new column, though this is not saved to the table.
  • You can test for multiple scenarios.
  • Using case statements allows you to understand your data better with additional attributes.

We have lots of posts on this website that will help you build your data analytics skills.

How to use wildcards in SQL

Estimated reading time: 6 minutes

Following on from from our previous posts on SQL, this post will help to explain how to use wildcards in your query.

What would you use a wild card in the first place?

When a data analyst dealing with a large dataset, it is most likely that they will not know every piece of data.

As a result data will come from multiple sources and will be in different formats.

Using SQL wild cards will aid the programmer in been able to get specific pieces of data that may cause data quality errors.

Due to your query’s nature, you may not know where the problem is , the answer is to use wild cards for their ease and flexibility.

So lets look at a data set and start to apply some of the logic above , to a practical example.

We are going to use SLlite again, below is the table we are going to run our query off.

As you will see we have three columns with data in it, the examples below will work off the “name” column.

Name has a number of data points that are quite similar, so lets start showing you how to actually use the wild card.

Filter the data for all values before the last value using %l

The output below is basically going to the name column only and asking it to return values , that have “l” at the end.

What the SQL is instructed to do is to look at each string, and where there is an “l” at the end, and characters before it, then return those records.

This is what using wildcards does, the % basically is saying give me any value before “l”, which has to be at the end.

As none of the values have “l” at the end it returns blank, which is correct.

If we rerun this , with %y, we get four values returned:

Filter the data for all values that start with A%

As a follow on from above say you want to find records that begin with A, but you don’t know what comes after the “A”?

Below, correctly it returns only three, and it is not concerned what comes afterwards.

Filter the data for all values where a “g” is in the middle?

In the above we looked at the start and end points of the string, and it return records that matched the criteria.

There maybe a scenario where you want to look for records, with a particular value that may be in the middle of the string.

In this example, we know that “g” occurs at the fourth position, so it will return all records where g is in that position, regardless of what is on either side.

In applying %%% it is basically saying return anything, if the fourth character which is g, irrespective of what is in the previous three characters.

Filter the data for all values where there is a space in the record

There are going to be records that have spaces in them, and sometimes that may or may not be wanted.

In order to find those records, we would apply the below wildcard in the SQL

Filter the data for all values start with an “H” and end with a “y”

In a dataset, you may want to find records that begin and end with specific values, but you are not sure or bothered what is in between.

Below we have changed the “%” for “_” in the query. This change allows us to ask for a start and end character.

Something to note, between the “H” and “y” there are five underscores (_) in there. Each one represents the no of values between the first and last character. If the string was only three letters long, then you would use one _ and so on.

Summary and conlusion

In this post, we have described what a wild card is and its uses. They are very handy for searching for a combination of value or values when you are not sure what else is in the string.

This is quite commonly used in pattern searching, and in data cleansing , most systems would incorporate it especially if automating tasks , it allows clean data to process without it coming to a stand still.

On our YouTube channel you can subscribe to find out more information about data cleansing, SQL and lots of different tips and techniques, below is the video for this post:

A list of wild card operators are as follows:

Wild cardDescription
%Either before or after a character, represents any character that could appear but is unknown.
_This is a single character of any value that may appear in a wildcard search, represents a space between characters.
^Inside brackets beside characters, tells the program to not return those characters.
Inside a bracket and between characters, represents the range to be found of the characters it is in between.
[]If you place characters inside this bracket, it requests the program to return any of those characters in the output.

We have lots of posts on this website that will help you build your data analytics skills.

select rows with a certain value using SQL

Estimated reading time: 2 minutes

You have now started to understand your database structure and want to extract data from it.

Data analytics projects can have lots of data, you need a way to get to what you need quickly and easily.

The question now becomes how would you achieve this?

The most easiest way is to add a where clause into your select statement.

On the below SQLite database, we are going to retrieve certain records from the name column, using our sql query.

The objective is that when we run the SQL code , it only returns the records we asked it to show us.

Writing the SQL code to return all rows where name = Ireland

Using the below code to extract our data

select * from error_dbase.country
where name = "Ireland"

it returns the following from the database

But what if we want to return more than one value?

Then you would just use an “in” statement, the SQL code would be changed as follows:

select * from error_dbase.country
where name in ("Ireland","Italy")

yielding a result as follows:

and this would also work if we had more than one of the same value as follows:

So to recap, we can choose one value on its own , using the “where” clause

or

we can use the “in” clause to ask the code to return records of multiple selections.

On our YouTube channel you can subscribe to find out more information about SQL, SQLite database and many more tips and tricks to help you with your Data Analytics.

To see a video tutorial explaining the concepts above, see below:

how to select columns with SQL

Estimated reading time: 2 minutes

Building on how to select all records with SQL , you may want to learn how to get specific columns of data from your table.

Using SQLite again , we will now look to select two columns from the table below, namely name and iso_code.

The purpose of completing this, is to allow the user to start to only choose the data they want.

As a database becomes bigger and more complex, so does the ability to retrieve the data.

Furthermore the speed and turnaround time becomes more critical.

What is more if these queries are been used for real time applications:

  • These results need to be precise to what is needed.
  • Additionally the speed of them returning back the records needs to be quick.

Subsequently below is the query run off the above table that we will use:

As you can see it has returned the two columns we asked for, excluding the description column.

To conclude, one thing to note this returns all records, that currently exist on the table for those two columns.

To see a video tutorial on this, watch the full video below:

how to select all records with SQL

Estimated reading time: 2 minutes

As part of working as a data analyst, in order to retrieve your data you will most likely need to connect into a database, and look at the database records and then start analysing, but how?

At the start of any data analytics project you will need to understand the structure of your data and what are the issues within it.

First of all you will need to connect to your database, and then open an SQL editor.

In this example we are going to use SQLite, it is very popular and can be downloaded from here .

Once you have this downloaded, you will need to create your database, tables and records.

Today we are going to retrieve records from a table that I have already created, as per the screenshot below

So that is the able above, but how would I use SQL to retrieve all its records?

In esseence it is very straight forward, all you need to do is run the following code:

select * from error_dbase.country

and it will provide you with the following output

and there you go, now you have retrieved all the records from the database table!

To see a video tutorial on this, watch the full video below:

create read update delete using Tkinter

Estimated reading time: 3 minutes

CRUD ( create, read, update, delete) is a programming language acronym notably for how to manage updates to a database.

It is commonly used when talking about storing data on a database and follows the following rules:

  • The ability to create or add new records.
  • Be able to read and or retrieve the records.
  • If an update is needed, then allow those updates to be posted to the database successfully.
  • To ensure records are maintained correctly and deleted where a delete request is requested.

Generally speaking, with crud operations, it is related to persistent storage principles, more information can be found here basic functions of persistent storage (datacadamia.com)

Given these points when using graphical user interfaces, and trying to introduce crud functionality to your database applications, together with the design of the application, should yield:

  • Good database design.
  • Reducing complexity.
  • Eliminate duplication.
  • Have consistency.

We have spoken about the use of CRUD, but what are its benefits?

Security roles can be enabled

In addition, CRUD brings they bring structure to what actually can happen on a server, in essence, the ability to apply updates is managed tightly.

Helps put structure around what an application can do

Equally important, in building an application for use by users, knowing what they will do can prompt the designers to ask questions about what the user will actually do.

For example probably when Twitter was been designed, the things that probably where thought of :

Create a tweet – Functionality for the user to create and post a tweet, that gets saved to the database.

Read a tweet – Load all tweets from the database to the users interface.

Update – Allow a user to update their account profile or tweet timeline.

Delete – A user can delete their own tweets, their profile and or account.

Data flows between servers can be managed easier

Similarly, with the modern use of technology and processes moving online, data flows around between lots of people and organizations.

With this in mind, the need to send and receive data has to be managed more efficiently and securely.

As a result, data coming into the server can be controlled as to where it is received and updated. This can be accomplished by the security roles discussed above.

Using classes with CRUD

The use of Python classes is commonly seen in many applications, for this reason we have used them again below.

Uniquely classes will only help to enhance your computer programme and organization of code, as a result of removing duplication and simplifying the code.

Classes are a very useful way to manage the structure of your code, thus keeping everything centralized.

Sooner or later if this was not implemented the project would become too difficult to manage, and maintenance and updates would become difficult to manage.

In the below video we take you through the steps involved in applying this methodology, using an SQLite database.

We use Python Classes to manage the different requests by the user, the details can be found here

When applying these updates, we have written the code that will apply the following SQL updates:

  • Select
  • Update
  • Insert
  • Delete

As can be seen these four SQL commands are the commonly used across any application to perform these requests in a CRUD application.

It is also important that your data types are synced between your Tkinter application and your SQL logic.

On our YouTube channel you can subscribe to find out more information about SQLite, SQL, Tkinter and many more tips and tricks!