how to write subqueries in SQL

In SQL a lot of the times there will be a need to write subqueries that will extract data that you can analyse.

But what if you want to use the data as input to another query that can then be used to give you the result you desire?

This is is the power of using a subquery, the subquery can gather the information you need, the main query will then provide the output.

But why would somebody approach it this way?

(A) Complex SQL and joins are used in the sub query.

(B) The main query that uses the sub query , summaries the data.

(C) Lots of data is returned in the subquery, the programmer chooses out of it what they want.

(D) Flexibility – You can adjust the main query to pull out what you need from it, not everything is required to be used, but can be quickly if a decision is made to utilise that data.

(E) As long as the subquery does not need to be changed at all, it can reduce maintenance and coding required to keep it up to date.

So what are the different ways we can use sub queries?

We will use these table values to return output that will allow us to provide output for analysis.

CUSTOMER

SALES

Selecting specific values that you only want to look at

In the below code the second select statement is the subquery. In this instance, we are using the output of that as an input to the first query on the left-hand side of the equals sign.

This therefore allows us to alter the right hand side for the values we want to return in the output, by just adding customer nos.

Note by adding customer nos we will need to change the equals sign, that will be covered off in the next section

select * FROM DBO.SALES
where CUSTOMER_NO = (select CUSTOMER_NO FROM dbo.CUSTOMER WHERE CUSTOMER_NO = '123456' )

Giving the below output:

Returning more than one value from the sub query

In the above section, we where focused on returning one value. That used the operator “equals”.

When we have more than one value to return that will not work, hence we use the like operator in the sub query.

As a result of this, the sub query will now return more than one value, this is the desired output we are looking for

select * FROM DBO.SALES
where CUSTOMER_NO IN (select CUSTOMER_NO FROM dbo.CUSTOMER WHERE INVOICE_NO LIKE 'INV%')

The output now that will be produced is as follows:

As can be seen the subquery returns through the wild card ‘INV%’ all those rows with that value. Then the main query will match all the customers it has with the customer nos returned in the subquery to produce the final output.

Use a subquery to UPDATE values

There may be a scenario where a subquery is used to update values on a table.

The subquery below checks for a NULL INVOICE_NO and the customer no = 654654, and if both are true, then the main query updates the table column INVOICE_NO value to ‘ERROR’ where CUSTOMER_NO = 654654.

SELECT * FROM dbo.SALES;
UPDATE dbo.Sales
SET INVOICE_NO = 'ERROR'
WHERE CUSTOMER_NO in (SELECT CUSTOMER_NO FROM dbo.SALES where INVOICE_NO is null AND CUSTOMER_NO = 654654);
SELECT * FROM dbo.SALES;

Before the update is applied:

After the update is applied:

Use a subquery to DELETE values

The final use of a subquery we will use is to look at a scenario where we want to delete values from the database table.

If we look at the above output from the previous section, we will see that the below code should remove rows 5-10 and 12.

Here we will use:

DELETE FROM dbo.Sales
WHERE CUSTOMER_NO in (SELECT CUSTOMER_NO FROM dbo.SALES where INVOICE_NO is null);
SELECT * FROM dbo.SALES;



Have you seen these great posts?

how to groupby in a select statement
how to update records in sql
Deleting table records in SQL

How to Group By in a Select Statement

SQL group by is one of the most commonly used statements you will come across and it has many different applications.

It is a great way to aggregate data, when you are using one of the following in the select statement:

(A) Sum

(B)Count

(C) Average

(D) Max

(E) Min

There are more that can be added to this list , we just gave you some examples.

So lets look at some areas where it might be applied. We will use the below tables from our database to apply the group by to:

CUSTOMER

SALES

Group by using count

In the below code we are applying a very straight forward group by, and it is on the customer_type column.

To make the output easier to follow, we have also applied an alias to the count column. This makes it easier to see the output.

Also this column is not stored in the database, so you would need to apply additional logic to accomplish the storage of that data.

select count(customer_type) as "count of customer type",customer_type from dbo.CUSTOMER
group by customer_type;

Giving output:

If we remove the count from the above SQL, it simply will show the below output:

Group by using joins

Here we are joining tables, and then using a group by. It is quite similar to the first query, but it now brings in additional information, namely the invoice_no.

In addition now the data has better meaning. For example in the group by using count above , the row 6 had six values for WEB.

Now that line is broken out further , as one of the WEB records has an invoice value, it provides a better understanding of the data.

select count(a.customer_type) as "count of customer type",a.customer_type,b.invoice_no
from dbo.CUSTOMER a --left table
left join dbo.SALES b --right table
on a.customer_no = b.customer_no
--where b.invoice_no is null
group by a.customer_type,b.invoice_no

Giving output:

Group by including case statements

In the next scenario, we will see how to use case statements in SQL.

The inclusion will allow additional information to be included in the output.

As can be seen the group by is not impacted by this, but the good thing is that if you are building a machine learning model, this will include great additional information.

select count(a.customer_type) as "count of customer type",a.customer_type,b.invoice_no,
case
when a.customer_type = 'EMAIL' then 1 
when a.customer_type = 'POSTAL' then 2
when a.customer_type = 'WEB' then 3
ELSE 0
end as Cust_Type_No
from dbo.CUSTOMER a --left table
left join dbo.SALES b --right table
on a.customer_no = b.customer_no
group by a.customer_type,b.invoice_no

Which will give:

Group by using a sub query

The final approach that we will look at is where you are using a sub query.

The purpose of a subquery is to get the information you want, and then you can pick data out of it as you see fit.

In the below SQL we can see that aggregates further the count of the fields. If we ran the subquery on its own, then we would get:

But running the below now aggregates up further, it essentially is counting the no of occurences of each value in the “Cust_Type_No” column.

Then it adds beside for each, the values that are associated with it from the customer_type column.

select count(tmp.Cust_Type_No) as "Count_cust_type", tmp.customer_type from (
select count(a.customer_type) as "count of customer type",a.customer_type,b.invoice_no,
case
when a.customer_type = 'EMAIL' then 1 
when a.customer_type = 'POSTAL' then 2
when a.customer_type = 'WEB' then 3
ELSE 0
end as Cust_Type_No
from dbo.CUSTOMER a --left table
left join dbo.SALES b --right table
on a.customer_no = b.customer_no
group by a.customer_type,b.invoice_no) as tmp
group by tmp.Cust_Type_No,tmp.customer_type

The output which is:

So in summary here are some points to look out for:

(A) Use group by with the right aggregate function, so you get the right output.

(B) You can add in additional logic without impacting the group by, examples here include case statements.

(C) Group by can be added to sub queries as well as the main query you are extracting data from.

See other posts that may interest you!

how to write subqueries in sql

how to update records in sql

how to insert data into a table in SQL

how to join tables in SQL

how to join tables in SQL

Estimated reading time: 4 minutes

When running an sql select query, most likely we may have a need to bring in data from other tables.

The reason for this , is that tables have a specific purpose for the data they store.

One of the most important features in a data architecture, is to avoid duplication.

This has the impact of only storing data on a table that is required.

As a result joining tables is necessary if you want to get all the information you need .

As an illustration on a customer purchase , the purchase details would be on the purchase table.

BUT the customer details would not be stored there. They would be stored on the customer table.

To put it another way a customer may make many purchases, but you only need their name and address etc. stored once.

The object therefore of joining tables is to use this structure , and avoid duplication over many tables .

For example in the above table, if you had the customer name and address on the purchases table and customer table,

they would always have to be in sync, and it would bring up a maintenance headache for the database administrator.

So what was developed was primary and foreign keys, which helps join tables, based on unique similar values in a column in each table.

What are the types of joins that can be used?

Inner Join: An inner join is where you join two tables but only return the rows that match in both tables.

Left Join: A left join is where you return all the records from the left table, and any matched records from the right table.

Right Join: A right join is where you return all the records from the right table, and any matched records from the left table.

Full Join: A full join uses the left join and right join, and returns all the rows from both tables in the output.

Equally important, where there is no value found a NULL value will be returned.

Lets take some examples to explain the above concepts.

So we have two tables below:

Sales Database, Customer Table

Sales Database, Sales Table

Inner Join

If we run this code, we will return the below. For this reason the purpose of an Inner Join is to only return matched records common to both.

select a.customer_no, a.customer_type, b.INVOICE_NO 
from dbo.CUSTOMER a -- left table
inner join sales.dbo.sales b -- right table
on a.CUSTOMER_NO = b.CUSTOMER_NO

Left Join

With a left join of the below code will return four rows. All the records from the left table and any matched records from the right table.

select a.customer_no, a.customer_type, b.INVOICE_NO 
from dbo.CUSTOMER a --left table
left join sales.dbo.sales b --right table
on a.CUSTOMER_NO = b.CUSTOMER_NO

Right Join

With a right join of the below code will return three rows. All the records from the right table and any matched records from the left table.

select a.customer_no, a.customer_type, b.INVOICE_NO 
from dbo.CUSTOMER a --left table
right join sales.dbo.sales b -- right table
on a.CUSTOMER_NO = b.CUSTOMER_NO

Full Join

Finally in order to run a full join, the result of the below code would be as per the below. The objective is to return all rows matching and unmatching.

In the output, there is a Null Value, meaning that that row has no value for INVOICE_NO which is a primary key.

In essence primary keys are not allowed to have null values.

select a.customer_no, a.customer_type, b.INVOICE_NO from dbo.sales b
full join sales.dbo.customer a
on a.CUSTOMER_NO = b.CUSTOMER_NO

How to use wildcards in SQL

Estimated reading time: 6 minutes

Following on from from our previous posts on SQL, this post will help to explain how to use wildcards in your query.

What would you use a wild card in the first place?

When a data analyst dealing with a large dataset, it is most likely that they will not know every piece of data.

As a result data will come from multiple sources and will be in different formats.

In some instances when you select rows using certain values using SQL, this will work as you know the exact values.

Using SQL wild cards will aid the programmer in being able to get specific pieces of data that may cause data quality errors, but they may not know where the problem is, or what makes up the error.

So lets look at a data set and start to apply some of the logic above , to a practical example.

We are going to use SLlite again, below is the table we are going to run our query off.

As you will see we have three columns with data in it, the examples below will work off the “name” column.

Name has a number of data points that are quite similar, so lets start showing you how to actually use the wild card.

Filter the data for all values before the last value using %l

The output below is basically going to the name column only and asking it to return values , that have “l” at the end.

What the SQL is instructed to do is to look at each string, and where there is an “l” at the end, and characters before it, then return those records.

This is what using wildcards does, the % basically is saying give me any value before “l”, which has to be at the end.

As none of the values have “l” at the end it returns blank, which is correct.

If we rerun this , with %y, we get four values returned:

Filter the data for all values that start with A%

As a follow on from above say you want to find records that begin with A, but you don’t know what comes after the “A”?

Below, correctly it returns only three, and it is not concerned what comes afterwards.

Filter the data for all values where a “g” is in the middle?

In the above we looked at the start and end points of the string, and it return records that matched the criteria.

There maybe a scenario where you want to look for records, with a particular value that may be in the middle of the string.

In this example, we know that “g” occurs at the fourth position, so it will return all records where g is in that position, regardless of what is on either side.

In applying %%% it is basically saying return anything, if the fourth character which is g, irrespective of what is in the previous three characters.

Filter the data for all values where there is a space in the record

There are going to be records that have spaces in them, and sometimes that may or may not be wanted.

In order to find those records, we would apply the below wildcard in the SQL

Filter the data for all values start with an “H” and end with a “y”

In a dataset, you may want to find records that begin and end with specific values, but you are not sure or bothered what is in between.

Below we have changed the “%” for “_” in the query. This change allows us to ask for a start and end character.

Something to note, between the “H” and “y” there are five underscores (_) in there. Each one represents the no of values between the first and last character. If the string was only three letters long, then you would use one _ and so on.

Summary and conlusion

In this post, we have described what a wild card is and its uses. They are very handy for searching for a combination of value or values when you are not sure what else is in the string.

This is quite commonly used in pattern searching, and in data cleansing , most systems would incorporate it especially if automating tasks , it allows clean data to process without it coming to a stand still.

On our YouTube channel you can subscribe to find out more information about data cleansing, SQL and lots of different tips and techniques, below is the video for this post:

A list of wild card operators are as follows:

Wild cardDescription
%Either before or after a character, represents any character that could appear but is unknown.
_This is a single character of any value that may appear in a wildcard search, represents a space between characters.
^Inside brackets beside characters, tells the program to not return those characters.
Inside a bracket and between characters, represents the range to be found of the characters it is in between.
[]If you place characters inside this bracket, it requests the program to return any of those characters in the output.

We have lots of posts on this website that will help you build your data analytics skills.

how to select columns with SQL

Estimated reading time: 2 minutes

Building on how to select all records with SQL , you may want to learn how to get specific columns of data from your table.

Using SQLite again , we will now look to select two columns from the table below, namely name and iso_code.

The purpose of completing this, is to allow the user to start to only choose the data they want.

As a database becomes bigger and more complex, so does the ability to retrieve the data.

Furthermore the speed and turnaround time becomes more critical.

What is more if these queries are been used for real time applications:

  • These results need to be precise to what is needed.
  • Additionally the speed of them returning back the records needs to be quick.

Subsequently below is the query run off the above table that we will use:

As you can see it has returned the two columns we asked for, excluding the description column.

To conclude, one thing to note this returns all records, that currently exist on the table for those two columns.

To see a video tutorial on this, watch the full video below:

how to select all records with SQL

Estimated reading time: 2 minutes

As part of working as a data analyst, in order to retrieve your data you will most likely need to connect into a database, and look at the database records and then start analysing, but how?

At the start of any data analytics project you will need to understand the structure of your data and what are the issues within it.

First of all you will need to connect to your database, and then open an SQL editor.

In this example we are going to use SQLite, it is very popular and can be downloaded from here .

Once you have this downloaded, you will need to create your database, tables and records.

Today we are going to retrieve records from a table that I have already created, as per the screenshot below

So that is the able above, but how would I use SQL to retrieve all its records?

In esseence it is very straight forward, all you need to do is run the following code:

select * from error_dbase.country

and it will provide you with the following output

and there you go, now you have retrieved all the records from the database table!

To see a video tutorial on this, watch the full video below: