how to write subqueries in SQL

In SQL a lot of the times there will be a need to write subqueries that will extract data that you can analyse.

But what if you want to use the data as input to another query that can then be used to give you the result you desire?

This is is the power of using a subquery, the subquery can gather the information you need, the main query will then provide the output.

But why would somebody approach it this way?

(A) Complex SQL and joins are used in the sub query.

(B) The main query that uses the sub query , summaries the data.

(C) Lots of data is returned in the subquery, the programmer chooses out of it what they want.

(D) Flexibility – You can adjust the main query to pull out what you need from it, not everything is required to be used, but can be quickly if a decision is made to utilise that data.

(E) As long as the subquery does not need to be changed at all, it can reduce maintenance and coding required to keep it up to date.

So what are the different ways we can use sub queries?

We will use these table values to return output that will allow us to provide output for analysis.

CUSTOMER

SALES

Selecting specific values that you only want to look at

In the below code the second select statement is the subquery. In this instance, we are using the output of that as an input to the first query on the left-hand side of the equals sign.

This therefore allows us to alter the right hand side for the values we want to return in the output, by just adding customer nos.

Note by adding customer nos we will need to change the equals sign, that will be covered off in the next section

select * FROM DBO.SALES
where CUSTOMER_NO = (select CUSTOMER_NO FROM dbo.CUSTOMER WHERE CUSTOMER_NO = '123456' )

Giving the below output:

Returning more than one value from the sub query

In the above section, we where focused on returning one value. That used the operator “equals”.

When we have more than one value to return that will not work, hence we use the like operator in the sub query.

As a result of this, the sub query will now return more than one value, this is the desired output we are looking for

select * FROM DBO.SALES
where CUSTOMER_NO IN (select CUSTOMER_NO FROM dbo.CUSTOMER WHERE INVOICE_NO LIKE 'INV%')

The output now that will be produced is as follows:

As can be seen the subquery returns through the wild card ‘INV%’ all those rows with that value. Then the main query will match all the customers it has with the customer nos returned in the subquery to produce the final output.

Use a subquery to UPDATE values

There may be a scenario where a subquery is used to update values on a table.

The subquery below checks for a NULL INVOICE_NO and the customer no = 654654, and if both are true, then the main query updates the table column INVOICE_NO value to ‘ERROR’ where CUSTOMER_NO = 654654.

SELECT * FROM dbo.SALES;

UPDATE dbo.Sales
SET INVOICE_NO = 'ERROR'
WHERE CUSTOMER_NO in (SELECT CUSTOMER_NO FROM dbo.SALES where INVOICE_NO is null AND CUSTOMER_NO = 654654);


SELECT * FROM dbo.SALES;

Before the update is applied:

After the update is applied:

Use a subquery to DELETE values

The final use of a sub query we will use is to look at a scenario where we want to delete values from the database table.

If we look at the above output from the previous section, we will see that the below code should remove rows 5-10 and 12.

Here we will use:

DELETE FROM dbo.Sales
WHERE CUSTOMER_NO in (SELECT CUSTOMER_NO FROM dbo.SALES where INVOICE_NO is null);

SELECT * FROM dbo.SALES;

And this will give us the following updated table:

To summarise the sections above:

(A) Using a sub query has many applications, and its use is varied.

(B) Using a sub query can be useful to help altering table values.

(C) Sub queries can be as complex or as simple as you want to make them.

Deleting table records with SQL

Estimated reading time: 4 minutes

Often you will hear about deleting table records with SQL, but how is this achieved, if you have not done it before?

In previous videos on SQL we discussed how to update records in SQL , how to insert data into a table in SQL.

You could be faced with the scenario where there is data that you no longer need, as a result, it needs to be deleted.

So lets work through three different scenarios as follows:

(A) Delete a record based on a unique identifier, in this instance CUSTOMER_NO.

(B)Delete multiple customer records at once using CUSTOMER_NO as the column that will identify the records.

(C) Delete a record where it equals a particular column value.

Our initial dataset will look like this:

This image has an empty alt attribute; its file name is image-14.png

Delete a record based on a unique identifier

DELETE FROM dbo.CUSTOMER WHERE CUSTOMER_NO = 987981

select * from dbo.CUSTOMER

When the above code is run it gives the following output.

As can be seen it now has ten records and the row for CUSTOMER_NO = 987981 has been removed.

This image has an empty alt attribute; its file name is image-15.png

Delete multiple customer records at once

In the above scenario we have deleted just one record, but most likely you will want to remove more than one.

This code below will complete this at once. Remember the column we are using is a primary key column, hence all the values are unique.

It may be the case that you are doing to have to do this on a column that does not have unique values, the code will still work.

I would caution though that using the primary key column value always allows you to only remove the ones that are not needed.

It could be the case that there are genuine values that need to remain, even if they are duplicated.

DELETE FROM dbo.CUSTOMER WHERE CUSTOMER_NO in(3,25897458,25898633)

select * from dbo.CUSTOMER
This image has an empty alt attribute; its file name is image-16.png

As can be seen running the above code now leaves us with seven rows, down from ten and the three rows we had in the SQL have been removed.

Deleting a record where it equals a particular column value

So our final scenario is where we need to find a particular value in a column , and it is exists , then delete the whole record for that row.

Here we are not relying on the primary key, but basically a string search in the customer_type column.

When this SQL is run, it supposed remove two rows and leave us with five records.

The problem that you will encounter

There will be a problem with this as the primary key , even though is not been referenced will throw an error.

The error in this instance relates to the the CUSTOMER_NO on the table customer

  • It has a foreign key equivalent column in the sales table.

In essence you cant leave out the primary key value when completing this SQL statement.

As a result both tables need to be updated, where the primary key value exists.

DELETE FROM dbo.CUSTOMER WHERE customer_type = 'POSTAL'

select * from dbo.CUSTOMER
** the above code will fail with the below error:
The DELETE statement conflicted with the REFERENCE constraint "FK__SALES__CUSTOMER___267ABA7A". The conflict occurred in database "SALES", table "dbo.SALES", column 'CUSTOMER_NO'.
The statement has been terminated.

Our table in sales looks like this:

This image has an empty alt attribute; its file name is image-17.png

Updating our logic as follows:

DELETE FROM dbo.SALES
where SALES.CUSTOMER_NO = 123458 ---> This line needs to be run first as the next line checks if this exists as a foreign key.
DELETE FROM dbo.CUSTOMER
where CUSTOMER.CUSTOMER_NO = 123458
and customer_type = 'POSTAL'


select * from dbo.CUSTOMER
select * from dbo.SALES

Results in:

This image has an empty alt attribute; its file name is image-18.png

To summarise there are a number of things to consider:

  • It is important to always check if there is a primary and foreign key relationship.
  • Specifically not checking these relations will result in the SQL outputting an error.
  • Under these circumstances a review of all tables before proceeding is advised.
  • Ensure you have your tables backed up before doing any deletions!

how to update records in SQL

Estimated reading time: 3 minutes

There are multiple reasons for wanting to update records in a database table, from data quality to enriching your data for any machine learning projects.

In this blog post we will look at how to update database records, and also where you try to apply updates and they cannot be completed.

The where clause will become important as it allows you to identify specific records.

It will be updating the records below:

Updating a single record

--Update one record on its own
UPDATE dbo.CUSTOMER
SET customer_type = 'WEB'
WHERE CUSTOMER_NO = 125872;

Running the above code gives this output, resulting in the value been updated per our SQL statement.

Updating multiple rows at once

There maybe scenarios where you want to have multiple records/rows updated at once, the below code will provide you with an example of how this can be achevied:

--Update multiple rows at once
UPDATE dbo.CUSTOMER
SET customer_type = 'WEB'
WHERE CUSTOMER_NO in(333333,456789)

This will now give you updated values in the output below. It is a very handy way to at once get multiple records updated.

Show an update that is valid but will not change anything

--Show an update request that does not complete
UPDATE dbo.CUSTOMER
SET CUSTOMER_NO = 2,customer_type = 'ERROR'
WHERE CUSTOMER_NO = 456789222 ---> will not execute as this value does not exist, it ignores.

With the above code, we have a valid update statement. In this scenario it will not complete as the CUSTOMER_NO does not exist on the database table.

In essence the output is exactly the same as in the previous example.

Update multiple column values

In our final example here, we are going to show how to update the CUSTOMER_No and the customer_type values.

The below SQL will change both values, but it is important to note that as the CUSTOMER_NO is a primary key, it cannot be changed to a number that already exists as primary keys can’t be null or have duplicates.

UPDATE dbo.CUSTOMER
SET CUSTOMER_NO = 3,customer_type = 'ERROR'
WHERE CUSTOMER_NO = 333333

So to wrap up there are a number of ways to update your database table. Here we have introduced some examples. There are also some additional steps you could take:

  1. Count all the records before and after to ensure they remain the same.
  2. Create a copy of the table before updates are applied, and then compare that to the updated table. The differences should be what you expect to see.

What is a Primary Key and Foreign Key

When working with databases in your data analytics projects, most likely you will deal with more than one table. As a result in certain scenarios there will be a link between the tables

This will ensure that you can use data from the two or more tables that can be joined together. But how would you do that?

In order to join two tables, you will need columns in both tables that have values that are unique and of similar values.

In other words you need to create what is called a primary key, its characteristics are as follows:

  • Its values are unique.
  • They cannot be null.

So where does a foreign key come in? Say we have two tables Table A and Table B.

Table “Customer Table” will have a column that has a primary key call Customer_No

Table “Sales Table” also has a column with a primary key called Sales_No.

  • Now we cant join the two tables, because the primary keys are different, and will have different values
  • But in the Sales Table, we will have an additional column called Customer_No, and it will hold some of the unique values that are held in the Customer Table, Customer_No column.
  • This Customer_No in the Sales Table is called the foreign key. It is unique, has null values and can be connected to a primary key column Customer_No on the Customer Table.

In this way tables can be connected where values in columns are not always the primary key of a table.

So let’s look at tables in SQLite and see how this works in practice.

Below you have a Customer Table and the Sales Table

In both tables, we have a primary key, though they are not the same primary key. As a result, you would not be able to join the tables as the primary keys contained in them do not have the same values, and this is how you would normally join.

In saying that, the two tables can be joined as the foreign key of Sales is related to the primary key of Customer. If invoice_no was not on Sales , then you could make the customer_no in sales the primary key.

So lets look at the tables below with data in them.

Normally when you try to join two tables, it is on the primary key. On the below invoice_no is the primary key of the table Sales, and customer_no is the primary key of the customer table.

While their values in each column are unique, they are not the same , so a join would fail as per the below:

But if you change on the join b.invoice_no TO b.customer_no, it will now bring back all the values asked for. The reason?

The primary key on one is linked to the foreign key on the other, with similar unique values.

As a result, you can still have a primary key column, not in another table and join to that table as long as there is a foreign key column with the same values.

This helps to maintain the table structure, without having to remove primary keys.

How to data cleanse a database table

In Data Analytics, that is a very relevant question, and something I look to implement in most projects, sometimes it is too easy to click the shortcut icon to your spreadsheet application!

Here we are looking to bring a little automation into these videos. Building on How to import data from files and Removing characters from an imported CSV file this video connects to a Microsoft Azure cloud database table, brings in the data with errors on it, fixes the errors and displays the correct output on the screen.

What can this do for organisations?

There are several benefits to automating this step:

  • Less manual intervention if there is a need to fix data issues.
  • Better productivity.
  • Better data flows with no errors and quicker reporting.

 

Moving away from files

The process of moving away from files and into automation has several steps:

  • Be clear on your data needs.
  • Understand what you are trying to achieve.
  • Build a process that is repeatable but can be updated easily.
  • Ensure that you build in data quality checks, helps deliver the better output to the users.

Thanks for stopping by!

Data Analytics Ireland