how to use case statements in SQL

In recent posts on SQL here on this website, we have covered off a number of ways to extract your data from wildcards to filters.

Here we are going to build on that learning, but bring in a new dimension in the use of case statements.

These are very handy to use when you may want to apply specific data values to a piece of data.

We have several posts on machine learning; in effect, this could be used to add data needed to run the model.

Lets start looking at the SQLite database we have been using for other posts.

The table we will work off is shown below, we have used in other posts.

What we are going to look at, is the column name, and assign a 1 or 0 if the string in this column begins with the letter “A”.

Case statements work on the basis when a value meets certain criteria.

They normally are constructed as follows:

CASE

… criteria you want to meet

End

The power of this is that you are starting to add additional attributes about the string in a new column.

An example could be if a column value is the incorrect length, can be used as a data quality check. See my example below.

Below in the output when we apply a value of 1 to a new column if it starts with “A”.

Important to remember that the new column country_check only shows in the output.

On the above we have just one case statements, but what if we want more than one in the query?

If we want to apply multiple case statements to a query, just add a new when line, with your criteria.

The wild card values below will return any value between 1-5, they can be any combination of 1-5.

What if we want to have multiple case statements on different columns?

As can be seen below we have selected two columns to apply our case statement to.

We just have added a check on the column iso_code.

As before the new column will not be saved to the database.

The video to show how this is applied can be found below:

In conclusion we have shown examples of how to apply case statements in SQL, with the following pointers:

  • Multiple case statements can be created.
  • Wildcards can be used to return different variations of your data.
  • Outputs are populated into a new column, though this is not saved to the table.
  • You can test for multiple scenarios.
  • Using case statements allows you to understand your data better with additional attributes.

We have lots of posts on this website that will help you build your data analytics skills.

How to use wildcards in SQL

Estimated reading time: 6 minutes

Following on from from our previous posts on SQL, this post will help to explain how to use wildcards in your query.

What would you use a wild card in the first place?

When a data analyst dealing with a large dataset, it is most likely that they will not know every piece of data.

As a result data will come from multiple sources and will be in different formats.

Using SQL wild cards will aid the programmer in been able to get specific pieces of data that may cause data quality errors.

Due to your query’s nature, you may not know where the problem is , the answer is to use wild cards for their ease and flexibility.

So lets look at a data set and start to apply some of the logic above , to a practical example.

We are going to use SLlite again, below is the table we are going to run our query off.

As you will see we have three columns with data in it, the examples below will work off the “name” column.

Name has a number of data points that are quite similar, so lets start showing you how to actually use the wild card.

Filter the data for all values before the last value using %l

The output below is basically going to the name column only and asking it to return values , that have “l” at the end.

What the SQL is instructed to do is to look at each string, and where there is an “l” at the end, and characters before it, then return those records.

This is what using wildcards does, the % basically is saying give me any value before “l”, which has to be at the end.

As none of the values have “l” at the end it returns blank, which is correct.

If we rerun this , with %y, we get four values returned:

Filter the data for all values that start with A%

As a follow on from above say you want to find records that begin with A, but you don’t know what comes after the “A”?

Below, correctly it returns only three, and it is not concerned what comes afterwards.

Filter the data for all values where a “g” is in the middle?

In the above we looked at the start and end points of the string, and it return records that matched the criteria.

There maybe a scenario where you want to look for records, with a particular value that may be in the middle of the string.

In this example, we know that “g” occurs at the fourth position, so it will return all records where g is in that position, regardless of what is on either side.

In applying %%% it is basically saying return anything, if the fourth character which is g, irrespective of what is in the previous three characters.

Filter the data for all values where there is a space in the record

There are going to be records that have spaces in them, and sometimes that may or may not be wanted.

In order to find those records, we would apply the below wildcard in the SQL

Filter the data for all values start with an “H” and end with a “y”

In a dataset, you may want to find records that begin and end with specific values, but you are not sure or bothered what is in between.

Below we have changed the “%” for “_” in the query. This change allows us to ask for a start and end character.

Something to note, between the “H” and “y” there are five underscores (_) in there. Each one represents the no of values between the first and last character. If the string was only three letters long, then you would use one _ and so on.

Summary and conlusion

In this post, we have described what a wild card is and its uses. They are very handy for searching for a combination of value or values when you are not sure what else is in the string.

This is quite commonly used in pattern searching, and in data cleansing , most systems would incorporate it especially if automating tasks , it allows clean data to process without it coming to a stand still.

On our YouTube channel you can subscribe to find out more information about data cleansing, SQL and lots of different tips and techniques, below is the video for this post:

A list of wild card operators are as follows:

Wild cardDescription
%Either before or after a character, represents any character that could appear but is unknown.
_This is a single character of any value that may appear in a wildcard search, represents a space between characters.
^Inside brackets beside characters, tells the program to not return those characters.
Inside a bracket and between characters, represents the range to be found of the characters it is in between.
[]If you place characters inside this bracket, it requests the program to return any of those characters in the output.

We have lots of posts on this website that will help you build your data analytics skills.

select rows with a certain value using SQL

Estimated reading time: 2 minutes

You have now started to understand your database structure and want to extract data from it.

Data analytics projects can have lots of data, you need a way to get to what you need quickly and easily.

The question now becomes how would you achieve this?

The most easiest way is to add a where clause into your select statement.

On the below SQLite database, we are going to retrieve certain records from the name column, using our sql query.

The objective is that when we run the SQL code , it only returns the records we asked it to show us.

Writing the SQL code to return all rows where name = Ireland

Using the below code to extract our data

select * from error_dbase.country
where name = "Ireland"

it returns the following from the database

But what if we want to return more than one value?

Then you would just use an “in” statement, the SQL code would be changed as follows:

select * from error_dbase.country
where name in ("Ireland","Italy")

yielding a result as follows:

and this would also work if we had more than one of the same value as follows:

So to recap, we can choose one value on its own , using the “where” clause

or

we can use the “in” clause to ask the code to return records of multiple selections.

On our YouTube channel you can subscribe to find out more information about SQL, SQLite database and many more tips and tricks to help you with your Data Analytics.

To see a video tutorial explaining the concepts above, see below: