In recent posts on SQL here on this website, we have covered off a number of ways to extract your data from wildcards to filters.
Here we are going to build on that learning, but bring in a new dimension in the use of case statements.
These are very handy to use when you may want to apply specific data values to a piece of data.
We have several posts on machine learning; in effect, this could be used to add data needed to run the model.
Lets start looking at the SQLite database we have been using for other posts.
The table we will work off is shown below, we have used in other posts.
What we are going to look at, is the column name, and assign a binary number, 1 or 0 if the string in this column begins with the letter “A”.
Case statements work on the basis when a value meets certain criteria.
They normally are constructed as follows:
… criteria you want to meet
The power of this is that you are starting to add additional attributes about the string in a new column.
An example could be if a column value is the incorrect length, can be used as a data quality check. See my example below.
Below in the output when we apply a value of 1 to a new column if it starts with “A”.
Important to remember that the new column country_check only shows in the output.
On the above we have just one case statements, but what if we want more than one in the query?
If we want to apply multiple case statements to a query, just add a new when line, with your criteria.
The wild card values below will return any value between 1-5, they can be any combination of 1-5.
What if we want to have multiple case statements on different columns?
As can be seen below we have selected two columns to apply our case statement to.
We just have added a check on the column iso_code.
As before the new column will not be saved to the database.
The video to show how this is applied can be found below:
In conclusion we have shown examples of how to apply case statements in SQL, with the following pointers:
- Multiple case statements can be created.
- Wildcards can be used to return different variations of your data.
- Outputs are populated into a new column, though this is not saved to the table.
- You can test for multiple scenarios.
- Using case statements allows you to understand your data better with additional attributes.
We have lots of posts on this website that will help you build your data analytics skills.