How to Group By in a Select Statement

SQL group by is one of the most commonly used statements you will come across and it has many different applications.

It is a great way to aggregate data, when you are using one of the following in the select statement:

(A) Sum

(B)Count

(C) Average

(D) Max

(E) Min

There are more that can be added to this list , we just gave you some examples.

So lets look at some areas where it might be applied. We will use the below tables from our database to apply the group by to:

CUSTOMER

SALES

Group by using count

In the below code we are applying a very straight forward group by, and it is on the customer_type column.

To make the output easier to follow, we have also applied an alias to the count column. This makes it easier to see the output.

Also this column is not stored in the database, so you would need to apply additional logic to accomplish the storage of that data.

select count(customer_type) as "count of customer type",customer_type from dbo.CUSTOMER
group by customer_type;

Giving output:

If we remove the count from the above SQL, it simply will show the below output:

Group by using joins

Here we are joining tables, and then using a group by. It is quite similar to the first query, but it now brings in additional information, namely the invoice_no.

In addition now the data has better meaning. For example in the group by using count above , the row 6 had six values for WEB.

Now that line is broken out further , as one of the WEB records has an invoice value, it provides a better understanding of the data.

select count(a.customer_type) as "count of customer type",a.customer_type,b.invoice_no
from dbo.CUSTOMER a --left table
left join dbo.SALES b --right table
on a.customer_no = b.customer_no
--where b.invoice_no is null
group by a.customer_type,b.invoice_no

Giving output:

Group by including case statements

In the next scenario, we will see how to use case statements in SQL.

The inclusion will allow additional information to be included in the output.

As can be seen the group by is not impacted by this, but the good thing is that if you are building a machine learning model, this will include great additional information.

select count(a.customer_type) as "count of customer type",a.customer_type,b.invoice_no,
case
when a.customer_type = 'EMAIL' then 1 
when a.customer_type = 'POSTAL' then 2
when a.customer_type = 'WEB' then 3
ELSE 0
end as Cust_Type_No
from dbo.CUSTOMER a --left table
left join dbo.SALES b --right table
on a.customer_no = b.customer_no
group by a.customer_type,b.invoice_no

Which will give:

Group by using a sub query

The final approach that we will look at is where you are using a sub query.

The purpose of a subquery is to get the information you want, and then you can pick data out of it as you see fit.

In the below SQL we can see that aggregates further the count of the fields. If we ran the subquery on its own, then we would get:

But running the below now aggregates up further, it essentially is counting the no of occurences of each value in the “Cust_Type_No” column.

Then it adds beside for each, the values that are associated with it from the customer_type column.

select count(tmp.Cust_Type_No) as "Count_cust_type", tmp.customer_type from (
select count(a.customer_type) as "count of customer type",a.customer_type,b.invoice_no,
case
when a.customer_type = 'EMAIL' then 1 
when a.customer_type = 'POSTAL' then 2
when a.customer_type = 'WEB' then 3
ELSE 0
end as Cust_Type_No
from dbo.CUSTOMER a --left table
left join dbo.SALES b --right table
on a.customer_no = b.customer_no
group by a.customer_type,b.invoice_no) as tmp
group by tmp.Cust_Type_No,tmp.customer_type

The output which is:

So in summary here are some points to look out for:

(A) Use group by with the right aggregate function, so you get the right output.

(B) You can add in additional logic without impacting the group by, examples here include case statements.

(C) Group by can be added to sub queries as well as the main query you are extracting data from.

See other posts that may interest you!

how to write subqueries in sql

how to update records in sql

how to insert data into a table in SQL

how to join tables in SQL

how to use case statements in SQL

In recent posts on SQL here on this website, we have covered off a number of ways to extract your data from wildcards to filters.

Here we are going to build on that learning, but bring in a new dimension in the use of case statements.

These are very handy to use when you may want to apply specific data values to a piece of data.

We have several posts on machine learning; in effect, this could be used to add data needed to run the model.

Lets start looking at the SQLite database we have been using for other posts.

The table we will work off is shown below, we have used in other posts.

What we are going to look at, is the column name, and assign a binary number, 1 or 0 if the string in this column begins with the letter “A”.

Case statements work on the basis when a value meets certain criteria.

They normally are constructed as follows:

CASE

… criteria you want to meet

End

The power of this is that you are starting to add additional attributes about the string in a new column.

An example could be if a column value is the incorrect length, can be used as a data quality check. See my example below.

Below in the output when we apply a value of 1 to a new column if it starts with “A”.

Important to remember that the new column country_check only shows in the output.

On the above we have just one case statements, but what if we want more than one in the query?

If we want to apply multiple case statements to a query, just add a new when line, with your criteria.

The wild card values below will return any value between 1-5, they can be any combination of 1-5.

What if we want to have multiple case statements on different columns?

As can be seen below we have selected two columns to apply our case statement to.

We just have added a check on the column iso_code.

As before the new column will not be saved to the database.

The video to show how this is applied can be found below:

In conclusion we have shown examples of how to apply case statements in SQL, with the following pointers:

  • Multiple case statements can be created.
  • Wildcards can be used to return different variations of your data.
  • Outputs are populated into a new column, though this is not saved to the table.
  • You can test for multiple scenarios.
  • Using case statements allows you to understand your data better with additional attributes.

We have lots of posts on this website that will help you build your data analytics skills.