How to create a calculated field in Tableau

Estimated reading time: 6 minutes

You are working on a data visualisation project, but in some instances the data may not be in a format that you want in the output. This is is where a calculated field comes in, it allows you define the output in a way you would like to see it.

So what are calculated fields exactly?

Tableau calculated fields can perform a number of different functions:

  • They enable aggregation.
  • You can use them to apply filters to your data.
  • If you have a need for ratios, they can be used for that.
  • Segmenting – return your data in specific segments.

In essence you can define how the output looks by using SQL or logical functions such as if, case, is null etc.

The purpose is that you can control and or define what is output other than what Tableau gives you , as a result the ability to include more detailed analysis based on your understanding of the underlying data is facilitated.

What are the different types of calculated fields?

There are three different types of fields as follows:

1.Basic expressions

Basic expressions allow you on a row basis, to get the data you want out of it. That could be for example finding a specific piece of data or finding the count of the existence of some data you want to analyse.

Here in our raw data , the source data contains 100K records, we have created a calculated field to count this:

Also we could just count the no of ‘Web’ occurences as follows, and assign a value of 1 to them. This utlisies an “IF STATEMENT”

2. Level of detail ( LOD) expressions

With this you can write expressions that give you more control and allow you to define the granularity you need to be returned.

In the below we have started out by year, but in the output it allows the data to be drilled down into:

As can be seen this code summaries up to the year, but in the bar, if you click the + sign beside “YEAR” then it starts to drill down into its lower level data . So the next level will be Quarter:

Then Month:

And finally day:

3.Table Calculations

In table calculations, Tableau allows you to create a new column whose output is a calculated value, usually completed by comparing one or more columns.

Some examples you may come across include:

  1. Month to month or qtr to qtr comparisons.
  2. Sales comparisons.
  3. No of new customers comparisons.

In Tableau there is some handy functionality built in which allows quick calculated fields to be shown on the screen.

As can be seen if you right mouse click on the shelf , of the data you are analysing, it will allow you to choose what type of table calculation to appear.

In the below we have chosen gender as the dimension to measure, and its percentage split. There is an equal split between male and female.

We could also, split it out by rank.

If you where looking for more functionality , then the following screens could also be used.

You simply Right mouse click on Sum(Qty) in the marks pane, and then edit table calculation.

with options:

and

So in summary there are a lot of different ways to create calculated fields, some through the functionality that Tableau has provided, others through writing your own logic.

Note and I have not shown it here, but if you are connected to a database, you can also connect using a custom SQL query. This functionality would only be available in the professional version.

How to add a date when a record is created

Estimated reading time: 2 minutes

You maybe working with a lot of data and inserting it into tables, and want to create a date stamp so you can see when the record was created.

Below we will take you through the steps of how to complete this in SQL server management studio.

In this table we have created a new column called “date_created”

We have altered this column to allow automatic date stamp creation as follows:

alter table dbo.CUSTOMER 
add constraint df_TABLE_DATE default getdate() for date_created

What the above code does is basically tell the database any time a new record is created, it should automatically add in a date to the column.

Note that the existing values will not change as I applied this update after the column creation.

Adding in a new record and checking that date stamp has been created

Using the below code:

Insert INTO dbo.CUSTOMER(CUSTOMER_NO,customer_type)
VALUES (777777,'web')

Yields the following result:

As can be seen now for the new record, the date has been added.

It is probably good, that when designing the database table, that this is factored in from the very start , otherwise you will have to go back and change lots of records, which will be very time consuming.

Import a CSV file with an SQL query

Estimated reading time: 3 minutes

In many of the SQL programming softwatre tools, there will most likely be an option for you to import your data torugh a screen they have provided.

In SQL Server Management Studio the below screen is available to pick your file to import, and it quickly guides you through the steps.

But what if you wanted to do this through code, as you may have lots of different files to be loaded at different times?

Below we will take you through the steps.

Check the file you want to import

Below we have created a CSV file, with the relevant headers and data. It is important to have the headers correct as this is what we will use in a step further down.

Create the code that will import the data

drop table sales.dbo.test_import_csv; ===> drops the existing table, to allow a refreshed copy be made.

create table sales.dbo.test_import_csv(
customer_name varchar(10),
age int,
place_of_birth varchar(10)); ===> These steps allow you to recreate the table with the column names and data type.



insert into sales.dbo.test_import_csv
select * FROM
OPENROWSET(BULK 'C:\Users\haugh\OneDrive\dataanalyticsireland\YOUTUBE\SQL\how_to_import_a_csv _file_in_sql\csv_file_import.csv',
formatfile = 'C:\Users\haugh\OneDrive\dataanalyticsireland\YOUTUBE\SQL\how_to_import_a_csv _file_in_sql\csv_file_import_set.txt',
FIRSTROW=2,
FORMAT='CSV'
) as tmp ===> These lines open the file, read in the data to the table created.

There are two important parts to the above code:

(A) OPENROWSET – This enables the connection to the CSV, read it and then insert the information into the database table.

(B) Formatfile – This is an important part of the code. Its purpose is to tell openrowset how many columns are in the CSV file. Its contents are below:

As can bee seen it outlines each column within the file and its name, and the exact column name in the header.

Also the last line indicates that this is the last column and the program should only read as far as this.

Running the code and the output

When the code is run, the following is the output:

Eleven rows where inserted, these are the now of rows in the CSV, excluding the headers.

The database table was created:

And finally, the table was populated with the data:

So in summary we have demonstrated that the ability to easily import a CSV file, with the above steps.

In essence you could incorporate this into an automated process.

how to write subqueries in SQL

In SQL a lot of the times there will be a need to write subqueries that will extract data that you can analyse.

But what if you want to use the data as input to another query that can then be used to give you the result you desire?

This is is the power of using a subquery, the subquery can gather the information you need, the main query will then provide the output.

But why would somebody approach it this way?

(A) Complex SQL and joins are used in the sub query.

(B) The main query that uses the sub query , summaries the data.

(C) Lots of data is returned in the subquery, the programmer chooses out of it what they want.

(D) Flexibility – You can adjust the main query to pull out what you need from it, not everything is required to be used, but can be quickly if a decision is made to utilise that data.

(E) As long as the subquery does not need to be changed at all, it can reduce maintenance and coding required to keep it up to date.

So what are the different ways we can use sub queries?

We will use these table values to return output that will allow us to provide output for analysis.

CUSTOMER

SALES

Selecting specific values that you only want to look at

In the below code the second select statement is the subquery. In this instance, we are using the output of that as an input to the first query on the left-hand side of the equals sign.

This therefore allows us to alter the right hand side for the values we want to return in the output, by just adding customer nos.

Note by adding customer nos we will need to change the equals sign, that will be covered off in the next section

select * FROM DBO.SALES
where CUSTOMER_NO = (select CUSTOMER_NO FROM dbo.CUSTOMER WHERE CUSTOMER_NO = '123456' )

Giving the below output:

Returning more than one value from the sub query

In the above section, we where focused on returning one value. That used the operator “equals”.

When we have more than one value to return that will not work, hence we use the like operator in the sub query.

As a result of this, the sub query will now return more than one value, this is the desired output we are looking for

select * FROM DBO.SALES
where CUSTOMER_NO IN (select CUSTOMER_NO FROM dbo.CUSTOMER WHERE INVOICE_NO LIKE 'INV%')

The output now that will be produced is as follows:

As can be seen the subquery returns through the wild card ‘INV%’ all those rows with that value. Then the main query will match all the customers it has with the customer nos returned in the subquery to produce the final output.

Use a subquery to UPDATE values

There may be a scenario where a subquery is used to update values on a table.

The subquery below checks for a NULL INVOICE_NO and the customer no = 654654, and if both are true, then the main query updates the table column INVOICE_NO value to ‘ERROR’ where CUSTOMER_NO = 654654.

SELECT * FROM dbo.SALES;
UPDATE dbo.Sales
SET INVOICE_NO = 'ERROR'
WHERE CUSTOMER_NO in (SELECT CUSTOMER_NO FROM dbo.SALES where INVOICE_NO is null AND CUSTOMER_NO = 654654);
SELECT * FROM dbo.SALES;

Before the update is applied:

After the update is applied:

Use a subquery to DELETE values

The final use of a subquery we will use is to look at a scenario where we want to delete values from the database table.

If we look at the above output from the previous section, we will see that the below code should remove rows 5-10 and 12.

Here we will use:

DELETE FROM dbo.Sales
WHERE CUSTOMER_NO in (SELECT CUSTOMER_NO FROM dbo.SALES where INVOICE_NO is null);
SELECT * FROM dbo.SALES;



Have you seen these great posts?

how to groupby in a select statement
how to update records in sql
Deleting table records in SQL

How to Group By in a Select Statement

SQL group by is one of the most commonly used statements you will come across and it has many different applications.

It is a great way to aggregate data, when you are using one of the following in the select statement:

(A) Sum

(B)Count

(C) Average

(D) Max

(E) Min

There are more that can be added to this list , we just gave you some examples.

So lets look at some areas where it might be applied. We will use the below tables from our database to apply the group by to:

CUSTOMER

SALES

Group by using count

In the below code we are applying a very straight forward group by, and it is on the customer_type column.

To make the output easier to follow, we have also applied an alias to the count column. This makes it easier to see the output.

Also this column is not stored in the database, so you would need to apply additional logic to accomplish the storage of that data.

select count(customer_type) as "count of customer type",customer_type from dbo.CUSTOMER
group by customer_type;

Giving output:

If we remove the count from the above SQL, it simply will show the below output:

Group by using joins

Here we are joining tables, and then using a group by. It is quite similar to the first query, but it now brings in additional information, namely the invoice_no.

In addition now the data has better meaning. For example in the group by using count above , the row 6 had six values for WEB.

Now that line is broken out further , as one of the WEB records has an invoice value, it provides a better understanding of the data.

select count(a.customer_type) as "count of customer type",a.customer_type,b.invoice_no
from dbo.CUSTOMER a --left table
left join dbo.SALES b --right table
on a.customer_no = b.customer_no
--where b.invoice_no is null
group by a.customer_type,b.invoice_no

Giving output:

Group by including case statements

In the next scenario, we will see how to use case statements in SQL.

The inclusion will allow additional information to be included in the output.

As can be seen the group by is not impacted by this, but the good thing is that if you are building a machine learning model, this will include great additional information.

select count(a.customer_type) as "count of customer type",a.customer_type,b.invoice_no,
case
when a.customer_type = 'EMAIL' then 1 
when a.customer_type = 'POSTAL' then 2
when a.customer_type = 'WEB' then 3
ELSE 0
end as Cust_Type_No
from dbo.CUSTOMER a --left table
left join dbo.SALES b --right table
on a.customer_no = b.customer_no
group by a.customer_type,b.invoice_no

Which will give:

Group by using a sub query

The final approach that we will look at is where you are using a sub query.

The purpose of a subquery is to get the information you want, and then you can pick data out of it as you see fit.

In the below SQL we can see that aggregates further the count of the fields. If we ran the subquery on its own, then we would get:

But running the below now aggregates up further, it essentially is counting the no of occurences of each value in the “Cust_Type_No” column.

Then it adds beside for each, the values that are associated with it from the customer_type column.

select count(tmp.Cust_Type_No) as "Count_cust_type", tmp.customer_type from (
select count(a.customer_type) as "count of customer type",a.customer_type,b.invoice_no,
case
when a.customer_type = 'EMAIL' then 1 
when a.customer_type = 'POSTAL' then 2
when a.customer_type = 'WEB' then 3
ELSE 0
end as Cust_Type_No
from dbo.CUSTOMER a --left table
left join dbo.SALES b --right table
on a.customer_no = b.customer_no
group by a.customer_type,b.invoice_no) as tmp
group by tmp.Cust_Type_No,tmp.customer_type

The output which is:

So in summary here are some points to look out for:

(A) Use group by with the right aggregate function, so you get the right output.

(B) You can add in additional logic without impacting the group by, examples here include case statements.

(C) Group by can be added to sub queries as well as the main query you are extracting data from.

See other posts that may interest you!

how to write subqueries in sql

how to update records in sql

how to insert data into a table in SQL

how to join tables in SQL

Deleting table records with SQL

Estimated reading time: 4 minutes

Often you will hear about deleting table records with SQL, but how is this achieved, if you have not done it before?

In previous videos on SQL we discussed how to update records in SQL , how to insert data into a table in SQL.

You could be faced with the scenario where there is data that you no longer need, as a result, it needs to be deleted.

So lets work through three different scenarios as follows:

(A) Delete a record based on a unique identifier, in this instance CUSTOMER_NO.

(B)Delete multiple customer records at once using CUSTOMER_NO as the column that will identify the records.

(C) Delete a record where it equals a particular column value.

Our initial dataset will look like this:

This image has an empty alt attribute; its file name is image-14.png

Delete a record based on a unique identifier

DELETE FROM dbo.CUSTOMER WHERE CUSTOMER_NO = 987981

select * from dbo.CUSTOMER

When the above code is run it gives the following output.

As can be seen it now has ten records and the row for CUSTOMER_NO = 987981 has been removed.

This image has an empty alt attribute; its file name is image-15.png

Delete multiple customer records at once

In the above scenario we have deleted just one record, but most likely you will want to remove more than one.

This code below will complete this at once. Remember the column we are using is a primary key column, hence all the values are unique.

It may be the case that you are doing to have to do this on a column that does not have unique values, the code will still work.

I would caution though that using the primary key column value always allows you to only remove the ones that are not needed.

It could be the case that there are genuine values that need to remain, even if they are duplicated.

DELETE FROM dbo.CUSTOMER WHERE CUSTOMER_NO in(3,25897458,25898633)

select * from dbo.CUSTOMER
This image has an empty alt attribute; its file name is image-16.png

As can be seen running the above code now leaves us with seven rows, down from ten and the three rows we had in the SQL have been removed.

Deleting a record where it equals a particular column value

So our final scenario is where we need to find a particular value in a column , and it is exists , then delete the whole record for that row.

Here we are not relying on the primary key, but basically a string search in the customer_type column.

When this SQL is run, it supposed remove two rows and leave us with five records.

The problem that you will encounter

There will be a problem with this as the primary key , even though is not been referenced will throw an error.

The error in this instance relates to the the CUSTOMER_NO on the table customer

  • It has a foreign key equivalent column in the sales table.

In essence you cant leave out the primary key value when completing this SQL statement.

As a result both tables need to be updated, where the primary key value exists.

DELETE FROM dbo.CUSTOMER WHERE customer_type = 'POSTAL'

select * from dbo.CUSTOMER
** the above code will fail with the below error:
The DELETE statement conflicted with the REFERENCE constraint "FK__SALES__CUSTOMER___267ABA7A". The conflict occurred in database "SALES", table "dbo.SALES", column 'CUSTOMER_NO'.
The statement has been terminated.

Our table in sales looks like this:

This image has an empty alt attribute; its file name is image-17.png

Updating our logic as follows:

DELETE FROM dbo.SALES
where SALES.CUSTOMER_NO = 123458 ---> This line needs to be run first as the next line checks if this exists as a foreign key.
DELETE FROM dbo.CUSTOMER
where CUSTOMER.CUSTOMER_NO = 123458
and customer_type = 'POSTAL'


select * from dbo.CUSTOMER
select * from dbo.SALES

Results in:

This image has an empty alt attribute; its file name is image-18.png

To summarise there are a number of things to consider:

  • It is important to always check if there is a primary and foreign key relationship.
  • Specifically not checking these relations will result in the SQL outputting an error.
  • Under these circumstances a review of all tables before proceeding is advised.
  • Ensure you have your tables backed up before doing any deletions!

how to update records in SQL

Estimated reading time: 3 minutes

There are multiple reasons for wanting to update records in a database table, from data quality to enriching your data for any machine learning projects.

One example maybe where you want to add a creation date to a record.

In this blog post we will look at how to update database records, and also where you try to apply updates and they cannot be completed.

The where clause will become important as it allows you to identify specific records.

It will be updating the records below:

Updating a single record

--Update one record on its own
UPDATE dbo.CUSTOMER
SET customer_type = 'WEB'
WHERE CUSTOMER_NO = 125872;

Running the above code gives this output, resulting in the value been updated per our SQL statement.

Updating multiple rows at once

There maybe scenarios where you want to have multiple records/rows updated at once, the below code will provide you with an example of how this can be achevied:

--Update multiple rows at once
UPDATE dbo.CUSTOMER
SET customer_type = 'WEB'
WHERE CUSTOMER_NO in(333333,456789)

This will now give you updated values in the output below. It is a very handy way to at once get multiple records updated.

Show an update that is valid but will not change anything

--Show an update request that does not complete
UPDATE dbo.CUSTOMER
SET CUSTOMER_NO = 2,customer_type = 'ERROR'
WHERE CUSTOMER_NO = 456789222 ---> will not execute as this value does not exist, it ignores.

With the above code, we have a valid update statement. In this scenario it will not complete as the CUSTOMER_NO does not exist on the database table.

In essence the output is exactly the same as in the previous example.

Update multiple column values

In our final example here, we are going to show how to update the CUSTOMER_No and the customer_type values.

The below SQL will change both values, but it is important to note that as the CUSTOMER_NO is a primary key, it cannot be changed to a number that already exists as primary keys can’t be null or have duplicates.

UPDATE dbo.CUSTOMER
SET CUSTOMER_NO = 3,customer_type = 'ERROR'
WHERE CUSTOMER_NO = 333333

So to wrap up there are a number of ways to update your database table. Here we have introduced some examples. There are also some additional steps you could take:

  1. Count all the records before and after to ensure they remain the same.
  2. Create a copy of the table before updates are applied, and then compare that to the updated table. The differences should be what you expect to see.

how to write subqueries in SQL

how to groupby in a select statement

how to insert data into a table in SQL

how to join tables in SQL

how to insert data into a table in SQL

Estimated reading time: 4 minutes

After building your data using a create statement, one of the next steps using SQL is to now get your data in, so it can be managed and updated where necessary.

There are many ways to import your data including , one you could use is import a CSV file with an SQL query.

Here we talk about some of the steps involved, and also how an understanding of the primary key is important.

Before all the following updates happen, the table looks like this:

--Inserting one row of data
Insert INTO dbo.CUSTOMER(CUSTOMER_NO,customer_type)
VALUES (456789,'web')
select * from dbo.CUSTOMER

Insert one row into a database table

In the below code we will use the INSERT statement that basically tells the SQL to add a new row to the table.

It is important to remember two things about the primary key:

  • It has to be unique.
  • There can be no null values.

As a result of this been enforced, it will allow tables to be joined together assuming they have the same primary key information.

The output gives us the following:

Inserting multiple rows of data into one table

Following on from the above , we may have a need to insert more than one row.

Again it would follow the same steps, except, we need to ensure all data been inserted follows the same structure and there are no duplicate values in the CUSTOMER_NO column.

Any duplicates in this column would cause it to fail as follows:

It would fail on 125872 and then 456789, both need to be removed from the insert. A sample error message is shown below.

Insert INTO dbo.CUSTOMER(CUSTOMER_NO,customer_type)
VALUES 
 (125872,'web'),
 (333333,'web'),
 (987981,'web'),
 (879784,'web'),
 (654654,'web'),
 (456789, 'Phone')
 select * from dbo.CUSTOMER

Removing the duplicate values from the insert statement, also allows the update to complete successfully:

Insert INTO dbo.CUSTOMER(CUSTOMER_NO,customer_type)
VALUES 
 (333333,'web'),
 (987981,'web'),
 (879784,'web'),
 (654654,'web')
 select * from dbo.CUSTOMER

Insert values into a particular column

There is going to be a scenario, from time to time where you only want to insert values into a particular column, but not all of them.

With this table, as it has a primary key, you can insert new values into CUSTOMER_NO, with no values required for customer_type, as shown below.

If there was no primary key, then you could use the below statement, and in the where clause identify those specific records that need updating.

--Insert values into a particular column
Insert INTO dbo.CUSTOMER(CUSTOMER_NO)
VALUES 
(25897458),
(25898633)
select * from dbo.CUSTOMER

This will yield in the output:

The next section will show you the error , when trying to insert into the customer_type, and leaving CUSTOMER_NO blank.

Inserting a value in without updating the primary key, will fail

Finally we will look at how trying to update a table with a primary key, without providing the primary key value will give you an error.

The code we will use is:

--Inserting a value in without updating the primary key, will fail
Insert INTO dbo.CUSTOMER(customer_type)
VALUES 
('phone'),
('web')
select * from dbo.CUSTOMER

But the table schema states that the CUSTOMER_NO cannot be null:

As a result the following error appears:

In summary we have taken you through a number of different steps , and some of the caveats that you need to understand before inserting data.

The most important thing is that if you have a primary key on a table, doing inserts needs a value supplied that is not a duplicate and also that no empty values for that column are supplied.

how to join tables in SQL

Estimated reading time: 4 minutes

When running an sql select query, most likely we may have a need to bring in data from other tables.

The reason for this , is that tables have a specific purpose for the data they store.

One of the most important features in a data architecture, is to avoid duplication.

This has the impact of only storing data on a table that is required.

As a result joining tables is necessary if you want to get all the information you need .

As an illustration on a customer purchase , the purchase details would be on the purchase table.

BUT the customer details would not be stored there. They would be stored on the customer table.

To put it another way a customer may make many purchases, but you only need their name and address etc. stored once.

The object therefore of joining tables is to use this structure , and avoid duplication over many tables .

For example in the above table, if you had the customer name and address on the purchases table and customer table,

they would always have to be in sync, and it would bring up a maintenance headache for the database administrator.

So what was developed was primary and foreign keys, which helps join tables, based on unique similar values in a column in each table.

What are the types of joins that can be used?

Inner Join: An inner join is where you join two tables but only return the rows that match in both tables.

Left Join: A left join is where you return all the records from the left table, and any matched records from the right table.

Right Join: A right join is where you return all the records from the right table, and any matched records from the left table.

Full Join: A full join uses the left join and right join, and returns all the rows from both tables in the output.

Equally important, where there is no value found a NULL value will be returned.

Lets take some examples to explain the above concepts.

So we have two tables below:

Sales Database, Customer Table

Sales Database, Sales Table

Inner Join

If we run this code, we will return the below. For this reason the purpose of an Inner Join is to only return matched records common to both.

select a.customer_no, a.customer_type, b.INVOICE_NO 
from dbo.CUSTOMER a -- left table
inner join sales.dbo.sales b -- right table
on a.CUSTOMER_NO = b.CUSTOMER_NO

Left Join

With a left join of the below code will return four rows. All the records from the left table and any matched records from the right table.

select a.customer_no, a.customer_type, b.INVOICE_NO 
from dbo.CUSTOMER a --left table
left join sales.dbo.sales b --right table
on a.CUSTOMER_NO = b.CUSTOMER_NO

Right Join

With a right join of the below code will return three rows. All the records from the right table and any matched records from the left table.

select a.customer_no, a.customer_type, b.INVOICE_NO 
from dbo.CUSTOMER a --left table
right join sales.dbo.sales b -- right table
on a.CUSTOMER_NO = b.CUSTOMER_NO

Full Join

Finally in order to run a full join, the result of the below code would be as per the below. The objective is to return all rows matching and unmatching.

In the output, there is a Null Value, meaning that that row has no value for INVOICE_NO which is a primary key.

In essence primary keys are not allowed to have null values.

select a.customer_no, a.customer_type, b.INVOICE_NO from dbo.sales b
full join sales.dbo.customer a
on a.CUSTOMER_NO = b.CUSTOMER_NO

What is a Primary Key and Foreign Key

When working with databases in your data analytics projects, most likely you will deal with more than one table. As a result in certain scenarios there will be a link between the tables

This will ensure that you can use data from the two or more tables that can be joined together. But how would you do that?

In order to join two tables, you will need columns in both tables that have values that are unique and of similar values.

In other words you need to create what is called a primary key, its characteristics are as follows:

  • Its values are unique.
  • They cannot be null.

So where does a foreign key come in? Say we have two tables Table A and Table B.

Table “Customer Table” will have a column that has a primary key call Customer_No

Table “Sales Table” also has a column with a primary key called Sales_No.

  • Now we cant join the two tables, because the primary keys are different, and will have different values
  • But in the Sales Table, we will have an additional column called Customer_No, and it will hold some of the unique values that are held in the Customer Table, Customer_No column.
  • This Customer_No in the Sales Table is called the foreign key. It is unique, has null values and can be connected to a primary key column Customer_No on the Customer Table.

In this way tables can be connected where values in columns are not always the primary key of a table.

So let’s look at tables in SQLite and see how this works in practice.

Below you have a Customer Table and the Sales Table

In both tables, we have a primary key, though they are not the same primary key. As a result, you would not be able to join the tables as the primary keys contained in them do not have the same values, and this is how you would normally join.

In saying that, the two tables can be joined as the foreign key of Sales is related to the primary key of Customer. If invoice_no was not on Sales , then you could make the customer_no in sales the primary key.

So lets look at the tables below with data in them.

Normally when you try to join two tables, it is on the primary key. On the below invoice_no is the primary key of the table Sales, and customer_no is the primary key of the customer table.

While their values in each column are unique, they are not the same , so a join would fail as per the below:

But if you change on the join b.invoice_no TO b.customer_no, it will now bring back all the values asked for. The reason?

The primary key on one is linked to the foreign key on the other, with similar unique values.

As a result, you can still have a primary key column, not in another table and join to that table as long as there is a foreign key column with the same values.

This helps to maintain the table structure, without having to remove primary keys.