When working with databases in your data analytics projects, most likely you will deal with more than one table. As a result in certain scenarios there will be a link between the tables
This will ensure that you can use data from the two or more tables that can be joined together. But how would you do that?
In order to join two tables, you will need columns in both tables that have values that are unique and of similar values.
In other words you need to create what is called a primary key, its characteristics are as follows:
- Its values are unique.
- They cannot be null.
So where does a foreign key come in? Say we have two tables Table A and Table B.
Table “Customer Table” will have a column that has a primary key call Customer_No
Table “Sales Table” also has a column with a primary key called Sales_No.
- Now we cant join the two tables, because the primary keys are different, and will have different values
- But in the Sales Table, we will have an additional column called Customer_No, and it will hold some of the unique values that are held in the Customer Table, Customer_No column.
- This Customer_No in the Sales Table is called the foreign key. It is unique, has null values and can be connected to a primary key column Customer_No on the Customer Table.
In this way tables can be connected where values in columns are not always the primary key of a table.
So let’s look at tables in SQLite and see how this works in practice.
Below you have a Customer Table and the Sales Table
In both tables, we have a primary key, though they are not the same primary key. As a result, you would not be able to join the tables as the primary keys contained in them do not have the same values, and this is how you would normally join.
In saying that, the two tables can be joined as the foreign key of Sales is related to the primary key of Customer. If invoice_no was not on Sales , then you could make the customer_no in sales the primary key.
So lets look at the tables below with data in them.
Normally when you try to join two tables, it is on the primary key. On the below invoice_no is the primary key of the table Sales, and customer_no is the primary key of the customer table.
While their values in each column are unique, they are not the same , so a join would fail as per the below:
But if you change on the join b.invoice_no TO b.customer_no, it will now bring back all the values asked for. The reason?
The primary key on one is linked to the foreign key on the other, with similar unique values.
As a result, you can still have a primary key column, not in another table and join to that table as long as there is a foreign key column with the same values.
This helps to maintain the table structure, without having to remove primary keys.