SQL Joins
SQL joins can be complex. This page refreshes your knowledge by giving a few examples.
Let's assume the following two tables:
Employee:
id | first name | last name |
---|---|---|
1 | John | Smith |
2 | James | Williams |
3 | David | Taylor |
Sales:
employee | product | price |
---|---|---|
2 | Printer | 50.00 |
2 | Mouse | 10.00 |
3 | Laptop | 1500.00 |
NULL | Desk | 200.00 |
If you are interested in testing the queries below, here is the SQL to create these tables:
CREATE TABLE employee ( id int PRIMARY KEY, firstname char(63), lastname char(63) ); INSERT INTO employee VALUES ('1', 'John','Smith'); INSERT INTO employee VALUES ('2', 'James','Williams'); INSERT INTO employee VALUES ('3', 'David','Taylor'); CREATE TABLE sales ( employee int, product char(63), price float, KEY (employee) ); INSERT INTO sales VALUES ('2', 'Printer',50.00); INSERT INTO sales VALUES ('2', 'Mouse', 10.00); INSERT INTO sales VALUES ('3', 'Laptop', 1500.00); INSERT INTO sales VALUES (NULL, 'Desk', 200.00);
Contents
Join
The following three statements are equivalent:
SELECT firstname, lastname, product, price FROM employee, sales WHERE employee.id=sales.employee
SELECT firstname, lastname, product, price FROM employee JOIN sales ON employee.id=sales.employee
SELECT firstname, lastname, product, price FROM employee INNER JOIN sales ON employee.id=sales.employee
All result in:
First name | Last name | product | price |
---|---|---|---|
James | Williams | Printer | 50.00 |
James | Williams | Mouse | 10.00 |
David | Taylor | Laptop | 1500.00 |
Note that neither John Smith, nor the Desk is listed, since they do not have an equivalent in the other table. Also, note that James Williams occurs twice.
Left Join and Right Join
To get at least one listing of each employee, use a left join:
SELECT firstname, lastname, product, price FROM employee LEFT JOIN sales ON employee.id=sales.employee
This results in:
First name | Last name | product | price |
---|---|---|---|
John | Smith | NULL | NULL |
James | Williams | Printer | 50.00 |
James | Williams | Mouse | 10.00 |
David | Taylor | Laptop | 1500.00 |
To get at least one listing of each sale, use a right join:
SELECT firstname, lastname, product, price FROM employee RIGHT JOIN sales ON employee.id=sales.employee
This results in:
First name | Last name | product | price |
---|---|---|---|
James | Williams | Printer | 50.00 |
James | Williams | Mouse | 10.00 |
David | Taylor | Laptop | 1500.00 |
NULL | NULL | Desk | 200.00 |
Grouping
The left join above did list all employees, but some employees had duplicate records. That may not be what you want. You can solve this by a GROUP BY:
(warning: bad example!) SELECT firstname, lastname, product, price FROM employee LEFT JOIN sales ON employee.id=sales.employee GROUP BY employee.id
Which results in:
First name | Last name | product | price |
---|---|---|---|
John | Smith | NULL | NULL |
James | Williams | Printer | 50.00 |
David | Taylor | Laptop | 1500.00 |
However, a big warning is in place: There is no way to know which of the two sales of James Williams will appear in the result!
If you are only interested in the highest sale made, this is simply:
SELECT firstname, lastname, MAX(price) AS "highest sale" FROM employee LEFT JOIN sales ON employee.id=sales.employee GROUP BY employee.id
First name | Last name | highest sale |
---|---|---|
John | Smith | NULL |
James | Williams | 50.00 |
David | Taylor | 1500.00 |
Do not make the mistake to also select the product:
(warning: bad example!) SELECT firstname, lastname, product, MAX(price) AS "highest sale" FROM employee LEFT JOIN sales ON employee.id=sales.employee GROUP BY employee.id
As the product and price entry may not match (note that the mouse cost €10 and the printer €50, not the other way around):
First name | Last name | product | highest sale |
---|---|---|---|
John | Smith | NULL | NULL |
James | Williams | Mouse | 50.00 |
David | Taylor | Laptop | 1500.00 |
In fact, some databases like PostgreSQL won't even let you execute the above SQL statement, because the result is undefined.
To find the correct way to query the highest sale of each employee, we need to dive into queries that select the top n rows in a table.
Selecting Top Rows
The above problems boil down to the problem that you want to select a set of specific rows in the sales table. In this case, you want, for each employee, the record with the highest price. This problem is generally know as "selecting top n rows".
We can either use a query with HAVING, or a slightly less efficient query without a HAVING clause:
SELECT firstname, lastname, s1.product AS "best sale", s1.price AS "highest price" FROM employee LEFT JOIN sales s1 ON employee.id=s1.employee LEFT JOIN sales s2 ON s1.employee=s2.employee GROUP BY employee.id HAVING max(s2.price)=s1.price
with result:
First name | Last name | product | highest sale |
---|---|---|---|
James | Williams | Printer | 50.00 |
David | Taylor | Laptop | 1500.00 |
and:
SELECT firstname, lastname, s1.product AS "best sale", s1.price AS "highest price" FROM employee LEFT JOIN sales s1 ON employee.id=s1.employee LEFT JOIN sales s2 ON s1.employee=s2.employee AND s1.price < s2.price WHERE s2.price IS NULL
with result:
First name | Last name | best sale | highest price |
---|---|---|---|
John | Smith | NULL | NULL |
James | Williams | Printer | 50.00 |
David | Taylor | Laptop | 1500.00 |
Note that these are relatively complex queries, because you LEFT JOIN the sales table twice: once to select an entry, and once to "loop" through all rows in the sales table to make sure the current row has the highest value. The details are explained on the page SQL Top n Rows.
Subselections
Smart readers will have noticed that the top-n queries are very inefficient. For each employee with n sales, there are n*n records int the LEFT JOIN queries, and only 1 is filtered out. If you have a lot of records, this may become slow.
A better solution is to use subqueries, as they support the "ANY", "ALL" and "SOME" comparison.
In this case, what we really like is to LEFT JOIN with a WHERE clause determined by a subquery:
SELECT firstname, lastname, product AS "best sale", price AS "highest price" FROM employee LEFT JOIN sales ON price=(SELECT MAX(price) FROM sales WHERE employee.id=sales.employee)
which results in:
First name | Last name | best sale | highest price |
---|---|---|---|
John | Smith | NULL | NULL |
James | Williams | Printer | 50.00 |
David | Taylor | Laptop | 1500.00 |
This is without doubt the most elegant and probably fastest way to make this query. However, not all database servers may support subqueries.
Top-2 sales
Using the technique explained in SQL Top n Rows, we can select the top-n (here top 2) sales for each employee. We first select the two highest prices for that employee with the subquery SELECT price FROM sales WHERE employee.id=sales.employee ORDER BY price DESC LIMIT 2, and then use that result to select only rows whose price is higher than at least one of those values:
SELECT firstname, lastname, product AS "best sale", price AS "highest price" FROM employee LEFT JOIN sales ON employee.id=sales.employee AND price>= ANY (SELECT price FROM sales WHERE employee.id=sales.employee ORDER BY price DESC LIMIT 2)
results in:
First name | Last name | best sale | highest price |
---|---|---|---|
John | Smith | NULL | NULL |
James | Williams | Keyboard | 15.00 |
James | Williams | Printer | 50.00 |
David | Taylor | Laptop | 1500.00 |