SQL Top n Rows

Let's assume the following table:

Sales:

If you are interested in testing the queries below, here is the SQL to create these tables:

CREATE TABLE sales (  employee int,   product char(63),   price float,   KEY (employee) ); INSERT INTO sales VALUES ('2', 'Keyboard', 15.00); INSERT INTO sales VALUES ('2', 'Printer',50.00); INSERT INTO sales VALUES ('2', 'Mouse',  10.00); INSERT INTO sales VALUES ('3', 'Laptop', 1500.00); INSERT INTO sales VALUES (NULL, 'Desk',  200.00);

Select Overall Top Rows
If you want to select the overal top n rows from a table, you can simply order the results and use LIMIT.

The top-2 sales:

(for MySQL and PostgreSQL:) SELECT employee, product, price FROM sales ORDER BY price DESC LIMIT 2 (for MS SQL:) SELECT TOP 2 employee, product, price FROM sales ORDER BY price DESC (for Oracle:) SELECT employee, product, price FROM sales ORDER BY price WHERE ROWNUM <= 2

gives:

As you can see, the query differs per SQL implementation. I will use the MySQL/PostgreSQL syntax in the rest of this article.

Selecting a Single Top Row
Let's assume that you want to select the best sale for each employee id, rather than the overall best sales. This requires more complex queries. Let's first examine the case where you only want a single best sales for each employee (rather than the 2 or 3 best sales).

There are a few options here, each with different performance characteristics and requirements for your SQL database.

Group By Having construct
SELECT s1.employee, s1.product, s1.price FROM sales s1   LEFT JOIN sales s2 ON s1.employee=s2.employee GROUP BY s1.employee,s1.product,s1.price HAVING max(s2.price)=s1.price

gives:

Double Table Comparison Construct
SELECT s1.employee, s1.product, s1.price FROM sales s1   LEFT JOIN sales s2 ON s1.employee=s2.employee AND s1.price < s2.price WHERE s2.price IS NULL

which gives:

You may wonder why both queries have a LEFT JOIN with themselves, even though we don't SELECT any field from the joined table s2. Good question. What we basically do is, for each row in the sales table (s1), compare to all other rows in the sales table (s2) with the same employee id. Like this:

SELECT s1.employee, s1.product, s1.price, s2.employee, s2.product, s2.price FROM sales s1 LEFT JOIN sales s2 ON s1.employee=s2.employee

Since employee 2 made three sales, there are 3*3 = 9 rows for him. Of these 9 rows, we select the one with the highest value. For the first query, we GROUP BY a single row (GROUP BY all properties of a row; it would have been more efficient if we had a unique key for the sales table!), and select the value HAVING the maximum price for all entries in the group. The second query is an alternative for those database servers that do not understand the HAVING clause. There, we simply select all rows from s2 whose prices is higher then s1.price. We find one or more for each row, except if there is no higher price: then s2.price will be NULL. That is exactly the value we want (we want the highest price), so we select only those.

Subquery Construct
SELECT s1.employee, s1.product, s1.price FROM sales s1 WHERE price= (SELECT MAX(price) FROM sales s2 WHERE s1.employee=s2.employee)

gives:

Top-2 rows
An alternative to the subquery SELECT MAX(price) FROM sales s2 WHERE s1.employee=s2.employee is: SELECT price FROM sales s2 WHERE s1.employee=s2.employee ORDER BY price DESC LIMIT 1.

The advantage of the later query is that we can decide not to use only the top-1 (LIMIT 1), but also the top 2 (LIMIT 2) rows. This gives the query:

SELECT s1.employee, s1.product, s1.price FROM sales s1 WHERE price >= ANY (SELECT price FROM sales s2 WHERE s1.employee=s2.employee ORDER BY price DESC LIMIT 2)

gives:

The above query does not work for all SQL databases (MySQL 5 can not deal with ANY and LIMIT combined in a subquery). However, there is another trick. We can add a "rank" for each sale, using a subquery in the SELECT statement rather than in the WHERE statement:

SELECT employee, product, price, (SELECT COUNT(*) FROM sales s2 WHERE s2.price>=s1.price AND s2.employee=s1.employee) AS rank FROM sales s1

which results in:

With a WHERE clause, we can only select those rows whose rank <= 2. However, since rank is not defined in the scope of the WHERE clause, we must write:

SELECT employee, product, price, (SELECT COUNT(*) FROM sales s2 WHERE s2.price>=s1.price AND s2.employee=s1.employee) AS rank FROM sales s1  WHERE s2.price>=s1.price AND s2.employee=s1.employee) <= 2

However, if we know that the following to statements are equivalent: SELECT employee, product, price FROM sales s1 and SELECT * FROM (SELECT employee, product, price FROM sales s1) AS x

We can use the shorter SQL query:

SELECT * FROM (SELECT employee, product, price,   (SELECT COUNT(*) FROM sales s2 WHERE s2.price>=s1.price AND s2.employee=s1.employee) AS rank    FROM sales s1) as x   WHERE rank <= 2

which results in:

Joining with Other Tables
For examples how to join this with other tables, see SQL Joins