Saturday, November 27, 2010

SQL: outer joins

Last time we covered INNER JOINs including a couple of special cases CROSS JOIN and self joins. In INNER JOINS the result set returns only the set of rows that match the join criteria (if any). IN OUTER JOINS, the results might contain both matched and unmatched rows. It is for this reason that beginners might find such JOINS a little more confusing. However, the logic is really quite straightforward.

There are three types of outer joins: LEFT, RIGHT, and FULL. Let's start with a LEFT OUTER JOIN.

In a left join when joining table t1 with table t2 PostgreSQL first does a "normal" inner join. It then looks to see if there are any rows from t1 that are not in the result set. If so, it adds in those rows placing NULLs for all the fields of t2. Thus, this guarantees that each row of t1 will appear at least once in the results.

Here are our data:

joindb=# select * from clients;
 id | name 
----+-------
  0 | joe
  1 | bob
  2 | alice
(3 rows)

joindb=# select * from orders;
 id | value
----+-------
  1 | val1
  2 | val2
  3 | val3
(3 rows)

An inner join on ids matches on id=1 and id=2

joindb=# select * from clients join orders on clients.id=orders.id;
 id | name  | id | value
----+-------+----+-------
  1 | bob   |  1 | val1
  2 | alice |  2 | val2
(2 rows)

but a left outer join has that inner join result set plus the row for id=0 from clients (shown in red):

joindb=# select * from clients left join orders on clients.id=orders.id;
 id | name  | id | value
----+-------+----+-------
  0 | joe   |    |
  1 | bob   |  1 | val1
  2 | alice |  2 | val2
(3 rows)

Left joins are implicitly OUTER JOINs but we can add that keyword in explicitly

select * from clients left outer join orders on clients.id=orders.id;

We can obtain the same result set if we use the USING syntax: select * from clients left join orders using(id);

That's it. Simple isn't it?

Why might we want something like this? Suppose we have a list of clients and we have purchase data from the last month. We might want a report of the sales for all our clients, even those that didn't buy anything in that month. This sort of report might be useful for the sales staff to target those clients.

OK, if a LEFT JOIN does an inner join and adds in any additonal "missing" rows from t1, what do you think a RIGHT JOIN does? Exactly. Do an INNER JOIN and add in any missing rows from t2.

joindb=# select * from clients right join orders on clients.id=orders.id;
 id | name  | id | value
----+-------+----+-------
  1 | bob   |  1 | val1
  2 | alice |  2 | val2
    |       |  3 | val3
(3 rows)

Here we can see the row for id=3 from orders shown in blue.

The last type is a FULL OUTER JOIN. That is an left and a right outer join. Do an inner join and add in any missing rows from both t1 and t2.

joindb=# select * from clients full join orders on clients.id=orders.id;
 id | name  | id | value
----+-------+----+-------
  0 | joe   |    |
  1 | bob   |  1 | val1
  2 | alice |  2 | val2
    |       |  3 | val3
(4 rows)

Again, we can use the ON join_criteria or the USING (join_column_list) syntax and we can include or omit the OUTER keyword.

We can also do a NATURAL join for left, right and full outer joins. Recall that with natural joins we ask PostgreSQL to match up the columns that share the same name.

joindb=# select * from clients natural full join orders;
 id | name  | value
----+-------+-------
  0 | joe   |
  1 | bob   | val1
  2 | alice | val2
  3 |       | val3
(4 rows)

Finally, in this and the last post we only joined two tables. We can in fact join as many tables as we wish in a single query. For outer joins you will need to use parentheses and nest the joins. For inner joins, however, one can do

select * from tablename1 t1, tablename2 t2, tablename3 t3,... where t1.id=t2.id and t1.id=t3.id and...;

5 comments:

  1. Very well explained - thanks!

    ReplyDelete
  2. > select * from tablename1 t1, tablename2 t2, tablename3 t3,... where t1.id=t2.id and t1.id=t3.id and...;

    This is great! But how do I use abbreviated table names in the SELECT clause in which I'm not using a '*' wildcard:

    SELECT t1.id FROM tablename t1 ...

    This gives an error.

    ReplyDelete
  3. great stuff again! thank you so much!

    ReplyDelete
  4. Hi. Thank you very much for the explanation. It's really really clear. I still have a question though... what if I want to query for those that does not have orders? or orders without a real client? I know I can do it with a NOT IN query, but I was wondering if this (the complement of the intersection) could be done with a OUTER JOIN....

    Thanks in advance for your time!

    Andrés

    ReplyDelete