A hot startup is holding a special event and wants to send out invites to some of their best clients and also to some VIPs. Some of the VIPs are actually very supportive of the site and are clients too. What query will provide the complete set of people to invite avoiding duplicate records? For this, we want UNION. UNION is an OR operator for multiple result sets. Here is our data
create database hotstartup;
\c hotstartup;
create table clients(name varchar);
insert into clients values('John Smith');
insert into clients values('Ashton Kutcher');
insert into clients values('Joe Doe');
insert into clients values('Stephen Fry');
create table vips(name varchar);
insert into vips values('Ashton Kutcher');
insert into vips values('Stephen Fry');
insert into vips values('Demi Moore');
insert into vips values('Hugh Laurie');
and the query will be
hotstartup=# select * from clients union select * from vips;
name
----------------
Ashton Kutcher
Hugh Laurie
Joe Doe
Demi Moore
John Smith
Stephen Fry
(6 rows)
The query is saying get the rows from clients and append the rows of vips but remove any duplicates. Here we can see that we can have a total of 6 unique names from the initial set of 8 rows.
There are times when we do not want to remove duplicates. For that, we can append ALL after UNION:
hotstartup=# select * from clients union all select * from vips;
name
----------------
John Smith
Ashton Kutcher
Joe Doe
Stephen Fry
Ashton Kutcher
Stephen Fry
Demi Moore
Hugh Laurie
(8 rows)
The head party planner sees the list of 6 invites and says "No, no, no. We need to send out two sets of invites, one set to the VIPs that will get them into the VIP tent and then ordinary invites for everyone else". Clearly,
select * from vips will return the names for the VIP invite list. However, what will get us the list for the remaining invites? We want everyone on the clients list EXCEPT those on the VIP list. Unsurprisingly, we use EXCEPThotstartup=# select * from clients except select * from vips;
name
------------
Joe Doe
John Smith
(2 rows)
Finally, if we want to get the list of people who are both clients and VIP we can use INTERSECT.
hotstartup=# select * from clients intersect select * from vips;
name
----------------
Ashton Kutcher
Stephen Fry
(2 rows)
Both EXCEPT and INTERSECT can be followed by an optional ALL. To examine its effect we need to add some duplicate data. Let's insert a duplicate name into clients:
hotstartup=# select * from clients;
name
----------------
John Smith
Ashton Kutcher
Joe Doe
Stephen Fry
Stephen Fry
(5 rows)
hotstartup=# select * from vips;
name
----------------
Ashton Kutcher
Demi Moore
Hugh Laurie
Stephen Fry
(4 rows)
Compare the results of these two
hotstartup=# select * from clients except select * from vips;
name
------------
Joe Doe
John Smith
(2 rows)
versus
hotstartup=# select * from clients except all select * from vips;
name
-------------
Joe Doe
John Smith
Stephen Fry
(3 rows)
The latter query is essentially saying take each row of
clients and find a matching row in vips. If there is match delete the pair of rows from the results set. For INTERSECT, let's insert Stephen Fry into
vips to have him duplicated in both listshotstartup=# select * from clients;
name
----------------
John Smith
Ashton Kutcher
Joe Doe
Stephen Fry
Stephen Fry
(5 rows)
hotstartup=# select * from vips;
name
----------------
Ashton Kutcher
Stephen Fry
Demi Moore
Hugh Laurie
Stephen Fry
(5 rows)
Then we can see that INTERSECT ALL no longer removes duplicates, i.e.
hotstartup=# select * from clients intersect select * from vips;
name
----------------
Ashton Kutcher
Stephen Fry
(2 rows)
versus
hotstartup=# select * from clients intersect all select * from vips;
name
----------------
Ashton Kutcher
Stephen Fry
Stephen Fry
(3 rows)
The latter query is saying find matching pairs of rows. Stephen Fry appears in both tables twice so we find two matching pairs for him and hence two rows appears in the results.
In all these three operators the columns of the two select statements must match. We cannot have the first select statement returning 3 fields and the second returning two. We can demonstrate that with our
ceos table:apple=# select * from ceos;
name | year
------------------+------
Steve Jobs | 1976
Mike Markkula | 1977
Mike Scott | 1978
John Sculley | 1983
Michael Spindler | 1993
Gil Amelio | 1996
Steve Jobs | 1997
(7 rows)
apple=# select * from ceos where year between 1970 and 1979 union select * from ceos where year=1977;
name | year
---------------+------
Mike Scott | 1978
Steve Jobs | 1976
Mike Markkula | 1977
(3 rows)
This is OK because both statements are returning * which as they refer to the same table means the same set of fields.
This, however, is not acceptable
apple=# select * from ceos where year between 1970 and 1979 union select name from ceos where year=1977;
ERROR: each UNION query must have the same number of columns
LINE 1: ...os where year between 1970 and 1979 union select name from ...
because the first statement returns all fields wheareas the second returns one field only.
awesome again!!! magnifico!
ReplyDeletethank you
ReplyDelete