Wednesday, November 10, 2010

Basic SQL

So far we have only covered the simplest SELECT SQL queries: select * from [tablename];. That is not going to get us too far. Let's broaden our repertoire. The material in this post is standard SQL. That is, it is not specific to PostgreSQL. Thus, I would encourage you to read around the wealth of other online resources. Ultimately, however, the best way to learn SQL is to use it and practice.

Copy and paste the following SQL into psql which is a list of Apple Computer CEOs and the year that they started that position:

create database apple;
\c apple
create table ceos (name varchar, year int);
insert into ceos values('Steve Jobs',1976);
insert into ceos values('Mike Markkula',1977);
insert into ceos values('Mike Scott',1978);
insert into ceos values('John Sculley',1983);
insert into ceos values('Michael Spindler',1993);
insert into ceos values('Gil Amelio',1996);
insert into ceos values('Steve Jobs',1997);
select * from ceos;

We can now start adding some WHERE clauses to our queries to filter down results. Who was an Apple CEO before 1984? That is a simple WHERE clause:

apple=# select name from ceos where year < 1984;
     name     
---------------
 Steve Jobs
 Mike Markkula
 Mike Scott
 John Sculley
(4 rows)


Not surprisingly, < is not the only operator available. We have all the usual >, >=, <, <=, =, != as well as many more, including those for string operations and regular expressions.

In SQL, the "WHERE x BETWEEN y AND z" clause means look for rows where some field x has a value >= y and value <= z. We can use that to query who was a CEO in the 1970s:

apple=# select name from ceos where year between 1970 and 1979;
     name     
---------------
 Steve Jobs
 Mike Markkula
 Mike Scott
(3 rows)

A SELECT statement can have any number of filtering clauses. The first is preceded with a "WHERE" and later ones are preceded with an AND or an OR — for example, select name from ceos where year <= 1970 OR year >= 1980; — and can involve parentheses, subqueries and other constructs which we will cover later.

A powerful additional clause type is a GROUP BY. Let's get the name and number of times each person has been a CEO:

apple=# select name, count(*) from ceos group by name;
       name       | count
------------------+-------
 Michael Spindler |     1
 John Sculley     |     1
 Gil Amelio       |     1
 Mike Markkula    |     1
 Mike Scott       |     1
 Steve Jobs       |     2
(6 rows)

When we group by, I like to imagine PostgreSQL creating a bunch of virtual minitables as an interim step, one per grouping, here by name.  Thus, one could think of PostgreSQL thinking "ah, Steve Jobs has two rows, let's create a minitable for him with those two rows, and everyone else gets a single row table:

       name       | year
------------------+------
 Steve Jobs       | 1976
 Steve Jobs       | 1997

       name       | year
------------------+------
 Mike Markkula    | 1977


       name       | year
------------------+------
 Mike Scott       | 1978


       name       | year
------------------+------
 John Sculley     | 1983

       name       | year
------------------+------
 Michael Spindler | 1993


       name       | year
------------------+------
 Gil Amelio       | 1996

Then it will look at the part of the query before the FROM, which is called the select list. What does the user want from these grouped data? They want the name. We grouped by name so the name will be the same for each row within a minitable. OK, output the name field from each minitable, the first row will do. What else do they want? A count(*) meaning count all the rows. That is 2 for Jobs, 1 for everyone else.

When we GROUP BY we can also add a posterior HAVING clause. (GROUP BYs don't have to have a HAVING but HAVING must have a GROUP BY.) Let's look at an example. Who was a CEO more than once?

apple=# select name from ceos group by name having count(*) > 1;
    name   
------------
 Steve Jobs
(1 row)

This means group first, then check the having clause which must return true of false for each group. Then after look that, inspect the required output in the select list.

PostgreSQL will group by as in the set of tables above, then it will count the number of rows for each and only those who have more than one row will qualify. Finally, we asked for the name only so it returns the name from that one minitable returning "Steve Jobs".

This has been a taste of some of the richness that SQL offers. We will cover additional query constructs later.

No comments:

Post a Comment