Tuesday, December 7, 2010

CREATE FUNCTION - basic anatomy

Functions or stored procedures are a very powerful feature of PostgreSQL. Functions allow one to modularize code, create "units" that do useful tasks and hide the complexity of the underlying code, here a query. They also reduce the risk of making a mistake by typing the same query over and over again, such as running a daily or monthly report.

PostgreSQL supports a particularly large variety of programming languages. By using these procedural language extensions one can write functions in Java (PL/Java), Python (PL/Python), R (PL/R), Perl (PL/Perl) and many others to interact with a PostgreSQL database. In this and the next post, however, we are going to cover simple functions in SQL only. In later posts we will cover PL/pgSQL a more powerful and flexible procedural language that provides features such as control flow (such as IF...THEN...ELSE), local variables and multiple queries.

We'll start with a simple SQL function. In the next post, we will expand on this and detail some additional options and features that CREATE FUNCTION supports.

Here are the sales data from the last post:

aggs=# select * from salesfigures;
 name | month | sales
------+-------+-------
 joe  | jan   |    34
 joe  | feb   |    23
 bob  | jan   |    18
 bob  | feb   |    30
(4 rows)

If we want to get the total sales for a given month we can do

aggs=# select sum(sales) from salesfigures where month='jan';
 sum
-----
  52
(1 row)

but if we know that are going to have to do this each month (and you are likely to forget the query next month) it might make sense to create a function called totalsalesformonth() that one can call and pass in the month of interest. Here is how we do that. Don't worry if this looks complicated for now. We will step through each part.

CREATE FUNCTION totalsalesformonth(month varchar) returns decimal as $$
    select sum(sales) from salesfigures where month=$1;
$$ language 'sql';


and we can run it as

aggs=# select totalsalesformonth('jan');
 totalsalesformonth
--------------------
                 52
(1 row)

and get the same result as the plain query. Let us break this function definition down.

COMMAND
CREATE FUNCTION is the command that runs to the final semicolon. It is a query so we can paste this into the psql command line or into pgAdmin the same as any other query.

It is defining a function called totalsalesformonth. PostgreSQL ignores the capitilization and turns everything into lower case so this call is also recognized

aggs=# select totalsalesForMonth('jan');
 totalsalesformonth
--------------------
                 52
(1 row)

NAME
The function name must be a single string. It cannot be "total sales for month" but it can be total_sales_for_month. Choose a sensible descriptive name for the function. You and any other users will thank you later.

PARAMETER LIST
We are specifying that it takes one argument or parameter (technically this is an IN parameter) called "month" and is of type varchar.
We can have more than one argument. These are specified as a comma-delimited list.

RETURN TYPE
The function is declared to return a single value of type decimal. We will discuss return types further in the next post.

FUNCTION BODY
The "$$" says to PostgreSQL that we are starting the code that constitutes the body of the function.
The function body should be a single statement, i.e. only one semicolon. PostgreSQL will allow multiple statements but it is only going to return the last. That is,

CREATE FUNCTION totalsalesformonth(month varchar) returns decimal as $$
    select sum(sales) from salesfigures where month=$1;
    select 3.;
$$ language 'sql';

is valid and will return 3 when called.
The names of the parameters in the parameter list are completely ignored. Instead we use $1 in place of the first parameter, $2 for the second parameter etc. So, when we call

select totalsalesForMonth('jan');

PostgreSQL takes the value 'jan' and replaces $1 with this to produce

select sum(sales) from salesfigures where month='jan';

which is then executed.

Importantly, one cannot use a passed in parameter for the name of a field or the name of a table. For instance, this function will not compile:

CREATE FUNCTION totalsales(tablename varchar) returns decimal as $$
    select sum(sales) from $1;                                       //WRONG!
$$ language 'sql';

PostgreSQL complains of a syntax error. Why is this?
PostgreSQL is strongly typed. That means it checks that the output of the query matches the types that the function should return. In this case, the parser cannot know if the tablename passed in has a sales field and if it does whether it is numeric. There are too many unknowns for PostgreSQL to be happy that this function will work as intended.If you do need to create dynamic queries such as this, one can achieve it with PL/pgSQL.

Although the names are ignored it is important to use good descriptive names for documentation purposes. I don't know about you but I can forget what I coded last week and appreciate the help of good documentation and besides, you might not be the only user of these functions. I would strongly discourage use of generic names such as "arg0", "arg1", "x", "y" unless there is good reason. (For instance, x and y might be the best choice for a function that processes a point in the Cartesian plane.)

LANGUAGE
Finally, we tell PostgreSQL which language we are using.

In summary, our function is

CREATE FUNCTION function_name (argument_name argument_type, argument_name argument_type, ...) RETURNS return_type AS $$
     function_body_query;
$$ language language_name;

This covers the basic anatomy of a SQL function. CREATE FUNCTION is a sophisticated command and supports a richer set of options. We'll start on those in the next post.

5 comments:

  1. this looks really cool! i could not get it to work. maybe another time.

    ReplyDelete
  2. really help me familiarize myself with postges function.
    thank you!

    ReplyDelete
  3. Thanks for good tutorials for beginner....

    ReplyDelete
  4. Thanks a lot, I was needing a bit of explanation of the basics of custom functions.

    ReplyDelete