PostgreSQL Tutorial – Creating the Pagila Schema

In the previous article, we talked about creating your first PostgreSQL database and I went over the syntax of createdb and psql in a bit of detail. Next up, we’re going to look at creating the structure of the pagila database.``` createdb -U jeremiah pagila psql -U jeremiah -f pagila-schema.sql pagila psql -U jeremiah -f pagila-insert-data.sql pagila


### Working With SQL

Even though SQL is the name of the language that we use to work with data, there are two subsets of SQL that you will hear people refer to: DML and DDL.

#### DDL

DDL is the common abbreviation for Data Definition Language. This is the name for the part of SQL that is used for creating and modifying database structures – tables, indexes, views, etc. The `pagila-schema.sql` file is chock full of DDL and is a great introduction to how to create objects in PostgreSQL.

#### DML

DML stands for Data Manipulation Language. This is the other subset of SQL. This is how we read, update, insert, and delete data. We’ll be talking about DML in the next article.

### Tables

Tables are the heart and soul of a relational database – they’re where data is stored. Let’s take a look at creating the actors table:```
CREATE TABLE public.actor  ( 
    actor_id    int4 NOT NULL DEFAULT nextval('actor_actor_id_seq'::regclass),
    first_name  varchar(45) NOT NULL,
    last_name   varchar(45) NOT NULL,
    last_update timestamp NOT NULL DEFAULT now(),
    PRIMARY KEY(actor_id)
)
GO
CREATE INDEX idx_actor_last_name
    ON public.actor(last_name)
GO
CREATE TRIGGER last_updated
     BEFORE UPDATE ON actor FOR EACH ROW
     EXECUTE PROCEDURE last_updated()
GO

CREATE FUNCTION last_updated() RETURNS trigger
    AS $$
BEGIN
    NEW.last_update = CURRENT_TIMESTAMP;
    RETURN NEW;
END $$
    LANGUAGE plpgsql; 
```What’s going on here? Well, we’re creating a table named actor, for starters (I’ll come back to the sequence in a minute). The actor table contains four columns:`actor_id`, `first_name`, `last_name`, and `last_update`. The`first_name` and `last_name` columns are the simplest, so I’m going to start with them. Both the `first_name` and `last_name` columns store character data – strings of text. These columns store variable length character data that can hold up to 45 characters. Both of these columns have been created as `NOT NULL`. This means that the columns require a value, even if that value is an empty string. Ideally you will have additional logic in place to prevent empty strings from being stored in the database as meaningful values. The `last_update` column holds a date and time. The first time a row is inserted into the actor table, the `last_update` column will default to the current date and time. There is a trigger on this table, too, that will update`last_update` with the time of the update. While `last_update` does not contain any time zone specific information, it is possible for PostgreSQL to store time zone information by using the `timestamp with time zone` data type. Finally we come to the `actor_id` column. This column is used as the primary key – it uniquely identifies any row in the actor table and requires the the values in the row not be null. I’ve omitted the code to create the`actor_actor_id_seq` sequence, but I’ll summarize. A sequence is similar an `IDENTITY` column in SQL Server or an `AUTOINCREMENT` column in MySQL, it just requires manual intervention. Whenever we want to increment the value of a sequence, we need to use the `nextval()` function to get the next value from the sequence. In the `actor` table, our primary key is a 4-byte number that will automatically be incremented by 1 every time a new row is added. Once we have the four main columns created, we create an index`idx_actor_last_name`. Indexes make it possible for PostgreSQL to rapidly find data in a large table. It’s possible to create an index on more than one column. In this case, though, we’ll be creating our index on just the one column. This makes it easier to find actors by their last name. The last thing we do is create a trigger that runs after an update. Triggers let us run pieces of code after an event (such as an insert, update, or delete) happens on a table. The `last_updated` trigger runs a function named`last_updated` before any row is updated in the database. The`last_updated` function modifies the `NEW` database row (a virtual row used in inserts and updates on row-level triggers). This new row will be inserted into the table with a new value in the `last_update` column. You’re probably wondering what’s with the `::regclass` and the `$$` when we created the function. Be patient, everything will be explained in time. Right now it’s more important to focus on the higher level concepts and slowly move into the details as time goes on.

### Summary

Creating a basic table is a simple operation, once you know what you want to store in it. You name the table and then supply a list of columns with a data type for each one. This helps PostgreSQL figure out the optimum way to store data on disk.