PostgreSQL SERIAL and IDENTITY Columns: An Introduction


PostgreSQL SERIAL and IDENTITY Columns: A Comprehensive Introduction

In the world of relational databases, ensuring the uniqueness and integrity of data is paramount. One of the most fundamental concepts in achieving this is the primary key – a column or set of columns that uniquely identifies each row within a table. While primary keys can be derived from natural data attributes (like a user’s email address or a product’s SKU), it’s often more robust, efficient, and simpler to use surrogate keys. These are artificial keys, typically sequential integers, generated by the database system itself, with no inherent meaning other than uniquely identifying a row.

PostgreSQL, a powerful and feature-rich open-source object-relational database system, provides two primary mechanisms for automatically generating these sequential surrogate keys: the traditional SERIAL pseudo-types and the more modern, SQL standard IDENTITY columns.

This article provides a deep dive into both SERIAL and IDENTITY columns in PostgreSQL. We will explore:

  1. The fundamental need for auto-incrementing keys.
  2. The underlying mechanism powering both: PostgreSQL Sequences.
  3. A detailed look at the SERIAL pseudo-type: its variants, usage, advantages, and disadvantages.
  4. A detailed look at the SQL standard IDENTITY columns: GENERATED ALWAYS and GENERATED BY DEFAULT, syntax, benefits, and drawbacks.
  5. A thorough comparison between SERIAL and IDENTITY.
  6. Guidance on choosing the right approach for your needs.
  7. Practical considerations like retrieving generated IDs, handling gaps, resetting sequences, and permissions.
  8. How to migrate existing SERIAL columns to IDENTITY.
  9. Advanced considerations and alternatives.

By the end of this article, you will have a comprehensive understanding of how PostgreSQL handles auto-incrementing columns, enabling you to make informed decisions when designing your database schemas.

1. The Need for Auto-Incrementing Keys (Surrogate Keys)

Before diving into the specifics of SERIAL and IDENTITY, let’s solidify why auto-incrementing keys are so widely used.

  • Guaranteed Uniqueness: The primary purpose of a primary key is to uniquely identify each row. Manually assigning unique IDs is prone to errors, collisions (attempting to assign the same ID twice), and inefficiencies. Auto-incrementing mechanisms ensure that each new row automatically receives a unique identifier within the table.
  • Simplicity: Using a simple integer as a key is straightforward. It’s easy to understand, query, and use in foreign key relationships in other tables.
  • Stability: Natural keys (like email addresses or usernames) can sometimes change. If an email address used as a primary key changes, it requires updating not only the primary key column but also all foreign key references in other tables, which can be complex and error-prone. Surrogate keys, being artificial, rarely need to change once assigned.
  • Performance: Integer keys are generally compact and efficient for indexing and joining operations compared to potentially long string-based natural keys. Comparisons and lookups are faster.
  • Anonymity/Abstraction: Surrogate keys decouple the row’s identity from its potentially sensitive or mutable real-world attributes.

While auto-incrementing integers are the most common form of surrogate keys, other strategies like using UUIDs (Universally Unique Identifiers) also exist. However, for many common use cases, sequential integers provide a good balance of simplicity, performance, and guaranteed uniqueness within the scope of a single table. PostgreSQL’s SERIAL and IDENTITY are designed precisely for generating these sequential integer keys.

2. The Foundation: Understanding PostgreSQL Sequences

Both SERIAL and IDENTITY columns in PostgreSQL are built upon a fundamental database object: the Sequence. A sequence is a special kind of database object that generates a sequence of integers according to specified rules. It’s essentially a sophisticated counter maintained by the database server.

Understanding sequences is crucial because they are the engine powering auto-incrementing behavior. Even if you primarily use the higher-level SERIAL or IDENTITY syntax, knowing about sequences helps in troubleshooting, customization, and understanding the underlying mechanics.

Key Concepts of Sequences:

  • Independent Objects: Sequences exist independently of tables. They have their own names, owners, and permissions.
  • Generation: They generate unique values across concurrent transactions. PostgreSQL handles the necessary locking to ensure that multiple sessions requesting a number from the same sequence will each receive distinct values.
  • Non-Transactional (Typically): The operation of fetching the next value from a sequence (nextval()) is generally non-transactional. This means that if a transaction requests a sequence value but later rolls back, the sequence value is consumed and will not be reused. This is a primary reason why gaps can appear in auto-incremented sequences.
  • Customizable: Sequences offer various options for customization:
    • INCREMENT BY: The value to add to the current sequence value to get the next one (default is 1). Can be negative.
    • MINVALUE: The minimum value the sequence can generate.
    • MAXVALUE: The maximum value the sequence can generate.
    • START WITH: The initial value the sequence will return.
    • CACHE: How many sequence numbers to pre-allocate and store in memory for faster access. This can significantly improve performance in high-throughput systems but increases the likelihood of larger gaps if the server restarts or a session consuming cached values terminates unexpectedly. Default is typically 1.
    • CYCLE: Allows the sequence to wrap around and restart from MINVALUE (or MAXVALUE for descending sequences) once it reaches its limit. This is generally not desirable for primary keys, as it violates uniqueness.

Creating and Using Sequences Manually:

You can create and interact with sequences directly using SQL commands:

“`sql
— Create a sequence for tracking order numbers
CREATE SEQUENCE public.order_number_seq
INCREMENT BY 1
MINVALUE 1
MAXVALUE 999999
START WITH 1000
CACHE 10; — Pre-allocate 10 numbers

— Grant usage permission (necessary if the user creating the table is different)
— GRANT USAGE, SELECT ON SEQUENCE public.order_number_seq TO web_user;

— Get the next value from the sequence
SELECT nextval(‘public.order_number_seq’);

— Get the most recent value returned by nextval() for this session
SELECT currval(‘public.order_number_seq’);
— Note: currval() requires nextval() to have been called previously in the same session.

— Set the sequence’s current value (use with caution!)
SELECT setval(‘public.order_number_seq’, 5000); — Next nextval() will return 5001
SELECT setval(‘public.order_number_seq’, 5000, false); — Next nextval() will return 5000

— Alter sequence properties
ALTER SEQUENCE public.order_number_seq RESTART WITH 2000;
ALTER SEQUENCE public.order_number_seq INCREMENT BY 10;

— Drop a sequence
DROP SEQUENCE public.order_number_seq;
“`

With this understanding of sequences as the underlying mechanism, let’s explore how SERIAL and IDENTITY provide convenient ways to use them for auto-incrementing table columns.

3. The SERIAL Pseudo-Type: The Traditional Approach

For many years, SERIAL was the standard way to create auto-incrementing integer columns in PostgreSQL. It’s important to note that SERIAL, SMALLSERIAL, and BIGSERIAL are not true data types in PostgreSQL. They are shorthand notations, or “syntactic sugar,” that instruct PostgreSQL to perform a set of actions during table creation.

Variants:

PostgreSQL offers three SERIAL pseudo-types, corresponding to different integer sizes:

  1. SMALLSERIAL: Creates a SMALLINT column (2 bytes, range -32,768 to +32,767). Suitable for tables where the number of rows is guaranteed to be relatively small.
  2. SERIAL: Creates an INTEGER column (4 bytes, range -2,147,483,648 to +2,147,483,647). The most common choice, suitable for many applications.
  3. BIGSERIAL: Creates a BIGINT column (8 bytes, range -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807). Necessary for tables that might potentially store billions of rows.

Syntax and “Under the Hood”:

When you define a column using one of the SERIAL types, PostgreSQL automatically performs these actions:

  1. Creates a Sequence: It creates an independent sequence object. The sequence name is automatically generated based on the table and column name, typically following the pattern <tablename>_<columnname>_seq.
  2. Creates the Column: It creates the table column with the corresponding integer type (SMALLINT, INTEGER, or BIGINT).
  3. Sets Default Value: It sets the default value for the column to the next value generated by the newly created sequence (nextval('<tablename>_<columnname>_seq'::regclass)).
  4. Adds NOT NULL Constraint: It adds a NOT NULL constraint to the column, as primary keys cannot be null.
  5. Sets Sequence Ownership: It marks the sequence as “owned by” the column. This is a crucial detail: it means that if the column or the table is dropped, the associated sequence will also be automatically dropped.

Example:

“`sql
— Using SERIAL
CREATE TABLE users (
user_id SERIAL PRIMARY KEY, — Creates INTEGER column + sequence + default + NOT NULL
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(100) UNIQUE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

— Using BIGSERIAL for potentially very large tables
CREATE TABLE logs (
log_id BIGSERIAL PRIMARY KEY, — Creates BIGINT column + sequence + default + NOT NULL
log_message TEXT,
log_timestamp TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
“`

Let’s examine what PostgreSQL actually creates for the users table:

“`sql
— Show table definition (output might vary slightly based on PG version)
\d users

— Expected approximate output:
— Table “public.users”
— Column | Type | Collation | Nullable | Default
— ————+————————–+———–+———-+——————————————
— user_id | integer | | not null | nextval(‘users_user_id_seq’::regclass)
— username | character varying(50) | | not null |
— email | character varying(100) | | |
— created_at | timestamp with time zone | | | CURRENT_TIMESTAMP
— Indexes:
— “users_pkey” PRIMARY KEY, btree (user_id)
— “users_username_key” UNIQUE CONSTRAINT, btree (username)
— “users_email_key” UNIQUE CONSTRAINT, btree (email)

— Show the associated sequence definition
\d users_user_id_seq

— Expected approximate output:
— Sequence “public.users_user_id_seq”
— Type | Start | Minimum | Maximum | Increment | Cycles? | Cache
— ———-+——-+———+————+———–+———+——-
— integer | 1 | 1 | 2147483647 | 1 | no | 1
— Owned by: public.users.user_id
“`

As you can see, SERIAL is a convenient shorthand for setting up the sequence, column type, default value, NOT NULL constraint, and ownership link.

Advantages of SERIAL:

  • Simplicity: The syntax is very concise and easy to remember.
  • Historical Prevalence: It has been the standard method in PostgreSQL for a long time, so it’s widely understood and found in much existing code, documentation, and tutorials.
  • Automatic Cleanup: The ownership link ensures the sequence is automatically dropped when the column or table is dropped, preventing orphaned sequences.

Disadvantages of SERIAL:

  • Non-Standard: SERIAL is a PostgreSQL-specific extension. It is not part of the SQL standard. This can be a drawback if you aim for database portability or adherence to standards.
  • Implicit Behavior: The underlying mechanics (sequence creation, default setting) are somewhat hidden behind the pseudo-type, which can sometimes be less clear than explicitly defining behavior.
  • Permissions Complexity (Minor): The ownership model can sometimes interact confusingly with permissions. While USAGE permission on the sequence is automatically granted to the table owner, finer-grained permission management might require interacting directly with the sequence object. For example, if a different user needs to insert into the table, they implicitly need USAGE permission on the sequence. While often handled correctly by default role memberships, it can be a point of confusion.
  • Inflexible Insert Behavior: By default, SERIAL columns behave like GENERATED BY DEFAULT (which we’ll discuss next) – you can override the sequence by providing your own value during INSERT, but this is often discouraged as it can lead to duplicate key violations if the sequence later catches up to the manually inserted value. There’s no built-in way with SERIAL syntax alone to prevent overriding the sequence value.

Practical Example: Inserting Data

“`sql
— Insert data, letting the SERIAL column generate the ID
INSERT INTO users (username, email)
VALUES (‘alice’, ‘[email protected]’);

INSERT INTO users (username, email)
VALUES (‘bob’, ‘[email protected]’);

— Retrieve the generated ID using the RETURNING clause (Recommended)
INSERT INTO users (username, email)
VALUES (‘charlie’, ‘[email protected]’)
RETURNING user_id;

— Check the table contents
SELECT * FROM users;
— user_id | username | email | created_at
— ———+———-+——————–+——————————-
— 1 | alice | [email protected] | 2023-10-27 10:00:00.123456+00
— 2 | bob | [email protected] | 2023-10-27 10:00:05.789012+00
— 3 | charlie | [email protected]| 2023-10-27 10:00:10.345678+00

— It’s possible (but usually discouraged) to override the SERIAL value
— Note: This might cause a duplicate key error if ‘4’ is inserted later by the sequence.
INSERT INTO users (user_id, username, email)
VALUES (100, ‘manual_user’, ‘[email protected]’);

— See the current sequence value (will be 3 after inserting charlie)
SELECT last_value FROM users_user_id_seq;
— last_value


— 3

— If we insert again, it continues from where the sequence left off
INSERT INTO users (username, email)
VALUES (‘david’, ‘[email protected]’);

SELECT * FROM users WHERE username = ‘david’;
— user_id | username | email | created_at
— ———+———-+——————–+——————————-
— 4 | david | [email protected] | 2023-10-27 10:05:00.987654+00
“`

4. The IDENTITY Column: The SQL Standard Approach

Introduced in PostgreSQL version 10, IDENTITY columns are the SQL standard way of defining auto-incrementing columns. They provide a more explicit, standard-compliant, and arguably more robust way to achieve the same goal as SERIAL.

Like SERIAL, IDENTITY columns also rely on sequences internally, but the relationship and syntax are defined according to the SQL standard.

Syntax and Variants:

The IDENTITY clause is added directly to the column definition. It comes in two main flavors, controlling how the database handles explicit value insertion attempts:

  1. GENERATED ALWAYS AS IDENTITY:

    • This is the stricter, often preferred option.
    • The database always generates a value for this column using the associated sequence.
    • Attempting to specify a value for this column during an INSERT or UPDATE statement will result in an error by default.
    • It strongly enforces the idea that the database is solely responsible for generating the ID.
    • You can temporarily override this behavior for specific INSERT statements using OVERRIDING SYSTEM VALUE, typically used for data loading or specific maintenance tasks, but it requires explicit intent.
  2. GENERATED BY DEFAULT AS IDENTITY:

    • This behaves more like the traditional SERIAL type.
    • The database generates a value by default only if one is not explicitly provided in the INSERT statement.
    • If you provide a value for the column during INSERT, that value will be used instead of the sequence-generated value.
    • This offers more flexibility but carries the same risk as SERIAL of potential future collisions if manually inserted values conflict with sequence-generated values.

Example:

“`sql
— Using GENERATED ALWAYS AS IDENTITY (Recommended for most new designs)
CREATE TABLE products (
product_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, — Stricter
product_name VARCHAR(100) NOT NULL,
price NUMERIC(10, 2)
);

— Using GENERATED BY DEFAULT AS IDENTITY (Similar behavior to SERIAL)
CREATE TABLE categories (
category_id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, — More flexible insert
category_name VARCHAR(50) NOT NULL UNIQUE
);

— Using BIGINT with IDENTITY
CREATE TABLE event_log (
event_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
event_type VARCHAR(30),
event_data JSONB,
logged_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
“`

Under the Hood:

Defining a column with IDENTITY also involves sequences, but the relationship is managed slightly differently compared to SERIAL:

  1. Creates a Sequence: PostgreSQL creates an internal sequence associated with the column. The naming convention might differ, often incorporating identity.
  2. Creates the Column: It creates the table column with the specified data type (e.g., INT, BIGINT). The NOT NULL constraint is usually implied by PRIMARY KEY or can be added explicitly.
  3. Associates Sequence: It links the sequence to the column via the IDENTITY property, setting the generation mechanism (ALWAYS or BY DEFAULT). This linkage is tighter than SERIAL‘s default value approach.
  4. Sequence Ownership: The sequence is internally marked as owned by the identity column, ensuring it’s dropped if the column or table is dropped, similar to SERIAL.

You can inspect the definition:

“`sql
\d products
— Expected approximate output:
— Table “public.products”
— Column | Type | Collation | Nullable | Default
— ————–+————————+———–+———-+——————————————
— product_id | integer | | not null | generated always as identity
— product_name | character varying(100) | | not null |
— price | numeric(10,2) | | |
— Indexes:
— “products_pkey” PRIMARY KEY, btree (product_id)
— Identity columns:
— “product_id” integer GENERATED ALWAYS AS IDENTITY ( INCREMENT BY 1 MINVALUE 1 MAXVALUE 2147483647 START WITH 1 CACHE 1 )

\d categories
— Expected approximate output:
— Table “public.categories”
— Column | Type | Collation | Nullable | Default
— —————-+———————–+———–+———-+——————————————–
— category_id | integer | | not null | generated by default as identity
— category_name | character varying(50) | | not null |
— Indexes:
— “categories_pkey” PRIMARY KEY, btree (category_id)
— “categories_category_name_key” UNIQUE CONSTRAINT, btree (category_name)
— Identity columns:
— “category_id” integer GENERATED BY DEFAULT AS IDENTITY ( INCREMENT BY 1 MINVALUE 1 MAXVALUE 2147483647 START WITH 1 CACHE 1 )
“`

Notice how the Default column now explicitly states generated always as identity or generated by default as identity, and there’s a separate section Identity columns detailing the sequence parameters.

Customizing Sequence Options:

A significant advantage of IDENTITY is the ability to specify sequence options directly within the CREATE TABLE or ALTER TABLE statement, making the configuration more explicit and self-contained:

sql
CREATE TABLE orders (
order_id BIGINT GENERATED ALWAYS AS IDENTITY (
START WITH 1000
INCREMENT BY 5
MINVALUE 1000
MAXVALUE 1000000000
CACHE 20
-- CYCLE -- Generally avoid CYCLE for primary keys
) PRIMARY KEY,
customer_id INT NOT NULL, -- Assuming a foreign key to a customers table
order_date DATE DEFAULT CURRENT_DATE
);

This clearly defines the behavior of the underlying sequence without needing separate CREATE SEQUENCE or ALTER SEQUENCE commands.

Advantages of IDENTITY:

  • SQL Standard Compliance: This is the primary advantage. Using IDENTITY makes your schema more portable to other SQL-compliant database systems and aligns with modern SQL practices.
  • Explicit Syntax: The GENERATED ALWAYS and GENERATED BY DEFAULT keywords clearly document the intended behavior regarding INSERT overrides. GENERATED ALWAYS provides stronger protection against accidental manual insertions.
  • Clearer Definition: The definition is arguably cleaner, separating the data type (INT, BIGINT) from the generation mechanism (IDENTITY).
  • Integrated Sequence Options: Sequence parameters can be defined directly within the column definition, improving readability and maintainability.
  • Generally Preferred: It’s the recommended approach for new applications using PostgreSQL 10 and later.

Disadvantages of IDENTITY:

  • PostgreSQL Version: Requires PostgreSQL version 10 or newer. If you need compatibility with older versions, SERIAL is the only option.
  • Slightly More Verbose: The syntax (GENERATED ALWAYS AS IDENTITY) is longer than SERIAL.
  • Potential Confusion: Users migrating from systems with different IDENTITY implementations (like SQL Server) might need to adjust to the ALWAYS vs. BY DEFAULT distinction and PostgreSQL’s specific syntax.

Practical Example: Inserting Data

“`sql
— Insert into a GENERATED ALWAYS table (cannot specify ID)
INSERT INTO products (product_name, price)
VALUES (‘Laptop Pro’, 1299.99);

INSERT INTO products (product_name, price)
VALUES (‘Wireless Mouse’, 25.50)
RETURNING product_id;

— Attempting to insert an ID fails (by default)
— INSERT INTO products (product_id, product_name, price) VALUES (101, ‘Manual Keyboard’, 75.00);
— ERROR: cannot insert a non-default value into column “product_id”
— DETAIL: Column “product_id” is an identity column defined as GENERATED ALWAYS.
— HINT: Use OVERRIDING SYSTEM VALUE to override.

— Override system value (use with caution, requires specific intent)
INSERT INTO products (product_id, product_name, price)
OVERRIDING SYSTEM VALUE
VALUES (101, ‘Manual Keyboard’, 75.00);

— Insert into a GENERATED BY DEFAULT table (can specify ID)
INSERT INTO categories (category_name)
VALUES (‘Electronics’); — ID will be generated

INSERT INTO categories (category_id, category_name)
VALUES (10, ‘Books’); — ID is overridden

INSERT INTO categories (category_name)
VALUES (‘Clothing’); — ID will be generated (likely 2, if sequence started at 1)

— Check results
SELECT * FROM products ORDER BY product_id;
— product_id | product_name | price
— ————+——————+———
— 1 | Laptop Pro | 1299.99
— 2 | Wireless Mouse | 25.50
— 101 | Manual Keyboard | 75.00

SELECT * FROM categories ORDER BY category_id;
— category_id | category_name
— ————-+—————
— 1 | Electronics
— 2 | Clothing
— 10 | Books

— Check the sequence state for categories after manual insert
— The sequence is likely still at 2, so the next insert will generate 3
SELECT last_value FROM categories_category_id_seq; — Name might differ slightly
— last_value


— 2
“`

5. SERIAL vs. IDENTITY: A Detailed Comparison

Let’s summarize the key differences in a table and then elaborate:

Feature SERIAL Pseudo-Type IDENTITY Column (SQL Standard)
SQL Standard? No (PostgreSQL Extension) Yes (SQL:2003 and later)
PostgreSQL Version All versions Version 10+
Syntax SERIAL, BIGSERIAL, SMALLSERIAL INT/BIGINT/SMALLINT GENERATED {ALWAYS|BY DEFAULT} AS IDENTITY
Underlying Mechanism Creates Sequence + Sets Column Default Creates Sequence + Associates via IDENTITY property
INSERT Behavior Implicitly like GENERATED BY DEFAULT (can override value) Explicit: ALWAYS (error on override) or BY DEFAULT (allows override)
Preventing Override No built-in way (requires triggers/rules) Yes, using GENERATED ALWAYS
Sequence Options Set via separate ALTER SEQUENCE Can be set within CREATE/ALTER TABLE
Clarity/Explicitness Less explicit (hides sequence details) More explicit (clear behavior, integrated options)
Ownership/Cleanup Sequence “owned by” column (auto-drop) Sequence internally linked/owned (auto-drop)
Permissions Relies on implicit sequence USAGE grant Similar, relies on sequence USAGE, potentially clearer association
Recommendation Legacy systems, pre-PG10 compatibility Preferred for new projects on PG10+

Elaboration:

  • Standard Compliance: This is the most significant philosophical difference. Using IDENTITY aligns your schema with the broader SQL world, potentially easing migration or integration with other systems and making your DDL understandable to developers familiar with standard SQL.
  • Explicitness and Control: IDENTITY wins here. GENERATED ALWAYS provides a crucial feature missing from SERIAL: the ability to enforce that the database must generate the ID, preventing accidental or malicious manual insertions that could break sequence continuity. The ability to define sequence options inline also makes the table definition more self-contained and readable.
  • Syntax: SERIAL is undeniably more concise. However, the clarity gained from the explicit IDENTITY syntax often outweighs the brevity.
  • Behavior: While SERIAL acts like GENERATED BY DEFAULT, the lack of a GENERATED ALWAYS equivalent is a limitation. IDENTITY gives you the choice.
  • Underlying Implementation: Both use sequences, but the way they link the sequence is different. SERIAL uses the column’s DEFAULT clause (DEFAULT nextval(...)). IDENTITY uses a specific internal IDENTITY property linking the column and sequence. This distinction is mostly internal but reflects the more integrated nature of IDENTITY.
  • Permissions: In practice, permissions work similarly for both as they both rely on the underlying sequence object. The user inserting data needs INSERT permission on the table and USAGE permission on the sequence. Default grants usually handle this, but the explicit nature of IDENTITY might make permission management slightly more intuitive in complex scenarios.

6. Choosing Between SERIAL and IDENTITY

The decision is generally straightforward:

  1. New Projects on PostgreSQL 10+: Use IDENTITY columns. They are the SQL standard, more explicit, offer better control (GENERATED ALWAYS), and represent the modern approach. Prefer GENERATED ALWAYS AS IDENTITY unless you have a specific, well-justified reason to allow manual overrides (in which case use GENERATED BY DEFAULT).
  2. Existing Projects Using SERIAL: You have two options:
    • Leave them as SERIAL: If the application works and there’s no compelling reason (like strict standard compliance requirements) to change, leaving existing SERIAL columns is perfectly acceptable. They are well-supported and functional.
    • Migrate to IDENTITY: If you are undertaking schema improvements, aiming for standard compliance, or want the benefits of GENERATED ALWAYS, you can migrate SERIAL columns to IDENTITY. We’ll cover how to do this later.
  3. Projects Requiring Compatibility with PostgreSQL < 10: You must use SERIAL, as IDENTITY columns are not available in older versions.

In essence, IDENTITY is the future-proof, standard-compliant choice, while SERIAL remains a functional and widely used legacy option.

7. Working with Auto-Incrementing Columns: Practical Considerations

Regardless of whether you use SERIAL or IDENTITY, several practical aspects arise when working with auto-generated keys.

Retrieving Generated IDs After INSERT

It’s extremely common to need the ID of the row you just inserted (e.g., to store it, use it in subsequent operations, or return it to the client). There are several ways to do this, with varying degrees of safety and efficiency:

  1. RETURNING Clause (Highly Recommended): This is the most reliable, efficient, and race-condition-free method. Add RETURNING column_name (or RETURNING *) to your INSERT statement. PostgreSQL executes the INSERT and returns the specified column(s) from the newly inserted row within the same atomic operation.

    sql
    -- For SERIAL or IDENTITY
    INSERT INTO users (username, email)
    VALUES ('emily', '[email protected]')
    RETURNING user_id;
    -- Returns the newly generated user_id directly

  2. currval('sequence_name') (Use with Caution): This function returns the value most recently obtained by nextval() for the current session.

    • Requires nextval(): You must have called nextval() earlier in the same session. It doesn’t work if the ID was generated in a different session or if you haven’t called nextval() yet.
    • Requires Knowing Sequence Name: You need to know the exact name of the sequence associated with the column (e.g., users_user_id_seq or the IDENTITY sequence name). This couples your application logic to the auto-generated sequence name, which is less ideal.
    • Less Direct: It’s an extra query compared to RETURNING.

    sql
    -- Less recommended than RETURNING
    INSERT INTO users (username, email) VALUES ('frank', '[email protected]');
    SELECT currval('users_user_id_seq'); -- Assuming users_user_id_seq is the correct name

  3. lastval() (Avoid in Concurrent Environments): This function returns the value most recently returned by any nextval() call in the current session, regardless of the sequence.

    • Highly Prone to Errors: If your transaction or session involves multiple inserts into different tables with sequences, or calls nextval() directly for other purposes between the target INSERT and the lastval() call, you will get the wrong ID.
    • Concurrency Issues: It doesn’t provide a guarantee that the value belongs to your specific insert if other operations happen concurrently within the same session involving sequences. Avoid using lastval() to retrieve primary key IDs.

Conclusion: Always prefer the RETURNING clause for retrieving auto-generated IDs.

Handling Gaps in Sequences

Users often notice that the generated IDs in their tables are not perfectly sequential (e.g., 1, 2, 5, 6, 10…). This is normal and expected behavior with database sequences and generally not a problem.

Reasons for gaps include:

  • Rolled-Back Transactions: A transaction starts, requests nextval(), but then rolls back. The sequence value is consumed and not reused.
  • Deleted Rows: Rows are inserted and later deleted, leaving gaps in the existing IDs.
  • Sequence Caching (CACHE > 1): If a sequence has a cache value greater than 1, the backend process reserves a batch of numbers. If that backend disconnects or the server restarts before using all cached numbers, that batch is lost, creating a potentially large gap.
  • GENERATED BY DEFAULT / SERIAL Overrides: Manually inserting a higher value can cause the sequence to jump when it naturally reaches that point, or simply create a gap if the sequence hasn’t caught up yet.
  • Replication Setups: Certain older replication methods could sometimes lead to sequence inconsistencies (though modern logical replication handles this much better).

Key Takeaway: Do not rely on auto-incrementing primary keys being gapless. They guarantee uniqueness, not sequentiality. If you need truly gapless sequences (e.g., for legal document numbering), auto-incrementing columns are often not the right tool; you might need application-level logic with specific locking or specialized data types/extensions.

Resetting Sequences

Sometimes, especially during development or after bulk data loads/deletions, you might want to reset a sequence counter.

  • Find the Sequence Name:

    • For SERIAL: Typically <tablename>_<colname>_seq. Use \d tablename to confirm.
    • For IDENTITY: Use \d tablename to see the sequence name listed under “Identity columns” or query the system catalogs (pg_depend, pg_sequence). A helper function like pg_get_serial_sequence('tablename', 'colname') works for both SERIAL and IDENTITY in modern PostgreSQL.
  • Reset the Value: Use ALTER SEQUENCE.

    “`sql
    — Find the max current ID in the table
    SELECT max(user_id) FROM users;

    — Get the sequence name associated with the column
    SELECT pg_get_serial_sequence(‘users’, ‘user_id’);
    — Example output: ‘public.users_user_id_seq’

    — Reset the sequence to restart after the current max value
    — Replace ‘public.users_user_id_seq’ with the actual sequence name
    — Replace ‘max_id_plus_1’ with the value from (SELECT max(user_id) FROM users) + 1
    — Ensure the user running this has permission to alter the sequence.
    ALTER SEQUENCE public.users_user_id_seq RESTART WITH max_id_plus_1;

    — Or, more dynamically (ensure no inserts happen between MAX and ALTER):
    SELECT setval(pg_get_serial_sequence(‘users’, ‘user_id’), COALESCE(max(user_id), 1), max(user_id) IS NOT NULL) FROM users;
    — COALESCE handles the case where the table is empty.
    — The third argument ‘true’ means the next nextval() will return value+1.
    — The third argument ‘false’ means the next nextval() will return the value itself. We usually want ‘true’.
    “`

Caution: Resetting sequences on live production tables should be done with extreme care. It often requires locking the table or careful coordination to avoid race conditions where new rows are inserted between calculating the max ID and resetting the sequence, which could lead to duplicate key errors later.

Permissions

As mentioned earlier, inserting into a table with an auto-incrementing column requires:

  1. INSERT permission on the table.
  2. USAGE permission on the underlying sequence.

Usually, the table owner automatically has these permissions. If other roles need to insert data, ensure they have GRANT INSERT ON TABLE tablename TO role_name; and GRANT USAGE ON SEQUENCE sequence_name TO role_name;. Granting INSERT on the table often implicitly allows usage of defaults (including sequence nextval() calls set by SERIAL or IDENTITY), but explicitly granting USAGE on the sequence is clearer and safer, especially in complex role setups or if defaults are modified.

8. Migrating from SERIAL to IDENTITY

If you decide to modernize an existing schema using SERIAL columns (and are on PostgreSQL 10+), you can migrate them to IDENTITY. The process involves altering the column to remove the old default, adding the IDENTITY property, and ensuring the existing sequence is correctly associated and owned.

Steps:

  1. Identify Target: Choose the table and SERIAL column (tablename, colname).
  2. Find Sequence Name: Determine the name of the sequence associated with the SERIAL column.
    sql
    SELECT pg_get_serial_sequence('your_table_name', 'your_serial_column_name');
    -- e.g., returns 'public.your_table_name_your_serial_column_name_seq'
  3. Alter Column – Remove Default: Remove the old nextval() default.
    sql
    ALTER TABLE your_table_name ALTER COLUMN your_serial_column_name DROP DEFAULT;
  4. Alter Column – Add Identity: Add the IDENTITY property, explicitly associating the existing sequence. Choose GENERATED ALWAYS or GENERATED BY DEFAULT.
    “`sql
    — Using GENERATED ALWAYS
    ALTER TABLE your_table_name
    ALTER COLUMN your_serial_column_name
    ADD GENERATED ALWAYS AS IDENTITY (SEQUENCE NAME public.your_table_name_your_serial_column_name_seq);

    — Or using GENERATED BY DEFAULT
    — ALTER TABLE your_table_name
    — ALTER COLUMN your_serial_column_name
    — ADD GENERATED BY DEFAULT AS IDENTITY (SEQUENCE NAME public.your_table_name_your_serial_column_name_seq);
    ``
    *Note:* By specifying
    SEQUENCE NAME`, you tell PostgreSQL to reuse the existing sequence rather than creating a new one. This preserves the current sequence value.

  5. Optional but Recommended – Sequence Ownership: While the IDENTITY association implies a link, explicitly setting the sequence ownership to the identity column makes the dependency clear in system catalogs, mirroring how a newly created IDENTITY column behaves.
    “`sql
    — Get the current sequence details (check Owner if needed)
    \d public.your_table_name_your_serial_column_name_seq

    — Alter sequence ownership (may require sequence owner privileges)
    ALTER SEQUENCE public.your_table_name_your_serial_column_name_seq OWNED BY your_table_name.your_serial_column_name;
    ``
    6. **Verify:** Use
    \d your_table_nameto confirm the column now shows asgenerated always as identity(orby default`) and lists the correct sequence details under “Identity columns”. Test insertions to ensure IDs are generated correctly.

Important Considerations:

  • Permissions: Ensure the user performing these ALTER commands has the necessary privileges on the table, column, and sequence.
  • Locking: ALTER TABLE commands acquire locks. Performing this on a very busy table might require a maintenance window or strategies to minimize locking duration (though these specific alterations are usually fast).
  • Transaction: Perform these steps within a single transaction to ensure atomicity.

Example Migration (users.user_id):

“`sql
BEGIN;

— 1. Identify: Table ‘users’, Column ‘user_id’ (assuming it’s SERIAL)

— 2. Find Sequence Name (Assume it’s ‘public.users_user_id_seq’)
— SELECT pg_get_serial_sequence(‘users’, ‘user_id’);

— 3. Remove Default
ALTER TABLE users ALTER COLUMN user_id DROP DEFAULT;

— 4. Add Identity (using GENERATED ALWAYS)
ALTER TABLE users
ALTER COLUMN user_id
ADD GENERATED ALWAYS AS IDENTITY (SEQUENCE NAME public.users_user_id_seq);

— 5. Set Ownership
ALTER SEQUENCE public.users_user_id_seq OWNED BY users.user_id;

COMMIT;

— 6. Verify
\d users
“`

9. Advanced Considerations and Alternatives

Replication

In replication scenarios (especially older physical or statement-based replication), ensuring sequences generate unique values across primary and replica nodes required careful configuration. Modern logical replication in PostgreSQL generally handles sequences correctly, but it’s always wise to test thoroughly in your specific replication setup. Sequence caching can sometimes interact unexpectedly with failover scenarios if not managed carefully.

Bulk Loading (COPY Command)

When using the COPY command to bulk-load data into a table with an auto-incrementing column:

  • SERIAL / GENERATED BY DEFAULT: If your input data includes values for the ID column, COPY will attempt to insert them (similar to an INSERT statement specifying the ID). If your input data omits the ID column, the default (nextval() or IDENTITY generation) will trigger for each row, which can be slower than providing IDs.
  • GENERATED ALWAYS: COPY will fail if the input data tries to provide a value for the GENERATED ALWAYS column, unless you specify OVERRIDING SYSTEM VALUE as an option to the COPY command (PostgreSQL 11+). If the column is omitted in the COPY column list, values will be generated.
  • Sequence Synchronization: After a large COPY operation that did include explicit IDs (for SERIAL or GENERATED BY DEFAULT, or GENERATED ALWAYS with OVERRIDING SYSTEM VALUE), the table’s maximum ID might be far beyond the sequence’s current value. You’ll likely need to reset the sequence using setval() (as shown previously) to avoid future duplicate key errors when regular INSERTs resume.

Partitioning

When using PostgreSQL’s declarative partitioning, the auto-incrementing column (often the primary key) is typically defined on the parent partitioned table. PostgreSQL automatically ensures that the sequence generation works correctly across partitions; rows inserted into the parent table (which get routed to the correct partition) will draw unique values from the single sequence defined on the parent.

Alternatives: UUIDs

While sequential integers are common, UUIDs (Universally Unique Identifiers) offer another approach to generating unique surrogate keys.

  • Pros:
    • Globally unique (extremely low collision probability across tables, databases, and even systems).
    • Can be generated client-side, potentially simplifying certain distributed workflows.
    • No central sequence bottleneck (though generating them still takes CPU time).
    • Hides information about the number of rows or insertion order.
  • Cons:
    • Larger (typically 16 bytes vs. 4 or 8 for integers), increasing storage and index size.
    • Can lead to index fragmentation and poorer cache locality, especially with completely random UUID versions (like v4), impacting performance. Sequential UUID types (like proposed v7) mitigate this.
    • Less human-readable/memorable than integers.
    • SERIAL/IDENTITY are often simpler for basic auto-incrementing needs.

PostgreSQL has excellent support for the UUID data type and functions like gen_random_uuid() (requires pgcrypto extension before PG13, built-in afterwards) or uuid_generate_v4() (from the uuid-ossp extension). You can set a default:

“`sql
— Requires uuid-ossp extension or PG13+ for gen_random_uuid()
CREATE EXTENSION IF NOT EXISTS “uuid-ossp”;

CREATE TABLE items (
item_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
item_name TEXT NOT NULL
);

INSERT INTO items (item_name) VALUES (‘Thing 1’);
SELECT * FROM items;
— item_id | item_name
— ————————————–+———–
— a1b2c3d4-e5f6-7890-1234-567890abcdef | Thing 1 (Example UUID)
“`

Choosing between sequential integers and UUIDs depends on specific application requirements regarding uniqueness scope, performance characteristics, key size, and generation strategy.

10. Conclusion

Auto-incrementing primary keys are a cornerstone of relational database design, providing simple, reliable, and efficient row identification. PostgreSQL offers two excellent mechanisms for this: the traditional SERIAL pseudo-types and the modern, SQL-standard IDENTITY columns.

  • SERIAL (SMALLSERIAL, SERIAL, BIGSERIAL) is the long-standing PostgreSQL method. It’s concise syntactic sugar that creates an integer column, an associated sequence, sets the default to nextval(), adds NOT NULL, and links the sequence ownership. It behaves like GENERATED BY DEFAULT, allowing overrides during INSERT.
  • IDENTITY (GENERATED ALWAYS AS IDENTITY, GENERATED BY DEFAULT AS IDENTITY) is the SQL standard approach, available since PostgreSQL 10. It offers clearer syntax, explicitly defines insert override behavior (ALWAYS prevents overrides, BY DEFAULT allows them), allows inline sequence option configuration, and is the recommended choice for new development.

Both mechanisms rely fundamentally on Sequences, independent database objects that generate ordered numbers. Understanding sequences helps in comprehending the behavior, customization, and potential nuances (like gaps) of auto-incrementing columns.

For developers working with PostgreSQL:

  • Prefer IDENTITY columns (specifically GENERATED ALWAYS) for new tables if using PostgreSQL 10 or later.
  • Understand that SERIAL is perfectly functional and acceptable in existing systems or when pre-PG10 compatibility is needed.
  • Always use the RETURNING clause to reliably retrieve the generated ID after an INSERT.
  • Accept that gaps in generated IDs are normal and usually not indicative of a problem.
  • Know how to inspect sequence properties and reset them carefully if necessary (e.g., after bulk loads).
  • Consider UUIDs as an alternative if global uniqueness or client-side generation is required, but be mindful of the performance and storage trade-offs.

By mastering SERIAL and IDENTITY, you gain essential tools for building robust, well-structured, and efficient PostgreSQL databases. Choosing the right tool and understanding its behavior ensures data integrity and simplifies application development.


Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top