PostgreSQL SERIAL and IDENTITY Columns: A Comprehensive Introduction
In the world of relational databases, ensuring the uniqueness and integrity of data is paramount. One of the most fundamental concepts in achieving this is the primary key – a column or set of columns that uniquely identifies each row within a table. While primary keys can be derived from natural data attributes (like a user’s email address or a product’s SKU), it’s often more robust, efficient, and simpler to use surrogate keys. These are artificial keys, typically sequential integers, generated by the database system itself, with no inherent meaning other than uniquely identifying a row.
PostgreSQL, a powerful and feature-rich open-source object-relational database system, provides two primary mechanisms for automatically generating these sequential surrogate keys: the traditional SERIAL
pseudo-types and the more modern, SQL standard IDENTITY
columns.
This article provides a deep dive into both SERIAL
and IDENTITY
columns in PostgreSQL. We will explore:
- The fundamental need for auto-incrementing keys.
- The underlying mechanism powering both: PostgreSQL Sequences.
- A detailed look at the
SERIAL
pseudo-type: its variants, usage, advantages, and disadvantages. - A detailed look at the SQL standard
IDENTITY
columns:GENERATED ALWAYS
andGENERATED BY DEFAULT
, syntax, benefits, and drawbacks. - A thorough comparison between
SERIAL
andIDENTITY
. - Guidance on choosing the right approach for your needs.
- Practical considerations like retrieving generated IDs, handling gaps, resetting sequences, and permissions.
- How to migrate existing
SERIAL
columns toIDENTITY
. - Advanced considerations and alternatives.
By the end of this article, you will have a comprehensive understanding of how PostgreSQL handles auto-incrementing columns, enabling you to make informed decisions when designing your database schemas.
1. The Need for Auto-Incrementing Keys (Surrogate Keys)
Before diving into the specifics of SERIAL
and IDENTITY
, let’s solidify why auto-incrementing keys are so widely used.
- Guaranteed Uniqueness: The primary purpose of a primary key is to uniquely identify each row. Manually assigning unique IDs is prone to errors, collisions (attempting to assign the same ID twice), and inefficiencies. Auto-incrementing mechanisms ensure that each new row automatically receives a unique identifier within the table.
- Simplicity: Using a simple integer as a key is straightforward. It’s easy to understand, query, and use in foreign key relationships in other tables.
- Stability: Natural keys (like email addresses or usernames) can sometimes change. If an email address used as a primary key changes, it requires updating not only the primary key column but also all foreign key references in other tables, which can be complex and error-prone. Surrogate keys, being artificial, rarely need to change once assigned.
- Performance: Integer keys are generally compact and efficient for indexing and joining operations compared to potentially long string-based natural keys. Comparisons and lookups are faster.
- Anonymity/Abstraction: Surrogate keys decouple the row’s identity from its potentially sensitive or mutable real-world attributes.
While auto-incrementing integers are the most common form of surrogate keys, other strategies like using UUIDs (Universally Unique Identifiers) also exist. However, for many common use cases, sequential integers provide a good balance of simplicity, performance, and guaranteed uniqueness within the scope of a single table. PostgreSQL’s SERIAL
and IDENTITY
are designed precisely for generating these sequential integer keys.
2. The Foundation: Understanding PostgreSQL Sequences
Both SERIAL
and IDENTITY
columns in PostgreSQL are built upon a fundamental database object: the Sequence. A sequence is a special kind of database object that generates a sequence of integers according to specified rules. It’s essentially a sophisticated counter maintained by the database server.
Understanding sequences is crucial because they are the engine powering auto-incrementing behavior. Even if you primarily use the higher-level SERIAL
or IDENTITY
syntax, knowing about sequences helps in troubleshooting, customization, and understanding the underlying mechanics.
Key Concepts of Sequences:
- Independent Objects: Sequences exist independently of tables. They have their own names, owners, and permissions.
- Generation: They generate unique values across concurrent transactions. PostgreSQL handles the necessary locking to ensure that multiple sessions requesting a number from the same sequence will each receive distinct values.
- Non-Transactional (Typically): The operation of fetching the next value from a sequence (
nextval()
) is generally non-transactional. This means that if a transaction requests a sequence value but later rolls back, the sequence value is consumed and will not be reused. This is a primary reason why gaps can appear in auto-incremented sequences. - Customizable: Sequences offer various options for customization:
INCREMENT BY
: The value to add to the current sequence value to get the next one (default is 1). Can be negative.MINVALUE
: The minimum value the sequence can generate.MAXVALUE
: The maximum value the sequence can generate.START WITH
: The initial value the sequence will return.CACHE
: How many sequence numbers to pre-allocate and store in memory for faster access. This can significantly improve performance in high-throughput systems but increases the likelihood of larger gaps if the server restarts or a session consuming cached values terminates unexpectedly. Default is typically 1.CYCLE
: Allows the sequence to wrap around and restart fromMINVALUE
(orMAXVALUE
for descending sequences) once it reaches its limit. This is generally not desirable for primary keys, as it violates uniqueness.
Creating and Using Sequences Manually:
You can create and interact with sequences directly using SQL commands:
“`sql
— Create a sequence for tracking order numbers
CREATE SEQUENCE public.order_number_seq
INCREMENT BY 1
MINVALUE 1
MAXVALUE 999999
START WITH 1000
CACHE 10; — Pre-allocate 10 numbers
— Grant usage permission (necessary if the user creating the table is different)
— GRANT USAGE, SELECT ON SEQUENCE public.order_number_seq TO web_user;
— Get the next value from the sequence
SELECT nextval(‘public.order_number_seq’);
— Get the most recent value returned by nextval() for this session
SELECT currval(‘public.order_number_seq’);
— Note: currval() requires nextval() to have been called previously in the same session.
— Set the sequence’s current value (use with caution!)
SELECT setval(‘public.order_number_seq’, 5000); — Next nextval() will return 5001
SELECT setval(‘public.order_number_seq’, 5000, false); — Next nextval() will return 5000
— Alter sequence properties
ALTER SEQUENCE public.order_number_seq RESTART WITH 2000;
ALTER SEQUENCE public.order_number_seq INCREMENT BY 10;
— Drop a sequence
DROP SEQUENCE public.order_number_seq;
“`
With this understanding of sequences as the underlying mechanism, let’s explore how SERIAL
and IDENTITY
provide convenient ways to use them for auto-incrementing table columns.
3. The SERIAL
Pseudo-Type: The Traditional Approach
For many years, SERIAL
was the standard way to create auto-incrementing integer columns in PostgreSQL. It’s important to note that SERIAL
, SMALLSERIAL
, and BIGSERIAL
are not true data types in PostgreSQL. They are shorthand notations, or “syntactic sugar,” that instruct PostgreSQL to perform a set of actions during table creation.
Variants:
PostgreSQL offers three SERIAL
pseudo-types, corresponding to different integer sizes:
SMALLSERIAL
: Creates aSMALLINT
column (2 bytes, range -32,768 to +32,767). Suitable for tables where the number of rows is guaranteed to be relatively small.SERIAL
: Creates anINTEGER
column (4 bytes, range -2,147,483,648 to +2,147,483,647). The most common choice, suitable for many applications.BIGSERIAL
: Creates aBIGINT
column (8 bytes, range -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807). Necessary for tables that might potentially store billions of rows.
Syntax and “Under the Hood”:
When you define a column using one of the SERIAL
types, PostgreSQL automatically performs these actions:
- Creates a Sequence: It creates an independent sequence object. The sequence name is automatically generated based on the table and column name, typically following the pattern
<tablename>_<columnname>_seq
. - Creates the Column: It creates the table column with the corresponding integer type (
SMALLINT
,INTEGER
, orBIGINT
). - Sets Default Value: It sets the default value for the column to the next value generated by the newly created sequence (
nextval('<tablename>_<columnname>_seq'::regclass)
). - Adds
NOT NULL
Constraint: It adds aNOT NULL
constraint to the column, as primary keys cannot be null. - Sets Sequence Ownership: It marks the sequence as “owned by” the column. This is a crucial detail: it means that if the column or the table is dropped, the associated sequence will also be automatically dropped.
Example:
“`sql
— Using SERIAL
CREATE TABLE users (
user_id SERIAL PRIMARY KEY, — Creates INTEGER column + sequence + default + NOT NULL
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(100) UNIQUE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
— Using BIGSERIAL for potentially very large tables
CREATE TABLE logs (
log_id BIGSERIAL PRIMARY KEY, — Creates BIGINT column + sequence + default + NOT NULL
log_message TEXT,
log_timestamp TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
“`
Let’s examine what PostgreSQL actually creates for the users
table:
“`sql
— Show table definition (output might vary slightly based on PG version)
\d users
— Expected approximate output:
— Table “public.users”
— Column | Type | Collation | Nullable | Default
— ————+————————–+———–+———-+——————————————
— user_id | integer | | not null | nextval(‘users_user_id_seq’::regclass)
— username | character varying(50) | | not null |
— email | character varying(100) | | |
— created_at | timestamp with time zone | | | CURRENT_TIMESTAMP
— Indexes:
— “users_pkey” PRIMARY KEY, btree (user_id)
— “users_username_key” UNIQUE CONSTRAINT, btree (username)
— “users_email_key” UNIQUE CONSTRAINT, btree (email)
— Show the associated sequence definition
\d users_user_id_seq
— Expected approximate output:
— Sequence “public.users_user_id_seq”
— Type | Start | Minimum | Maximum | Increment | Cycles? | Cache
— ———-+——-+———+————+———–+———+——-
— integer | 1 | 1 | 2147483647 | 1 | no | 1
— Owned by: public.users.user_id
“`
As you can see, SERIAL
is a convenient shorthand for setting up the sequence, column type, default value, NOT NULL
constraint, and ownership link.
Advantages of SERIAL
:
- Simplicity: The syntax is very concise and easy to remember.
- Historical Prevalence: It has been the standard method in PostgreSQL for a long time, so it’s widely understood and found in much existing code, documentation, and tutorials.
- Automatic Cleanup: The ownership link ensures the sequence is automatically dropped when the column or table is dropped, preventing orphaned sequences.
Disadvantages of SERIAL
:
- Non-Standard:
SERIAL
is a PostgreSQL-specific extension. It is not part of the SQL standard. This can be a drawback if you aim for database portability or adherence to standards. - Implicit Behavior: The underlying mechanics (sequence creation, default setting) are somewhat hidden behind the pseudo-type, which can sometimes be less clear than explicitly defining behavior.
- Permissions Complexity (Minor): The ownership model can sometimes interact confusingly with permissions. While
USAGE
permission on the sequence is automatically granted to the table owner, finer-grained permission management might require interacting directly with the sequence object. For example, if a different user needs to insert into the table, they implicitly needUSAGE
permission on the sequence. While often handled correctly by default role memberships, it can be a point of confusion. - Inflexible Insert Behavior: By default,
SERIAL
columns behave likeGENERATED BY DEFAULT
(which we’ll discuss next) – you can override the sequence by providing your own value duringINSERT
, but this is often discouraged as it can lead to duplicate key violations if the sequence later catches up to the manually inserted value. There’s no built-in way withSERIAL
syntax alone to prevent overriding the sequence value.
Practical Example: Inserting Data
“`sql
— Insert data, letting the SERIAL column generate the ID
INSERT INTO users (username, email)
VALUES (‘alice’, ‘[email protected]’);
INSERT INTO users (username, email)
VALUES (‘bob’, ‘[email protected]’);
— Retrieve the generated ID using the RETURNING clause (Recommended)
INSERT INTO users (username, email)
VALUES (‘charlie’, ‘[email protected]’)
RETURNING user_id;
— Check the table contents
SELECT * FROM users;
— user_id | username | email | created_at
— ———+———-+——————–+——————————-
— 1 | alice | [email protected] | 2023-10-27 10:00:00.123456+00
— 2 | bob | [email protected] | 2023-10-27 10:00:05.789012+00
— 3 | charlie | [email protected]| 2023-10-27 10:00:10.345678+00
— It’s possible (but usually discouraged) to override the SERIAL value
— Note: This might cause a duplicate key error if ‘4’ is inserted later by the sequence.
INSERT INTO users (user_id, username, email)
VALUES (100, ‘manual_user’, ‘[email protected]’);
— See the current sequence value (will be 3 after inserting charlie)
SELECT last_value FROM users_user_id_seq;
— last_value
— 3
— If we insert again, it continues from where the sequence left off
INSERT INTO users (username, email)
VALUES (‘david’, ‘[email protected]’);
SELECT * FROM users WHERE username = ‘david’;
— user_id | username | email | created_at
— ———+———-+——————–+——————————-
— 4 | david | [email protected] | 2023-10-27 10:05:00.987654+00
“`
4. The IDENTITY
Column: The SQL Standard Approach
Introduced in PostgreSQL version 10, IDENTITY
columns are the SQL standard way of defining auto-incrementing columns. They provide a more explicit, standard-compliant, and arguably more robust way to achieve the same goal as SERIAL
.
Like SERIAL
, IDENTITY
columns also rely on sequences internally, but the relationship and syntax are defined according to the SQL standard.
Syntax and Variants:
The IDENTITY
clause is added directly to the column definition. It comes in two main flavors, controlling how the database handles explicit value insertion attempts:
-
GENERATED ALWAYS AS IDENTITY
:- This is the stricter, often preferred option.
- The database always generates a value for this column using the associated sequence.
- Attempting to specify a value for this column during an
INSERT
orUPDATE
statement will result in an error by default. - It strongly enforces the idea that the database is solely responsible for generating the ID.
- You can temporarily override this behavior for specific
INSERT
statements usingOVERRIDING SYSTEM VALUE
, typically used for data loading or specific maintenance tasks, but it requires explicit intent.
-
GENERATED BY DEFAULT AS IDENTITY
:- This behaves more like the traditional
SERIAL
type. - The database generates a value by default only if one is not explicitly provided in the
INSERT
statement. - If you provide a value for the column during
INSERT
, that value will be used instead of the sequence-generated value. - This offers more flexibility but carries the same risk as
SERIAL
of potential future collisions if manually inserted values conflict with sequence-generated values.
- This behaves more like the traditional
Example:
“`sql
— Using GENERATED ALWAYS AS IDENTITY (Recommended for most new designs)
CREATE TABLE products (
product_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, — Stricter
product_name VARCHAR(100) NOT NULL,
price NUMERIC(10, 2)
);
— Using GENERATED BY DEFAULT AS IDENTITY (Similar behavior to SERIAL)
CREATE TABLE categories (
category_id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, — More flexible insert
category_name VARCHAR(50) NOT NULL UNIQUE
);
— Using BIGINT with IDENTITY
CREATE TABLE event_log (
event_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
event_type VARCHAR(30),
event_data JSONB,
logged_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
“`
Under the Hood:
Defining a column with IDENTITY
also involves sequences, but the relationship is managed slightly differently compared to SERIAL
:
- Creates a Sequence: PostgreSQL creates an internal sequence associated with the column. The naming convention might differ, often incorporating
identity
. - Creates the Column: It creates the table column with the specified data type (e.g.,
INT
,BIGINT
). TheNOT NULL
constraint is usually implied byPRIMARY KEY
or can be added explicitly. - Associates Sequence: It links the sequence to the column via the
IDENTITY
property, setting the generation mechanism (ALWAYS
orBY DEFAULT
). This linkage is tighter thanSERIAL
‘s default value approach. - Sequence Ownership: The sequence is internally marked as owned by the identity column, ensuring it’s dropped if the column or table is dropped, similar to
SERIAL
.
You can inspect the definition:
“`sql
\d products
— Expected approximate output:
— Table “public.products”
— Column | Type | Collation | Nullable | Default
— ————–+————————+———–+———-+——————————————
— product_id | integer | | not null | generated always as identity
— product_name | character varying(100) | | not null |
— price | numeric(10,2) | | |
— Indexes:
— “products_pkey” PRIMARY KEY, btree (product_id)
— Identity columns:
— “product_id” integer GENERATED ALWAYS AS IDENTITY ( INCREMENT BY 1 MINVALUE 1 MAXVALUE 2147483647 START WITH 1 CACHE 1 )
\d categories
— Expected approximate output:
— Table “public.categories”
— Column | Type | Collation | Nullable | Default
— —————-+———————–+———–+———-+——————————————–
— category_id | integer | | not null | generated by default as identity
— category_name | character varying(50) | | not null |
— Indexes:
— “categories_pkey” PRIMARY KEY, btree (category_id)
— “categories_category_name_key” UNIQUE CONSTRAINT, btree (category_name)
— Identity columns:
— “category_id” integer GENERATED BY DEFAULT AS IDENTITY ( INCREMENT BY 1 MINVALUE 1 MAXVALUE 2147483647 START WITH 1 CACHE 1 )
“`
Notice how the Default
column now explicitly states generated always as identity
or generated by default as identity
, and there’s a separate section Identity columns
detailing the sequence parameters.
Customizing Sequence Options:
A significant advantage of IDENTITY
is the ability to specify sequence options directly within the CREATE TABLE
or ALTER TABLE
statement, making the configuration more explicit and self-contained:
sql
CREATE TABLE orders (
order_id BIGINT GENERATED ALWAYS AS IDENTITY (
START WITH 1000
INCREMENT BY 5
MINVALUE 1000
MAXVALUE 1000000000
CACHE 20
-- CYCLE -- Generally avoid CYCLE for primary keys
) PRIMARY KEY,
customer_id INT NOT NULL, -- Assuming a foreign key to a customers table
order_date DATE DEFAULT CURRENT_DATE
);
This clearly defines the behavior of the underlying sequence without needing separate CREATE SEQUENCE
or ALTER SEQUENCE
commands.
Advantages of IDENTITY
:
- SQL Standard Compliance: This is the primary advantage. Using
IDENTITY
makes your schema more portable to other SQL-compliant database systems and aligns with modern SQL practices. - Explicit Syntax: The
GENERATED ALWAYS
andGENERATED BY DEFAULT
keywords clearly document the intended behavior regardingINSERT
overrides.GENERATED ALWAYS
provides stronger protection against accidental manual insertions. - Clearer Definition: The definition is arguably cleaner, separating the data type (
INT
,BIGINT
) from the generation mechanism (IDENTITY
). - Integrated Sequence Options: Sequence parameters can be defined directly within the column definition, improving readability and maintainability.
- Generally Preferred: It’s the recommended approach for new applications using PostgreSQL 10 and later.
Disadvantages of IDENTITY
:
- PostgreSQL Version: Requires PostgreSQL version 10 or newer. If you need compatibility with older versions,
SERIAL
is the only option. - Slightly More Verbose: The syntax (
GENERATED ALWAYS AS IDENTITY
) is longer thanSERIAL
. - Potential Confusion: Users migrating from systems with different
IDENTITY
implementations (like SQL Server) might need to adjust to theALWAYS
vs.BY DEFAULT
distinction and PostgreSQL’s specific syntax.
Practical Example: Inserting Data
“`sql
— Insert into a GENERATED ALWAYS table (cannot specify ID)
INSERT INTO products (product_name, price)
VALUES (‘Laptop Pro’, 1299.99);
INSERT INTO products (product_name, price)
VALUES (‘Wireless Mouse’, 25.50)
RETURNING product_id;
— Attempting to insert an ID fails (by default)
— INSERT INTO products (product_id, product_name, price) VALUES (101, ‘Manual Keyboard’, 75.00);
— ERROR: cannot insert a non-default value into column “product_id”
— DETAIL: Column “product_id” is an identity column defined as GENERATED ALWAYS.
— HINT: Use OVERRIDING SYSTEM VALUE to override.
— Override system value (use with caution, requires specific intent)
INSERT INTO products (product_id, product_name, price)
OVERRIDING SYSTEM VALUE
VALUES (101, ‘Manual Keyboard’, 75.00);
— Insert into a GENERATED BY DEFAULT table (can specify ID)
INSERT INTO categories (category_name)
VALUES (‘Electronics’); — ID will be generated
INSERT INTO categories (category_id, category_name)
VALUES (10, ‘Books’); — ID is overridden
INSERT INTO categories (category_name)
VALUES (‘Clothing’); — ID will be generated (likely 2, if sequence started at 1)
— Check results
SELECT * FROM products ORDER BY product_id;
— product_id | product_name | price
— ————+——————+———
— 1 | Laptop Pro | 1299.99
— 2 | Wireless Mouse | 25.50
— 101 | Manual Keyboard | 75.00
SELECT * FROM categories ORDER BY category_id;
— category_id | category_name
— ————-+—————
— 1 | Electronics
— 2 | Clothing
— 10 | Books
— Check the sequence state for categories after manual insert
— The sequence is likely still at 2, so the next insert will generate 3
SELECT last_value FROM categories_category_id_seq; — Name might differ slightly
— last_value
— 2
“`
5. SERIAL
vs. IDENTITY
: A Detailed Comparison
Let’s summarize the key differences in a table and then elaborate:
Feature | SERIAL Pseudo-Type |
IDENTITY Column (SQL Standard) |
---|---|---|
SQL Standard? | No (PostgreSQL Extension) | Yes (SQL:2003 and later) |
PostgreSQL Version | All versions | Version 10+ |
Syntax | SERIAL , BIGSERIAL , SMALLSERIAL |
INT/BIGINT/SMALLINT GENERATED {ALWAYS|BY DEFAULT} AS IDENTITY |
Underlying Mechanism | Creates Sequence + Sets Column Default | Creates Sequence + Associates via IDENTITY property |
INSERT Behavior |
Implicitly like GENERATED BY DEFAULT (can override value) |
Explicit: ALWAYS (error on override) or BY DEFAULT (allows override) |
Preventing Override | No built-in way (requires triggers/rules) | Yes, using GENERATED ALWAYS |
Sequence Options | Set via separate ALTER SEQUENCE |
Can be set within CREATE/ALTER TABLE |
Clarity/Explicitness | Less explicit (hides sequence details) | More explicit (clear behavior, integrated options) |
Ownership/Cleanup | Sequence “owned by” column (auto-drop) | Sequence internally linked/owned (auto-drop) |
Permissions | Relies on implicit sequence USAGE grant |
Similar, relies on sequence USAGE , potentially clearer association |
Recommendation | Legacy systems, pre-PG10 compatibility | Preferred for new projects on PG10+ |
Elaboration:
- Standard Compliance: This is the most significant philosophical difference. Using
IDENTITY
aligns your schema with the broader SQL world, potentially easing migration or integration with other systems and making your DDL understandable to developers familiar with standard SQL. - Explicitness and Control:
IDENTITY
wins here.GENERATED ALWAYS
provides a crucial feature missing fromSERIAL
: the ability to enforce that the database must generate the ID, preventing accidental or malicious manual insertions that could break sequence continuity. The ability to define sequence options inline also makes the table definition more self-contained and readable. - Syntax:
SERIAL
is undeniably more concise. However, the clarity gained from the explicitIDENTITY
syntax often outweighs the brevity. - Behavior: While
SERIAL
acts likeGENERATED BY DEFAULT
, the lack of aGENERATED ALWAYS
equivalent is a limitation.IDENTITY
gives you the choice. - Underlying Implementation: Both use sequences, but the way they link the sequence is different.
SERIAL
uses the column’sDEFAULT
clause (DEFAULT nextval(...)
).IDENTITY
uses a specific internalIDENTITY
property linking the column and sequence. This distinction is mostly internal but reflects the more integrated nature ofIDENTITY
. - Permissions: In practice, permissions work similarly for both as they both rely on the underlying sequence object. The user inserting data needs
INSERT
permission on the table andUSAGE
permission on the sequence. Default grants usually handle this, but the explicit nature ofIDENTITY
might make permission management slightly more intuitive in complex scenarios.
6. Choosing Between SERIAL
and IDENTITY
The decision is generally straightforward:
- New Projects on PostgreSQL 10+: Use
IDENTITY
columns. They are the SQL standard, more explicit, offer better control (GENERATED ALWAYS
), and represent the modern approach. PreferGENERATED ALWAYS AS IDENTITY
unless you have a specific, well-justified reason to allow manual overrides (in which case useGENERATED BY DEFAULT
). - Existing Projects Using
SERIAL
: You have two options:- Leave them as
SERIAL
: If the application works and there’s no compelling reason (like strict standard compliance requirements) to change, leaving existingSERIAL
columns is perfectly acceptable. They are well-supported and functional. - Migrate to
IDENTITY
: If you are undertaking schema improvements, aiming for standard compliance, or want the benefits ofGENERATED ALWAYS
, you can migrateSERIAL
columns toIDENTITY
. We’ll cover how to do this later.
- Leave them as
- Projects Requiring Compatibility with PostgreSQL < 10: You must use
SERIAL
, asIDENTITY
columns are not available in older versions.
In essence, IDENTITY
is the future-proof, standard-compliant choice, while SERIAL
remains a functional and widely used legacy option.
7. Working with Auto-Incrementing Columns: Practical Considerations
Regardless of whether you use SERIAL
or IDENTITY
, several practical aspects arise when working with auto-generated keys.
Retrieving Generated IDs After INSERT
It’s extremely common to need the ID of the row you just inserted (e.g., to store it, use it in subsequent operations, or return it to the client). There are several ways to do this, with varying degrees of safety and efficiency:
-
RETURNING
Clause (Highly Recommended): This is the most reliable, efficient, and race-condition-free method. AddRETURNING column_name
(orRETURNING *
) to yourINSERT
statement. PostgreSQL executes theINSERT
and returns the specified column(s) from the newly inserted row within the same atomic operation.sql
-- For SERIAL or IDENTITY
INSERT INTO users (username, email)
VALUES ('emily', '[email protected]')
RETURNING user_id;
-- Returns the newly generated user_id directly -
currval('sequence_name')
(Use with Caution): This function returns the value most recently obtained bynextval()
for the current session.- Requires
nextval()
: You must have callednextval()
earlier in the same session. It doesn’t work if the ID was generated in a different session or if you haven’t callednextval()
yet. - Requires Knowing Sequence Name: You need to know the exact name of the sequence associated with the column (e.g.,
users_user_id_seq
or theIDENTITY
sequence name). This couples your application logic to the auto-generated sequence name, which is less ideal. - Less Direct: It’s an extra query compared to
RETURNING
.
sql
-- Less recommended than RETURNING
INSERT INTO users (username, email) VALUES ('frank', '[email protected]');
SELECT currval('users_user_id_seq'); -- Assuming users_user_id_seq is the correct name - Requires
-
lastval()
(Avoid in Concurrent Environments): This function returns the value most recently returned by anynextval()
call in the current session, regardless of the sequence.- Highly Prone to Errors: If your transaction or session involves multiple inserts into different tables with sequences, or calls
nextval()
directly for other purposes between the targetINSERT
and thelastval()
call, you will get the wrong ID. - Concurrency Issues: It doesn’t provide a guarantee that the value belongs to your specific insert if other operations happen concurrently within the same session involving sequences. Avoid using
lastval()
to retrieve primary key IDs.
- Highly Prone to Errors: If your transaction or session involves multiple inserts into different tables with sequences, or calls
Conclusion: Always prefer the RETURNING
clause for retrieving auto-generated IDs.
Handling Gaps in Sequences
Users often notice that the generated IDs in their tables are not perfectly sequential (e.g., 1, 2, 5, 6, 10…). This is normal and expected behavior with database sequences and generally not a problem.
Reasons for gaps include:
- Rolled-Back Transactions: A transaction starts, requests
nextval()
, but then rolls back. The sequence value is consumed and not reused. - Deleted Rows: Rows are inserted and later deleted, leaving gaps in the existing IDs.
- Sequence Caching (
CACHE
> 1): If a sequence has a cache value greater than 1, the backend process reserves a batch of numbers. If that backend disconnects or the server restarts before using all cached numbers, that batch is lost, creating a potentially large gap. GENERATED BY DEFAULT
/SERIAL
Overrides: Manually inserting a higher value can cause the sequence to jump when it naturally reaches that point, or simply create a gap if the sequence hasn’t caught up yet.- Replication Setups: Certain older replication methods could sometimes lead to sequence inconsistencies (though modern logical replication handles this much better).
Key Takeaway: Do not rely on auto-incrementing primary keys being gapless. They guarantee uniqueness, not sequentiality. If you need truly gapless sequences (e.g., for legal document numbering), auto-incrementing columns are often not the right tool; you might need application-level logic with specific locking or specialized data types/extensions.
Resetting Sequences
Sometimes, especially during development or after bulk data loads/deletions, you might want to reset a sequence counter.
-
Find the Sequence Name:
- For
SERIAL
: Typically<tablename>_<colname>_seq
. Use\d tablename
to confirm. - For
IDENTITY
: Use\d tablename
to see the sequence name listed under “Identity columns” or query the system catalogs (pg_depend
,pg_sequence
). A helper function likepg_get_serial_sequence('tablename', 'colname')
works for bothSERIAL
andIDENTITY
in modern PostgreSQL.
- For
-
Reset the Value: Use
ALTER SEQUENCE
.“`sql
— Find the max current ID in the table
SELECT max(user_id) FROM users;— Get the sequence name associated with the column
SELECT pg_get_serial_sequence(‘users’, ‘user_id’);
— Example output: ‘public.users_user_id_seq’— Reset the sequence to restart after the current max value
— Replace ‘public.users_user_id_seq’ with the actual sequence name
— Replace ‘max_id_plus_1’ with the value from (SELECT max(user_id) FROM users) + 1
— Ensure the user running this has permission to alter the sequence.
ALTER SEQUENCE public.users_user_id_seq RESTART WITH max_id_plus_1;— Or, more dynamically (ensure no inserts happen between MAX and ALTER):
SELECT setval(pg_get_serial_sequence(‘users’, ‘user_id’), COALESCE(max(user_id), 1), max(user_id) IS NOT NULL) FROM users;
— COALESCE handles the case where the table is empty.
— The third argument ‘true’ means the next nextval() will return value+1.
— The third argument ‘false’ means the next nextval() will return the value itself. We usually want ‘true’.
“`
Caution: Resetting sequences on live production tables should be done with extreme care. It often requires locking the table or careful coordination to avoid race conditions where new rows are inserted between calculating the max ID and resetting the sequence, which could lead to duplicate key errors later.
Permissions
As mentioned earlier, inserting into a table with an auto-incrementing column requires:
INSERT
permission on the table.USAGE
permission on the underlying sequence.
Usually, the table owner automatically has these permissions. If other roles need to insert data, ensure they have GRANT INSERT ON TABLE tablename TO role_name;
and GRANT USAGE ON SEQUENCE sequence_name TO role_name;
. Granting INSERT
on the table often implicitly allows usage of defaults (including sequence nextval()
calls set by SERIAL
or IDENTITY
), but explicitly granting USAGE
on the sequence is clearer and safer, especially in complex role setups or if defaults are modified.
8. Migrating from SERIAL
to IDENTITY
If you decide to modernize an existing schema using SERIAL
columns (and are on PostgreSQL 10+), you can migrate them to IDENTITY
. The process involves altering the column to remove the old default, adding the IDENTITY
property, and ensuring the existing sequence is correctly associated and owned.
Steps:
- Identify Target: Choose the table and
SERIAL
column (tablename
,colname
). - Find Sequence Name: Determine the name of the sequence associated with the
SERIAL
column.
sql
SELECT pg_get_serial_sequence('your_table_name', 'your_serial_column_name');
-- e.g., returns 'public.your_table_name_your_serial_column_name_seq' - Alter Column – Remove Default: Remove the old
nextval()
default.
sql
ALTER TABLE your_table_name ALTER COLUMN your_serial_column_name DROP DEFAULT; -
Alter Column – Add Identity: Add the
IDENTITY
property, explicitly associating the existing sequence. ChooseGENERATED ALWAYS
orGENERATED BY DEFAULT
.
“`sql
— Using GENERATED ALWAYS
ALTER TABLE your_table_name
ALTER COLUMN your_serial_column_name
ADD GENERATED ALWAYS AS IDENTITY (SEQUENCE NAME public.your_table_name_your_serial_column_name_seq);— Or using GENERATED BY DEFAULT
— ALTER TABLE your_table_name
— ALTER COLUMN your_serial_column_name
— ADD GENERATED BY DEFAULT AS IDENTITY (SEQUENCE NAME public.your_table_name_your_serial_column_name_seq);
``
SEQUENCE NAME`, you tell PostgreSQL to reuse the existing sequence rather than creating a new one. This preserves the current sequence value.
*Note:* By specifying -
Optional but Recommended – Sequence Ownership: While the
IDENTITY
association implies a link, explicitly setting the sequence ownership to the identity column makes the dependency clear in system catalogs, mirroring how a newly createdIDENTITY
column behaves.
“`sql
— Get the current sequence details (check Owner if needed)
\d public.your_table_name_your_serial_column_name_seq— Alter sequence ownership (may require sequence owner privileges)
ALTER SEQUENCE public.your_table_name_your_serial_column_name_seq OWNED BY your_table_name.your_serial_column_name;
``
\d your_table_name
6. **Verify:** Useto confirm the column now shows as
generated always as identity(or
by default`) and lists the correct sequence details under “Identity columns”. Test insertions to ensure IDs are generated correctly.
Important Considerations:
- Permissions: Ensure the user performing these
ALTER
commands has the necessary privileges on the table, column, and sequence. - Locking:
ALTER TABLE
commands acquire locks. Performing this on a very busy table might require a maintenance window or strategies to minimize locking duration (though these specific alterations are usually fast). - Transaction: Perform these steps within a single transaction to ensure atomicity.
Example Migration (users.user_id):
“`sql
BEGIN;
— 1. Identify: Table ‘users’, Column ‘user_id’ (assuming it’s SERIAL)
— 2. Find Sequence Name (Assume it’s ‘public.users_user_id_seq’)
— SELECT pg_get_serial_sequence(‘users’, ‘user_id’);
— 3. Remove Default
ALTER TABLE users ALTER COLUMN user_id DROP DEFAULT;
— 4. Add Identity (using GENERATED ALWAYS)
ALTER TABLE users
ALTER COLUMN user_id
ADD GENERATED ALWAYS AS IDENTITY (SEQUENCE NAME public.users_user_id_seq);
— 5. Set Ownership
ALTER SEQUENCE public.users_user_id_seq OWNED BY users.user_id;
COMMIT;
— 6. Verify
\d users
“`
9. Advanced Considerations and Alternatives
Replication
In replication scenarios (especially older physical or statement-based replication), ensuring sequences generate unique values across primary and replica nodes required careful configuration. Modern logical replication in PostgreSQL generally handles sequences correctly, but it’s always wise to test thoroughly in your specific replication setup. Sequence caching can sometimes interact unexpectedly with failover scenarios if not managed carefully.
Bulk Loading (COPY
Command)
When using the COPY
command to bulk-load data into a table with an auto-incrementing column:
SERIAL
/GENERATED BY DEFAULT
: If your input data includes values for the ID column,COPY
will attempt to insert them (similar to anINSERT
statement specifying the ID). If your input data omits the ID column, the default (nextval()
orIDENTITY
generation) will trigger for each row, which can be slower than providing IDs.GENERATED ALWAYS
:COPY
will fail if the input data tries to provide a value for theGENERATED ALWAYS
column, unless you specifyOVERRIDING SYSTEM VALUE
as an option to theCOPY
command (PostgreSQL 11+). If the column is omitted in theCOPY
column list, values will be generated.- Sequence Synchronization: After a large
COPY
operation that did include explicit IDs (forSERIAL
orGENERATED BY DEFAULT
, orGENERATED ALWAYS
withOVERRIDING SYSTEM VALUE
), the table’s maximum ID might be far beyond the sequence’s current value. You’ll likely need to reset the sequence usingsetval()
(as shown previously) to avoid future duplicate key errors when regularINSERT
s resume.
Partitioning
When using PostgreSQL’s declarative partitioning, the auto-incrementing column (often the primary key) is typically defined on the parent partitioned table. PostgreSQL automatically ensures that the sequence generation works correctly across partitions; rows inserted into the parent table (which get routed to the correct partition) will draw unique values from the single sequence defined on the parent.
Alternatives: UUIDs
While sequential integers are common, UUIDs (Universally Unique Identifiers) offer another approach to generating unique surrogate keys.
- Pros:
- Globally unique (extremely low collision probability across tables, databases, and even systems).
- Can be generated client-side, potentially simplifying certain distributed workflows.
- No central sequence bottleneck (though generating them still takes CPU time).
- Hides information about the number of rows or insertion order.
- Cons:
- Larger (typically 16 bytes vs. 4 or 8 for integers), increasing storage and index size.
- Can lead to index fragmentation and poorer cache locality, especially with completely random UUID versions (like v4), impacting performance. Sequential UUID types (like proposed v7) mitigate this.
- Less human-readable/memorable than integers.
SERIAL
/IDENTITY
are often simpler for basic auto-incrementing needs.
PostgreSQL has excellent support for the UUID
data type and functions like gen_random_uuid()
(requires pgcrypto
extension before PG13, built-in afterwards) or uuid_generate_v4()
(from the uuid-ossp
extension). You can set a default:
“`sql
— Requires uuid-ossp extension or PG13+ for gen_random_uuid()
CREATE EXTENSION IF NOT EXISTS “uuid-ossp”;
CREATE TABLE items (
item_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
item_name TEXT NOT NULL
);
INSERT INTO items (item_name) VALUES (‘Thing 1’);
SELECT * FROM items;
— item_id | item_name
— ————————————–+———–
— a1b2c3d4-e5f6-7890-1234-567890abcdef | Thing 1 (Example UUID)
“`
Choosing between sequential integers and UUIDs depends on specific application requirements regarding uniqueness scope, performance characteristics, key size, and generation strategy.
10. Conclusion
Auto-incrementing primary keys are a cornerstone of relational database design, providing simple, reliable, and efficient row identification. PostgreSQL offers two excellent mechanisms for this: the traditional SERIAL
pseudo-types and the modern, SQL-standard IDENTITY
columns.
SERIAL
(SMALLSERIAL
,SERIAL
,BIGSERIAL
) is the long-standing PostgreSQL method. It’s concise syntactic sugar that creates an integer column, an associated sequence, sets the default tonextval()
, addsNOT NULL
, and links the sequence ownership. It behaves likeGENERATED BY DEFAULT
, allowing overrides duringINSERT
.IDENTITY
(GENERATED ALWAYS AS IDENTITY
,GENERATED BY DEFAULT AS IDENTITY
) is the SQL standard approach, available since PostgreSQL 10. It offers clearer syntax, explicitly defines insert override behavior (ALWAYS
prevents overrides,BY DEFAULT
allows them), allows inline sequence option configuration, and is the recommended choice for new development.
Both mechanisms rely fundamentally on Sequences, independent database objects that generate ordered numbers. Understanding sequences helps in comprehending the behavior, customization, and potential nuances (like gaps) of auto-incrementing columns.
For developers working with PostgreSQL:
- Prefer
IDENTITY
columns (specificallyGENERATED ALWAYS
) for new tables if using PostgreSQL 10 or later. - Understand that
SERIAL
is perfectly functional and acceptable in existing systems or when pre-PG10 compatibility is needed. - Always use the
RETURNING
clause to reliably retrieve the generated ID after anINSERT
. - Accept that gaps in generated IDs are normal and usually not indicative of a problem.
- Know how to inspect sequence properties and reset them carefully if necessary (e.g., after bulk loads).
- Consider UUIDs as an alternative if global uniqueness or client-side generation is required, but be mindful of the performance and storage trade-offs.
By mastering SERIAL
and IDENTITY
, you gain essential tools for building robust, well-structured, and efficient PostgreSQL databases. Choosing the right tool and understanding its behavior ensures data integrity and simplifies application development.