The Expand/Contract Pattern: Five Zero-Downtime Migration Recipes
Step-by-step SQL sequences for the five most common dangerous migrations. No downtime, no blocked queries, no surprises.
Every dangerous Postgres migration has a safer rollout pattern. The structure is usually the same: expand first, then contract. Add the new structure alongside the old one, migrate the data, then clean up.
The problem is that nobody writes it out for you. The Postgres documentation tells you that ALTER TABLE ... ADD CONSTRAINT takes a strong lock. It does not hand you the split sequence that moves the expensive validation into a later VALIDATE CONSTRAINT step.
Here are the five recipes I use most. Each one shows the dangerous version first, explains what goes wrong, then gives you the safe sequence.
Recipe 1: ADD COLUMN with NOT NULL + non-constant DEFAULT
The dangerous version
ALTER TABLE users ADD COLUMN created_at timestamptz NOT NULL DEFAULT now();
This is a classic footgun. On Postgres 11+, a constant default is metadata-only. But expressions like now() are not constant, so Postgres rewrites existing rows under ACCESS EXCLUSIVE. If the column also needs NOT NULL, you still need the staged backfill and constraint flow.
The safe version
Split into three steps: expand, backfill, contract.
-- Step 1: Expand (instant, metadata-only)
ALTER TABLE users ADD COLUMN IF NOT EXISTS created_at timestamptz;
-- Step 2: Backfill (out-of-band, no long lock)
-- Run in your application code or a script, not the migration.
-- Repeat until 0 rows updated:
WITH batch AS (
SELECT ctid FROM users WHERE created_at IS NULL
LIMIT 1000 FOR UPDATE SKIP LOCKED
)
UPDATE users t SET created_at = now()
FROM batch WHERE t.ctid = batch.ctid;
-- Step 3: Set the default for new rows
ALTER TABLE users ALTER COLUMN created_at SET DEFAULT now();
-- Step 4: Contract (add NOT NULL safely)
ALTER TABLE users ADD CONSTRAINT chk_created_at_nn
CHECK (created_at IS NOT NULL) NOT VALID;
ALTER TABLE users VALIDATE CONSTRAINT chk_created_at_nn;
ALTER TABLE users ALTER COLUMN created_at SET NOT NULL;
ALTER TABLE users DROP CONSTRAINT chk_created_at_nn;
Why this works: the expensive part, the backfill, happens outside the schema change. NOT VALID skips existing rows when adding the constraint, VALIDATE CONSTRAINT scans them later under SHARE UPDATE EXCLUSIVE, and SET NOT NULL is then an instant metadata step because Postgres trusts the validated check.
Three or four steps instead of one. Each schema step is brief even on a large table, and only the backfill does meaningful row-by-row work.
Recipe 2: ADD FOREIGN KEY
The dangerous version
ALTER TABLE orders ADD CONSTRAINT fk_orders_user
FOREIGN KEY (user_id) REFERENCES users(id);
This acquires SHARE ROW EXCLUSIVE on both orders and users. Postgres scans both tables to verify every foreign key value exists. On large tables, this blocks all writes to both tables for the duration of the scan.
The safe version
-- Step 1: Add the constraint without validation (brief lock, no scan)
ALTER TABLE orders ADD CONSTRAINT fk_orders_user
FOREIGN KEY (user_id) REFERENCES users(id) NOT VALID;
-- Step 2: Validate separately (SHARE UPDATE EXCLUSIVE, non-blocking)
ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_user;
Why this works: NOT VALID adds the constraint metadata without scanning existing rows. New inserts are validated immediately. VALIDATE CONSTRAINT then scans existing rows under SHARE UPDATE EXCLUSIVE, which does not block reads or writes.
Recipe 3: ADD UNIQUE CONSTRAINT
The dangerous version
ALTER TABLE users ADD CONSTRAINT uq_users_email UNIQUE (email);
This builds a unique index inline while holding SHARE ROW EXCLUSIVE. On a table with millions of rows, index creation can take minutes, during which all writes are blocked.
The safe version
-- Step 1: Build the index concurrently (SHARE UPDATE EXCLUSIVE)
CREATE UNIQUE INDEX CONCURRENTLY uq_users_email_idx ON users(email);
-- Step 2: Attach it as a constraint (brief SHARE UPDATE EXCLUSIVE, no rebuild)
ALTER TABLE users ADD CONSTRAINT uq_users_email
UNIQUE USING INDEX uq_users_email_idx;
Why this works: CREATE UNIQUE INDEX CONCURRENTLY builds the index without blocking reads or writes. It takes SHARE UPDATE EXCLUSIVE, which only conflicts with other DDL. Once the index exists, USING INDEX tells Postgres to use it directly instead of building a new one, so the SHARE UPDATE EXCLUSIVE lock is held only for the brief metadata update.
Important: CREATE INDEX CONCURRENTLY cannot run inside a transaction. If your migration tool wraps everything in a transaction, you need to run this step separately.
Recipe 4: ALTER COLUMN TYPE
The dangerous version
ALTER TABLE events ALTER COLUMN payload TYPE jsonb;
Unless this is a metadata-only change like widening varchar(50) to varchar(100), Postgres often rewrites every row in the table under ACCESS EXCLUSIVE. On a 10M-row table, this can lock the table for minutes.
The safe version
Use the expand/contract pattern with a new column:
-- Step 1: Add new column with target type (instant)
ALTER TABLE events ADD COLUMN payload_new jsonb;
-- Step 2: Backfill out-of-band in batches
-- Repeat until 0 rows updated:
WITH batch AS (
SELECT ctid FROM events WHERE payload_new IS NULL
LIMIT 1000 FOR UPDATE SKIP LOCKED
)
UPDATE events t SET payload_new = payload::jsonb
FROM batch WHERE t.ctid = batch.ctid;
-- Step 3: Swap at the application level
-- Update your application to read/write both columns during the transition.
-- Once all rows are backfilled and the application uses the new column:
-- Step 4: Drop old column (separate migration, after verification)
ALTER TABLE events DROP COLUMN payload;
ALTER TABLE events RENAME COLUMN payload_new TO payload;
Why this works: Each individual step is either instant (metadata-only) or runs without exclusive locks (the batched update). The tradeoff is complexity: your application needs to handle both columns during the transition period. For most teams, a few hours of dual-column code is better than minutes of total lockout.
Note: Some type changes are metadata-only and safe, especially within the text family. Widening a varchar length is often metadata-only, while many other type changes still require a rewrite. pgfence calls out the clearly safe cases and treats the uncertain ones conservatively.
Recipe 5: SET NOT NULL
The dangerous version
ALTER TABLE users ALTER COLUMN name SET NOT NULL;
Postgres validates every existing row under ACCESS EXCLUSIVE. On a large table, this blocks all access for the duration of the scan.
The safe version
-- Step 1: Add a CHECK constraint with NOT VALID (brief lock, no scan)
ALTER TABLE users ADD CONSTRAINT chk_name_nn
CHECK (name IS NOT NULL) NOT VALID;
-- Step 2: Validate the constraint (SHARE UPDATE EXCLUSIVE)
ALTER TABLE users VALIDATE CONSTRAINT chk_name_nn;
-- Step 3: Add the real NOT NULL (instant, trusts the constraint)
ALTER TABLE users ALTER COLUMN name SET NOT NULL;
-- Step 4: Clean up
ALTER TABLE users DROP CONSTRAINT chk_name_nn;
Why this works: This is the same pattern as Recipe 1’s contract phase. Postgres 12+ recognizes that a validated CHECK constraint on col IS NOT NULL is equivalent to a column-level NOT NULL, so SET NOT NULL becomes an instant metadata operation.
The common thread
Every recipe follows the same structure:
- Expand: Add the new structure alongside the old (instant or near-instant)
- Migrate: Move data in batches without holding exclusive locks
- Contract: Remove the old structure (brief lock, metadata-only)
The lock is never held during the expensive part (data scanning or index building). The expensive part always happens under a weaker lock that allows normal operations to continue.
Let a tool do it
You don’t need to memorize these recipes. pgfence analyzes your migration SQL and prints the safer rollout sequence for risky patterns, often with placeholders where your backfill or rollout logic still needs application-specific values:
pgfence analyze migrations/0042_add_email_verified.sql
Statement ALTER TABLE users ADD COLUMN created_at timestamptz NOT NULL DEFAULT now()
Lock Mode ACCESS EXCLUSIVE (blocks reads + writes)
Risk HIGH
Message ADD COLUMN "created_at" with non-constant DEFAULT causes table rewrite under ACCESS EXCLUSIVE lock
Safe Rewrite:
1. ALTER TABLE users ADD COLUMN IF NOT EXISTS created_at timestamptz;
2. -- Backfill out-of-band in batches
3. ALTER TABLE users ALTER COLUMN created_at SET DEFAULT <fill_value>;
4. ALTER TABLE users ADD CONSTRAINT chk_created_at_nn CHECK (created_at IS NOT NULL) NOT VALID;
5. ALTER TABLE users VALIDATE CONSTRAINT chk_created_at_nn;
6. ALTER TABLE users ALTER COLUMN created_at SET NOT NULL;
7. ALTER TABLE users DROP CONSTRAINT chk_created_at_nn;
It works with plain SQL, TypeORM, Knex, Prisma, Drizzle, and Sequelize migrations. Run it in CI and it catches the dangerous patterns before they reach production.
npm install -g @flvmnt/pgfence
pgfence analyze --ci --max-risk medium migrations/*.sql