ADD COLUMN with a DEFAULT: sometimes instant, sometimes catastrophic
On PostgreSQL 11 and newer, the same shape of statement can finish in 8ms on a 200GB table or lock the table for 90 minutes. The difference is the volatility class of the default expression, and most production teams still believe the pre-PG11 rule.
Two production migrations, same shape, very different days.
Story A. A 200GB users table. The diff adds one column: ALTER TABLE users ADD COLUMN status text NOT NULL DEFAULT 'pending'. The release engineer holds their breath, watches the deploy, and the migration finishes in 8 milliseconds. They go back to their coffee.
Story B. A 40GB events table. The diff adds one column: ALTER TABLE events ADD COLUMN sampled_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(). The release engineer holds their breath, watches the deploy, and 90 seconds later healthchecks start failing. The migration is holding ACCESS EXCLUSIVE. Pods restart, the queue backs up, the on-call gets paged. 90 minutes later the migration commits. The retro the next day spends a lot of time talking about DEFAULT.
Both engineers had read the same internal runbook. The runbook said “ADD COLUMN with a DEFAULT is a table rewrite, avoid it.” One migration ignored the rule and got away with it. The other followed the spirit of the rule and got destroyed by it.
The rule is from 2017. PostgreSQL 11 made it conditional in 2018. Most teams never updated.
The rule, as it actually is on PG11+
From the PostgreSQL 11 release notes, under “Major enhancements”:
Allow
ALTER TABLEto add a column with a non-null default value without a table rewrite.
The mechanism: when you add a column with a default that is a non-volatile expression, Postgres stores the default as a piece of metadata on the table (pg_attribute.atthasmissing and attmissingval). Existing rows are not touched. Every read fills in the default on the fly. The new column appears on disk only when a row is updated for some other reason.
When the default is a volatile expression, or when the column is NOT NULL without a fast default, or when it is GENERATED ... STORED, Postgres has to actually write the value into every existing row. That is a full table rewrite under ACCESS EXCLUSIVE. Reads block. Writes block. The table is unavailable for as long as the rewrite takes.
So the question that decides whether your migration is 8ms or 90 minutes is: is the default expression volatile?
What “volatile” means here
Postgres classifies every function as IMMUTABLE, STABLE, or VOLATILE. The fast-path optimization requires the default to be non-volatile and not reference other table rows. In practice:
Triggers a full table rewrite:
clock_timestamp(),statement_timestamp(),timeofday()gen_random_uuid(),uuid_generate_v4()random()nextval('some_seq')- Any user-defined function declared
VOLATILE(the default if you do not specify) - Any expression that reads from another table or uses a subquery
GENERATED ALWAYS AS (...) STOREDcolumns
Instant on PG11+ (metadata only):
- Any literal:
0,42,'pending',false,'2020-01-01'::date,'{}'::jsonb - Stable timestamp defaults:
now(),current_timestamp,transaction_timestamp() IMMUTABLEfunction calls on literal arguments:md5('seed'),'a' || 'b'- Type casts of literals:
'epoch'::timestamptz,'00000000-0000-0000-0000-000000000000'::uuid
The trap is that volatility names do not read like English. now() is stable and can use the fast path on PG11+. clock_timestamp() is volatile and cannot. They look similar in a migration diff, but they have very different operational behavior.
The classic trap
ALTER TABLE events
ADD COLUMN sampled_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp();
The author wanted “set sampled_at to roughly the time this migration touched each existing row.” The actual effect is: rewrite every row in events, holding ACCESS EXCLUSIVE for the duration, blocking all reads and writes.
Three ways to do this without an incident:
- Add nullable, backfill, set not-null. Add the column with no default. Have the application start writing
created_atfor new rows. Backfill the old rows in batches. ThenADD CONSTRAINT ... CHECK (created_at IS NOT NULL) NOT VALID,VALIDATE CONSTRAINT,SET NOT NULL. - Accept that creation time is approximate. If the existing rows do not have a real creation time anyway, use a constant sentinel:
DEFAULT '1970-01-01'::timestamptz. Instant on PG11+. Then backfill the real values out-of-band if you ever learn them. - Use a function default but make the column nullable, then backfill new rows at insert time in the application. Skip the migration-time backfill entirely.
The point is that a volatile default is doing two things at once: setting the value for existing rows, and setting the value for future rows. Splitting those concerns is what makes the migration safe.
The other trap: GENERATED STORED
ALTER TABLE orders
ADD COLUMN total_cents INT GENERATED ALWAYS AS (price_cents * quantity) STORED;
This looks declarative and modern. Its lock profile is identical to DEFAULT clock_timestamp(): full table rewrite, ACCESS EXCLUSIVE. Postgres has to compute and store the value for every existing row. Most ORMs do not warn. A virtual generated column (without STORED, available on PG12+) avoids the rewrite, but only if you actually do not need the materialized value.
What about plain old DEFAULT 0?
Before PostgreSQL 11, every ADD COLUMN ... DEFAULT <anything> was a full rewrite. That is where the “always avoid DEFAULT” rule came from. The rule made sense in 2017. PostgreSQL 11 shipped in October 2018. The rule outlived its truth almost eight years ago.
On PG11+, ADD COLUMN status text NOT NULL DEFAULT 'pending' is instant. Metadata change. No rewrite. The brief ACCESS EXCLUSIVE to take the metadata lock is sub-millisecond on an idle table and bounded by lock_timeout on a busy one.
(The caveat: you have to actually know what version every Postgres in your fleet runs. RDS often lags. Aurora lags differently. The optimization is real but version-aware.)
Five worked examples
| Statement | PG11+ behavior | Why |
|---|---|---|
ADD COLUMN n INT DEFAULT 0 | Instant | Literal default. Metadata only. |
ADD COLUMN s TEXT DEFAULT 'pending' | Instant | Literal default. Metadata only. |
ADD COLUMN c TIMESTAMPTZ DEFAULT now() | Instant | now() is STABLE, so PG11+ can use the fast default path. |
ADD COLUMN c TIMESTAMPTZ DEFAULT clock_timestamp() | Full rewrite | clock_timestamp() is VOLATILE. Each row needs its own value. |
ADD COLUMN id UUID DEFAULT gen_random_uuid() | Full rewrite | gen_random_uuid() is VOLATILE. Each row needs a different value. |
ADD COLUMN d INT DEFAULT (id * 2) | Full rewrite | Default references another column. Has to be computed per row. |
If you can predict each of these without looking, you can predict any ADD COLUMN migration.
What pgfence does
pgfence treats these as two different rules.
ADD COLUMN ... DEFAULT <constant or stable>: riskLOWon PG11+. The output includes a one-liner explaining that this is the metadata-only path.ADD COLUMN ... DEFAULT <volatile>orGENERATED ... STORED: riskHIGH. The output includes the expand/backfill/contract recipe.ADD COLUMN ... NOT NULLwithout a default: riskHIGH. Postgres has to scan the table to verify the NOT NULL.
The version check is governed by --min-pg-version (default 14). If you target an older Postgres, pgfence escalates the constant-default case back to HIGH and tells you why.
Try it
npx @flvmnt/pgfence explain "ALTER TABLE users ADD COLUMN status text NOT NULL DEFAULT 'pending'"
# -> LOW (PG11+ metadata-only ADD COLUMN, no table rewrite)
npx @flvmnt/pgfence explain "ALTER TABLE users ADD COLUMN status text NOT NULL DEFAULT clock_timestamp()::text"
# -> HIGH (volatile default forces full table rewrite under ACCESS EXCLUSIVE)
Same shape. Two different stories. The word in the parentheses is the whole game.
npm install -D @flvmnt/pgfence
npx @flvmnt/pgfence analyze migrations/*.sql