ADD COLUMN with a DEFAULT: sometimes instant, sometimes catastrophic

On PostgreSQL 11 and newer, the same shape of statement can finish in 8ms on a 200GB table or lock the table for 90 minutes. The difference is the volatility class of the default expression, and most production teams still believe the pre-PG11 rule.

Two production migrations, same shape, very different days.

Story A. A 200GB users table. The diff adds one column: ALTER TABLE users ADD COLUMN status text NOT NULL DEFAULT 'pending'. The release engineer holds their breath, watches the deploy, and the migration finishes in 8 milliseconds. They go back to their coffee.

Story B. A 40GB events table. The diff adds one column: ALTER TABLE events ADD COLUMN sampled_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(). The release engineer holds their breath, watches the deploy, and 90 seconds later healthchecks start failing. The migration is holding ACCESS EXCLUSIVE. Pods restart, the queue backs up, the on-call gets paged. 90 minutes later the migration commits. The retro the next day spends a lot of time talking about DEFAULT.

Both engineers had read the same internal runbook. The runbook said “ADD COLUMN with a DEFAULT is a table rewrite, avoid it.” One migration ignored the rule and got away with it. The other followed the spirit of the rule and got destroyed by it.

The rule is from 2017. PostgreSQL 11 made it conditional in 2018. Most teams never updated.

The rule, as it actually is on PG11+

From the PostgreSQL 11 release notes, under “Major enhancements”:

Allow ALTER TABLE to add a column with a non-null default value without a table rewrite.

The mechanism: when you add a column with a default that is a non-volatile expression, Postgres stores the default as a piece of metadata on the table (pg_attribute.atthasmissing and attmissingval). Existing rows are not touched. Every read fills in the default on the fly. The new column appears on disk only when a row is updated for some other reason.

When the default is a volatile expression, or when the column is NOT NULL without a fast default, or when it is GENERATED ... STORED, Postgres has to actually write the value into every existing row. That is a full table rewrite under ACCESS EXCLUSIVE. Reads block. Writes block. The table is unavailable for as long as the rewrite takes.

So the question that decides whether your migration is 8ms or 90 minutes is: is the default expression volatile?

What “volatile” means here

Postgres classifies every function as IMMUTABLE, STABLE, or VOLATILE. The fast-path optimization requires the default to be non-volatile and not reference other table rows. In practice:

Triggers a full table rewrite:

  • clock_timestamp(), statement_timestamp(), timeofday()
  • gen_random_uuid(), uuid_generate_v4()
  • random()
  • nextval('some_seq')
  • Any user-defined function declared VOLATILE (the default if you do not specify)
  • Any expression that reads from another table or uses a subquery
  • GENERATED ALWAYS AS (...) STORED columns

Instant on PG11+ (metadata only):

  • Any literal: 0, 42, 'pending', false, '2020-01-01'::date, '{}'::jsonb
  • Stable timestamp defaults: now(), current_timestamp, transaction_timestamp()
  • IMMUTABLE function calls on literal arguments: md5('seed'), 'a' || 'b'
  • Type casts of literals: 'epoch'::timestamptz, '00000000-0000-0000-0000-000000000000'::uuid

The trap is that volatility names do not read like English. now() is stable and can use the fast path on PG11+. clock_timestamp() is volatile and cannot. They look similar in a migration diff, but they have very different operational behavior.

The classic trap

ALTER TABLE events
  ADD COLUMN sampled_at TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp();

The author wanted “set sampled_at to roughly the time this migration touched each existing row.” The actual effect is: rewrite every row in events, holding ACCESS EXCLUSIVE for the duration, blocking all reads and writes.

Three ways to do this without an incident:

  1. Add nullable, backfill, set not-null. Add the column with no default. Have the application start writing created_at for new rows. Backfill the old rows in batches. Then ADD CONSTRAINT ... CHECK (created_at IS NOT NULL) NOT VALID, VALIDATE CONSTRAINT, SET NOT NULL.
  2. Accept that creation time is approximate. If the existing rows do not have a real creation time anyway, use a constant sentinel: DEFAULT '1970-01-01'::timestamptz. Instant on PG11+. Then backfill the real values out-of-band if you ever learn them.
  3. Use a function default but make the column nullable, then backfill new rows at insert time in the application. Skip the migration-time backfill entirely.

The point is that a volatile default is doing two things at once: setting the value for existing rows, and setting the value for future rows. Splitting those concerns is what makes the migration safe.

The other trap: GENERATED STORED

ALTER TABLE orders
  ADD COLUMN total_cents INT GENERATED ALWAYS AS (price_cents * quantity) STORED;

This looks declarative and modern. Its lock profile is identical to DEFAULT clock_timestamp(): full table rewrite, ACCESS EXCLUSIVE. Postgres has to compute and store the value for every existing row. Most ORMs do not warn. A virtual generated column (without STORED, available on PG12+) avoids the rewrite, but only if you actually do not need the materialized value.

What about plain old DEFAULT 0?

Before PostgreSQL 11, every ADD COLUMN ... DEFAULT <anything> was a full rewrite. That is where the “always avoid DEFAULT” rule came from. The rule made sense in 2017. PostgreSQL 11 shipped in October 2018. The rule outlived its truth almost eight years ago.

On PG11+, ADD COLUMN status text NOT NULL DEFAULT 'pending' is instant. Metadata change. No rewrite. The brief ACCESS EXCLUSIVE to take the metadata lock is sub-millisecond on an idle table and bounded by lock_timeout on a busy one.

(The caveat: you have to actually know what version every Postgres in your fleet runs. RDS often lags. Aurora lags differently. The optimization is real but version-aware.)

Five worked examples

StatementPG11+ behaviorWhy
ADD COLUMN n INT DEFAULT 0InstantLiteral default. Metadata only.
ADD COLUMN s TEXT DEFAULT 'pending'InstantLiteral default. Metadata only.
ADD COLUMN c TIMESTAMPTZ DEFAULT now()Instantnow() is STABLE, so PG11+ can use the fast default path.
ADD COLUMN c TIMESTAMPTZ DEFAULT clock_timestamp()Full rewriteclock_timestamp() is VOLATILE. Each row needs its own value.
ADD COLUMN id UUID DEFAULT gen_random_uuid()Full rewritegen_random_uuid() is VOLATILE. Each row needs a different value.
ADD COLUMN d INT DEFAULT (id * 2)Full rewriteDefault references another column. Has to be computed per row.

If you can predict each of these without looking, you can predict any ADD COLUMN migration.

What pgfence does

pgfence treats these as two different rules.

  • ADD COLUMN ... DEFAULT <constant or stable>: risk LOW on PG11+. The output includes a one-liner explaining that this is the metadata-only path.
  • ADD COLUMN ... DEFAULT <volatile> or GENERATED ... STORED: risk HIGH. The output includes the expand/backfill/contract recipe.
  • ADD COLUMN ... NOT NULL without a default: risk HIGH. Postgres has to scan the table to verify the NOT NULL.

The version check is governed by --min-pg-version (default 14). If you target an older Postgres, pgfence escalates the constant-default case back to HIGH and tells you why.

Try it

npx @flvmnt/pgfence explain "ALTER TABLE users ADD COLUMN status text NOT NULL DEFAULT 'pending'"
# -> LOW (PG11+ metadata-only ADD COLUMN, no table rewrite)

npx @flvmnt/pgfence explain "ALTER TABLE users ADD COLUMN status text NOT NULL DEFAULT clock_timestamp()::text"
# -> HIGH (volatile default forces full table rewrite under ACCESS EXCLUSIVE)

Same shape. Two different stories. The word in the parentheses is the whole game.

npm install -D @flvmnt/pgfence
npx @flvmnt/pgfence analyze migrations/*.sql

GitHub · Docs · npm

← All posts