Database Migration Strategies for Production Systems
Database migrations in production are the most dangerous routine operation in software engineering. A bad migration can corrupt data, cause downtime, or lock tables for minutes. A good migration strategy makes schema changes boring — which is exactly what you want.
This guide covers migration patterns that work for production PostgreSQL databases with real traffic, building on the indexing and query patterns from PostgreSQL indexing strategies.
Problem
Production migrations fail because of:
- Long-running locks that block reads and writes
- Data transformations that take hours on large tables
- Migrations that cannot be rolled back
- Missing indexes on new columns causing slow queries
- Deployments coupled to migration timing
Migration Tooling
Use a dedicated migration tool that tracks applied migrations:
# Supabase CLI
supabase migration new add_status_column
supabase db push
# Alternative: dbmate
dbmate new add_status_column
dbmate up
Each migration gets a timestamped file, and the tool records which migrations have been applied. Never apply schema changes by hand in production — untracked changes cause drift between environments.
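The bookkeeping these tools perform is simple: a table records which migration versions have already run, and each run applies only the files not yet recorded. A minimal sketch of that mechanism — using Python's sqlite3 so the example is self-contained; real tools target PostgreSQL and also handle transactions and locking:

```python
import sqlite3

def apply_pending(conn, migrations):
    """Apply migrations not yet recorded in schema_migrations.

    `migrations` is a list of (version, sql) pairs, sorted by version.
    Returns the versions applied on this run.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    ran = []
    for version, sql in migrations:
        if version in applied:
            continue  # already applied on a previous run; skip
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        ran.append(version)
    conn.commit()
    return ran

conn = sqlite3.connect(":memory:")
migs = [("20240115_add_status", "CREATE TABLE orders (id INTEGER, status TEXT)")]
print(apply_pending(conn, migs))  # first run applies the migration
print(apply_pending(conn, migs))  # second run is a no-op: version already recorded
```

This idempotence is exactly why `supabase db push` or `dbmate up` can be run repeatedly without re-applying old migrations.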
Safe Column Addition
Adding a column is safe because PostgreSQL does not rewrite the table:
-- SAFE: Adds column without locking
ALTER TABLE orders ADD COLUMN status TEXT;
-- SAFE: Default value (PostgreSQL 11+, no table rewrite)
ALTER TABLE orders ADD COLUMN priority INTEGER DEFAULT 0;
-- DANGEROUS: Adding a NOT NULL column without a default
-- fails outright on a non-empty table (existing rows violate the constraint);
-- adding NOT NULL *with* a default rewrites the whole table before PostgreSQL 11
ALTER TABLE orders ADD COLUMN status TEXT NOT NULL;
-- SAFE alternative: add nullable, backfill, then add the constraint
ALTER TABLE orders ADD COLUMN status TEXT;
UPDATE orders SET status = 'pending' WHERE status IS NULL;
-- Note: SET NOT NULL scans the table under an exclusive lock; on very
-- large tables, add CHECK (status IS NOT NULL) NOT VALID, VALIDATE it,
-- then SET NOT NULL (PostgreSQL 12+ skips the scan using the valid CHECK)
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
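The three-step sequence is mechanical enough to generate. A hypothetical helper — the function name and naive quoting are illustrative, not from any migration tool — that emits the safe sequence for a given table, column, and backfill value:

```python
def safe_not_null_steps(table, column, col_type, default_value):
    """Generate the add-nullable / backfill / constrain sequence as SQL strings.

    Illustrative only: real code should quote identifiers and values properly.
    """
    return [
        # Step 1: add the column as nullable -- no rewrite, no long lock
        f"ALTER TABLE {table} ADD COLUMN {column} {col_type};",
        # Step 2: backfill existing rows (batch this on large tables)
        f"UPDATE {table} SET {column} = '{default_value}' WHERE {column} IS NULL;",
        # Step 3: constrain once every row has a value
        f"ALTER TABLE {table} ALTER COLUMN {column} SET NOT NULL;",
    ]

for stmt in safe_not_null_steps("orders", "status", "TEXT", "pending"):
    print(stmt)
```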
Safe Index Creation
-- DANGEROUS: Locks the table for the entire index build
CREATE INDEX idx_orders_status ON orders(status);
-- SAFE: Builds index without blocking writes
CREATE INDEX CONCURRENTLY idx_orders_status ON orders(status);
CONCURRENTLY takes longer, cannot run inside a transaction block, and leaves behind an INVALID index that must be dropped and rebuilt if the build fails. Even so, always use it in production: it builds the index without blocking writes.
The Expand-Contract Pattern
For breaking schema changes, use two migrations separated by a deployment:
Phase 1: Expand (add new structure)
-- Migration 001: Add new column
ALTER TABLE users ADD COLUMN full_name TEXT;
-- Backfill from existing data
UPDATE users SET full_name = first_name || ' ' || last_name
WHERE full_name IS NULL;
Phase 2: Deploy code that writes to both columns
# Application writes to both old and new columns
async def update_user_name(user_id, first_name, last_name, full_name):
    await pool.execute("""
        UPDATE users
        SET first_name = $2,
            last_name = $3,
            full_name = $4
        WHERE id = $1
    """, user_id, first_name, last_name, full_name)
Phase 3: Contract (remove old structure)
-- Migration 002: Remove old columns (after all code uses full_name)
ALTER TABLE users DROP COLUMN first_name;
ALTER TABLE users DROP COLUMN last_name;
This pattern ensures zero downtime. The old code reads old columns. The new code reads new columns. Both work during the transition.
Large Table Backfills
Backfilling millions of rows in one UPDATE locks the table:
-- DANGEROUS: Locks the entire table
UPDATE orders SET status = 'pending' WHERE status IS NULL;
-- SAFE: Batch update via a procedure that commits between batches
CREATE PROCEDURE backfill_order_status()
LANGUAGE plpgsql
AS $$
DECLARE
    batch_size INT := 10000;
    affected INT;
BEGIN
    LOOP
        UPDATE orders
        SET status = 'pending'
        WHERE id IN (
            SELECT id FROM orders
            WHERE status IS NULL
            LIMIT batch_size
            FOR UPDATE SKIP LOCKED
        );
        GET DIAGNOSTICS affected = ROW_COUNT;
        EXIT WHEN affected = 0;
        COMMIT;  -- releases row locks after each batch (PostgreSQL 11+)
        PERFORM pg_sleep(0.1);  -- brief pause between batches
        RAISE NOTICE 'Updated % rows', affected;
    END LOOP;
END;
$$;
CALL backfill_order_status();
Batch updates process rows in small chunks, and committing after each batch releases row locks promptly. A plain DO block would run the entire loop in one transaction and hold every lock until the end, defeating the purpose of batching. FOR UPDATE SKIP LOCKED skips rows locked by concurrent transactions instead of waiting on them.
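The same loop can be driven from an application-side script, which gives each batch its own transaction without needing a stored procedure. A self-contained sketch using sqlite3 to demonstrate the batching logic — against PostgreSQL the same loop would use a PostgreSQL driver and the SKIP LOCKED clause:

```python
import sqlite3

def backfill_in_batches(conn, batch_size=1000):
    """Backfill NULL statuses in small batches, committing after each one.

    Each commit ends that batch's transaction, so locks are released
    promptly and other traffic is never blocked for long.
    """
    total = 0
    while True:
        cur = conn.execute(
            """UPDATE orders SET status = 'pending'
               WHERE id IN (SELECT id FROM orders
                            WHERE status IS NULL LIMIT ?)""",
            (batch_size,),
        )
        conn.commit()  # end the transaction for this batch
        if cur.rowcount == 0:
            break  # nothing left to backfill
        total += cur.rowcount
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(2500)])
print(backfill_in_batches(conn, batch_size=1000))  # → 2500
```

Script-driven backfills are also easier to pause, resume, and monitor than a long-running server-side loop.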
Rollback Strategy
Every migration needs a rollback plan:
-- Forward migration
-- migrations/20240115_add_orders_status.sql
ALTER TABLE orders ADD COLUMN status TEXT DEFAULT 'pending';
CREATE INDEX CONCURRENTLY idx_orders_status ON orders(status);
-- Rollback
-- migrations/20240115_add_orders_status_rollback.sql
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_status;
ALTER TABLE orders DROP COLUMN IF EXISTS status;
Test rollback scripts in staging before deploying to production. Note that CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY cannot run inside a transaction block, so if your migration tool wraps each migration in a transaction, disable that wrapping for these migrations.
Migration Checklist
Before running any production migration:
□ Tested in staging with production-sized data
□ Estimated lock duration (< 1 second for safe operations)
□ Rollback script written and tested
□ Backfill batched for large tables
□ Index creation uses CONCURRENTLY
□ NOT NULL constraints added after backfill
□ Team notified of migration window
□ Monitoring dashboard open during execution
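Parts of this checklist can be automated as a lint pass over migration files before they reach production. A rough sketch — the regex patterns are heuristics of my own, not from any existing tool, and a real linter would parse SQL properly:

```python
import re

# Each entry pairs a risky-statement pattern with a human-readable warning
DANGEROUS = [
    # An index built without CONCURRENTLY blocks writes for the whole build
    (re.compile(r"CREATE\s+(UNIQUE\s+)?INDEX\s+(?!CONCURRENTLY)", re.I),
     "index created without CONCURRENTLY"),
    # NOT NULL with no DEFAULT fails outright on a non-empty table
    (re.compile(r"ADD\s+COLUMN\s+\w+\s+\w+\s+NOT\s+NULL(?!.*DEFAULT)", re.I),
     "NOT NULL column added without backfill-safe pattern"),
]

def lint_migration(sql):
    """Return a list of warnings for risky statements in a migration."""
    return [msg for pattern, msg in DANGEROUS if pattern.search(sql)]

print(lint_migration("CREATE INDEX idx_orders_status ON orders(status);"))
print(lint_migration("CREATE INDEX CONCURRENTLY idx_orders_status ON orders(status);"))  # → []
```

Running such a check in CI catches the most common lock hazards before a human ever reviews the migration.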
Common Mistakes
Mistake 1: Running migrations during peak traffic. Schedule migrations during low-traffic periods. Even safe operations benefit from less contention.
Mistake 2: No rollback plan. Every migration should have a tested rollback script. When things go wrong at 2 AM, you need a one-command recovery.
Mistake 3: Coupling deploy and migrate. Deploy first, then migrate. If the migration fails, you can roll back without redeploying. If the deploy fails, the migration has not been applied.
Takeaways
Safe production migrations require zero-downtime patterns: add columns as nullable, use CONCURRENTLY for indexes, batch large backfills, and follow the expand-contract pattern for breaking changes. Always have a rollback plan. Always test with production-sized data. The goal is to make schema changes routine and boring.