Database Migration Plan Skill

Produce a complete, safe database migration plan for a schema change. A migration plan is not just the SQL — it is a coordinated sequence of steps that ensures the application stays available, data stays consistent, and every step can be rolled back independently.

The expand/contract pattern is the default approach: expand the schema to support both old and new states, migrate the application, then contract to remove the old state. Never combine schema changes and data backfills in a single migration that runs during deployment.

Required Inputs

Ask for these if not already provided:

Current schema state — the DDL or description of the table(s) as they are now
Target schema state — the DDL or description of what the table(s) should look like after migration
Migration reason — why this change is being made (new feature, performance fix, normalization, compliance)
Database engine — PostgreSQL, MySQL, SQLite, CockroachDB, etc.
Estimated data volume — approximate number of rows in affected tables
Deployment constraints — is any downtime allowed? What is the expected traffic level during migration? Are there multiple app instances running?
Rollback window — how long after deploy can the team roll back before the migration becomes irreversible?

Output Format

Database Migration Plan: [Migration Name]

Service: [Name] | Team: [Team name] Author: [Name] | Reviewed by: [Name / DBA] Date: [Date] | Target deploy date: [Date] Database engine: [PostgreSQL X.X / MySQL X.X] Ticket: [JIRA-XXX]

1. Migration Overview

What is changing: [1–2 sentences: the specific schema change — e.g. "Adding a non-nullable organisation_id column to the users table and backfilling it from the accounts table."]

Why: [1–2 sentences: the business or technical reason driving the change.]

Migration type: [Additive only / Additive + backfill / Column rename / Column type change / Table restructure / Index change]

Zero-downtime: [Yes — using expand/contract / No — requires maintenance window — state duration]

Estimated migration duration:

Expand phase: [~X minutes]
Data backfill: [~X minutes/hours — based on X rows at Y rows/second]
Contract phase: [~X minutes after app version deployed]

2. Backward Compatibility Analysis

Before writing a single line of SQL, assess whether each change is backward compatible with the currently deployed application code.

| Change | Backward compatible? | Risk | Notes | |---|---|---|---| | [e.g. Add nullable column org_id] | Yes | Low | Old app ignores new column | | [e.g. Backfill org_id] | Yes | Medium | Old app unaffected; new app reads backfilled values | | [e.g. Add NOT NULL constraint to org_id] | No | High | Old app that inserts without org_id will fail | | [e.g. Drop old column account_id] | No | High | Old app that reads account_id will fail | | [e.g. Add index on org_id] | Yes | Low | Additive; no breaking change | | [e.g. Rename column] | No | High | Never rename in one step; use expand/contract |

Summary: [e.g. "This migration requires the expand/contract pattern across 3 deployment phases because steps 3 and 4 are not backward compatible."]

3. Expand/Contract Phases

Phase Overview

Phase 1 — EXPAND
  Deploy migration: add new column (nullable), create new indexes
  Old app: continues to work (ignores new column)
  New app: not yet deployed
  Duration: [~X min] | Rollback: trivial — drop new column

       │
       ▼

Phase 2 — BACKFILL + DUAL-WRITE
  Deploy app update: writes to both old and new columns
  Run backfill: populate new column for existing rows
  Validate: confirm 100% of rows have non-null new column
  Duration: [~X hours depending on data volume]
  Rollback: deploy previous app version; new column is still nullable

       │
       ▼

Phase 3 — ENFORCE + SWITCH
  Deploy migration: add NOT NULL constraint, drop old column/index
  Deploy app update: reads only from new column
  Duration: [~X min] | Rollback: requires forward-fix (constraint must be dropped first)

       │
       ▼

Phase 4 — CONTRACT (optional cleanup)
  Deploy migration: drop deprecated columns, rename if needed
  Final state matches target schema
  Rollback: not recommended — contract changes are destructive

Phase 1 — Expand Schema

Goal: Add the new column and structures without breaking the existing application. Deploy order: Run migration first, then (optionally) deploy app. Application state: Old app running; no app changes required yet.

-- Migration: 001_add_org_id_to_users.sql
BEGIN;

-- Add nullable column (safe — old app ignores it)
ALTER TABLE users
    ADD COLUMN org_id UUID NULL
        REFERENCES organisations(id) ON DELETE RESTRICT;

-- Add index NOW, not in Phase 3 — building index on large table during Phase 3 is risky
CREATE INDEX CONCURRENTLY users_org_id_idx ON users (org_id);

-- Note: CONCURRENTLY does not lock the table; safe on live traffic
-- Note: Cannot run CONCURRENTLY inside a transaction block; run separately if needed

COMMIT;

Validation after Phase 1:

-- Confirm column exists and is nullable
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'users' AND column_name = 'org_id';
-- Expected: is_nullable = 'YES'

-- Confirm index exists
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users' AND indexname = 'users_org_id_idx';

Rollback (Phase 1 only):

BEGIN;
DROP INDEX CONCURRENTLY IF EXISTS users_org_id_idx;
ALTER TABLE users DROP COLUMN IF EXISTS org_id;
COMMIT;

Phase 2 — Backfill Existing Data

Goal: Populate the new column for all existing rows before enforcing NOT NULL. When to run: After Phase 1 is live and stable. Can be run as a background job or a one-time script. Application state: Deploy app version that dual-writes to both old and new columns.

App code change required:

// All INSERT and UPDATE operations must now set BOTH old_column and new_column
// until Phase 3 is complete. This ensures new rows are populated during the backfill window.

Backfill script — batch processing:

-- Run in batches to avoid locking. Adjust batch size based on table size and DB load.
-- Target: no single batch takes more than 5 seconds.

DO $$
DECLARE
    batch_size  INT := 1000;
    affected    INT;
BEGIN
    LOOP
        UPDATE users
        SET    org_id = accounts.organisation_id
        FROM   accounts
        WHERE  users.account_id = accounts.id
          AND  users.org_id IS NULL
        LIMIT  batch_size;

        GET DIAGNOSTICS affected = ROW_COUNT;
        EXIT WHEN affected = 0;

        -- Pause between batches to avoid saturating I/O
        PERFORM pg_sleep(0.1);
    END LOOP;
END $$;

Monitoring during backfill:

-- Check progress — run periodically during backfill
SELECT
    COUNT(*) FILTER (WHERE org_id IS NOT NULL) AS backfilled,
    COUNT(*) FILTER (WHERE org_id IS NULL)     AS remaining,
    COUNT(*)                                   AS total,
    ROUND(
        100.0 * COUNT(*) FILTER (WHERE org_id IS NOT NULL) / COUNT(*), 2
    ) AS pct_complete
FROM users;

Backfill completion validation:

-- Must return 0 before proceeding to Phase 3
SELECT COUNT(*) AS unbackfilled_rows
FROM users
WHERE org_id IS NULL;

-- Confirm no new rows written without org_id (dual-write working)
SELECT COUNT(*) AS recent_missing
FROM users
WHERE org_id IS NULL
  AND created_at > now() - INTERVAL '1 hour';

Rollback (Phase 2 — app only):

Deploy previous app version (single-write to old column)
org_id column remains nullable; no data is lost
Backfilled values remain; harmless

Phase 3 — Enforce Constraints

Goal: Add NOT NULL constraint and remove dependency on the old column. Prerequisites: Phase 2 backfill must be 100% complete (zero rows with org_id IS NULL). Deploy order: Run migration, then deploy app version that reads only from org_id.

PostgreSQL — use NOT VALID + VALIDATE for large tables:

-- Step 1: Add constraint as NOT VALID (no full table scan — instant)
ALTER TABLE users
    ADD CONSTRAINT users_org_id_not_null
    CHECK (org_id IS NOT NULL) NOT VALID;

-- Step 2: VALIDATE CONSTRAINT (takes a SHARE UPDATE EXCLUSIVE lock — allows reads and writes)
-- Run this separately, as it can take minutes on large tables
ALTER TABLE users
    VALIDATE CONSTRAINT users_org_id_not_null;

-- Step 3: Once validated, convert to actual NOT NULL
-- (PostgreSQL trusts the validated check constraint — this is instant)
ALTER TABLE users
    ALTER COLUMN org_id SET NOT NULL;

-- Step 4: Drop the now-redundant check constraint
ALTER TABLE users
    DROP CONSTRAINT users_org_id_not_null;

Validation after Phase 3:

-- Confirm NOT NULL is enforced
SELECT column_name, is_nullable
FROM information_schema.columns
WHERE table_name = 'users' AND column_name = 'org_id';
-- Expected: is_nullable = 'NO'

-- Test that insert without org_id fails (run in a transaction and roll back)
BEGIN;
INSERT INTO users (email) VALUES ('[email protected]');
-- Expected: ERROR: null value in column "org_id" violates not-null constraint
ROLLBACK;

Rollback (Phase 3):

-- Drop the NOT NULL constraint (restores nullable state)
ALTER TABLE users ALTER COLUMN org_id DROP NOT NULL;
-- Then deploy previous app version (dual-write)
-- Note: Once app code reading the new column is live, rolling back the constraint
-- without rolling back the app will cause issues — plan this carefully.

Phase 4 — Contract (Remove Old Column)

Goal: Remove the old column once the app no longer references it. Prerequisites: Phase 3 fully deployed and stable for at least [X days/hours rollback window]. Warning: This phase is destructive — the old column's data is permanently deleted.

BEGIN;

-- Drop the old column
ALTER TABLE users DROP COLUMN account_id;

-- Drop any indexes that referenced the old column
DROP INDEX IF EXISTS users_account_id_idx;

COMMIT;

Pre-drop validation:

-- Confirm no application queries still reference the old column
-- (Check this in code review and via a search of the codebase before running)
-- grep -r "account_id" app/

-- Confirm the column is safe to drop
SELECT COUNT(*) FROM users WHERE account_id IS NOT NULL;
-- Should be 0 (or irrelevant once new column is canonical)

Rollback: Not straightforward — dropped column data cannot be recovered. Only proceed to Phase 4 after the rollback window has passed and the change is confirmed stable.

4. Data Validation Plan

Run these queries before and after the full migration to confirm data integrity.

Pre-migration baseline:

-- Record these values before any migration step
SELECT COUNT(*)   AS total_users FROM users;
SELECT COUNT(*)   AS total_orgs  FROM organisations;
SELECT MIN(created_at), MAX(created_at) FROM users;

-- Check for any anomalies in the source data before backfill
SELECT COUNT(*) AS users_without_account
FROM users WHERE account_id IS NULL;

Post-backfill integrity check:

-- All users have an org that exists
SELECT COUNT(*) AS orphaned_org_refs
FROM users u
WHERE u.org_id IS NOT NULL
  AND NOT EXISTS (
      SELECT 1 FROM organisations o WHERE o.id = u.org_id
  );
-- Expected: 0

-- org_id matches expected value from source column
SELECT COUNT(*) AS mismatched_backfill
FROM users u
JOIN accounts a ON u.account_id = a.id
WHERE u.org_id != a.organisation_id;
-- Expected: 0

-- Row count unchanged (no rows created or deleted by migration)
SELECT COUNT(*) AS total_users_after FROM users;
-- Must match pre-migration baseline

Post-contract final check:

-- Old column is gone
SELECT COUNT(*) FROM information_schema.columns
WHERE table_name = 'users' AND column_name = 'account_id';
-- Expected: 0

-- New column is NOT NULL
SELECT is_nullable FROM information_schema.columns
WHERE table_name = 'users' AND column_name = 'org_id';
-- Expected: NO

5. Performance Impact Assessment

| Step | Lock type | Lock duration | Traffic impact | |---|---|---|---| | Add nullable column | ACCESS EXCLUSIVE | Milliseconds | Negligible | | CREATE INDEX CONCURRENTLY | SHARE UPDATE EXCLUSIVE | Minutes (proportional to table size) | Reads and writes continue | | Batch backfill | Row-level locks only | <5s per batch | Low if batches are small | | ADD CONSTRAINT NOT VALID | ACCESS EXCLUSIVE | Milliseconds | Negligible | | VALIDATE CONSTRAINT | SHARE UPDATE EXCLUSIVE | Minutes | Reads and writes continue | | ALTER COLUMN SET NOT NULL | ACCESS EXCLUSIVE | Milliseconds (if check constraint validated) | Negligible | | DROP COLUMN | ACCESS EXCLUSIVE | Milliseconds | Negligible |

Expected load increase during backfill:

DB CPU: [estimated % increase during batch writes]
DB I/O: [estimated increase]
Monitoring threshold to pause backfill: [e.g. DB CPU > 80% for >2 minutes]

Backfill rate estimate:

Table size: [X million rows]
Batch size: [1000 rows]
Pause between batches: [100ms]
Estimated total duration: [X hours at Y rows/second]

6. Deployment Runbook

Follow this checklist on the day of migration. Mark each step as done before proceeding.

Pre-migration (day before):

[ ] DBA / tech lead has reviewed the migration plan
[ ] Performance impact assessed; monitoring dashboards ready
[ ] Backfill script tested on a staging DB with production-scale data
[ ] Rollback procedure tested on staging
[ ] On-call engineer briefed; Slack channel [#db-migrations] set up for coordination
[ ] Maintenance window scheduled (if required)

Phase 1 — Expand (T+0):

[ ] Take a manual DB snapshot / verify automated backup is recent
[ ] Run 001_expand_add_org_id.sql on production
[ ] Run Phase 1 validation queries — confirm pass
[ ] Deploy app version with dual-write
[ ] Monitor error rate for [10 minutes]

Phase 2 — Backfill (T+[X hours]):

[ ] Confirm Phase 1 has been stable for [X hours]
[ ] Start backfill script in a screen/tmux session
[ ] Monitor progress via backfill progress query every [5 minutes]
[ ] Monitor DB CPU and I/O — pause if thresholds exceeded
[ ] Run completion validation — confirm 0 unbackfilled rows
[ ] Run integrity checks — confirm 0 orphaned refs, 0 mismatches

Phase 3 — Enforce (T+[X days]):

[ ] Confirm backfill 100% complete and stable for [X hours]
[ ] Add NOT VALID constraint
[ ] Run VALIDATE CONSTRAINT (monitor duration and lock waits)
[ ] Alter column to NOT NULL
[ ] Run Phase 3 validation queries
[ ] Deploy app version reading only from new column
[ ] Monitor error rate for [30 minutes]

Phase 4 — Contract (T+[X days after rollback window]):

[ ] Confirm rollback window has passed — no incidents, no rollback needed
[ ] Search codebase for references to old column — confirm zero
[ ] Run DROP COLUMN migration
[ ] Run final integrity checks
[ ] Close migration ticket; update schema documentation

Quality Checks

[ ] Every migration phase has an independent rollback procedure — no phase assumes the next one has run
[ ] Batch backfill script includes a pause between batches to avoid saturating I/O
[ ] NOT NULL constraints use the NOT VALID + VALIDATE pattern on tables with >100k rows
[ ] The app dual-write period is explicitly defined — old column writes are not dropped until Phase 3 is deployed
[ ] Data validation queries include a row count check to confirm no data loss
[ ] Lock types are identified for every DDL statement — no "should be fine" assumptions
[ ] The deployment runbook names who runs each step, not just what to run
[ ] Phase 4 (contract) is explicitly gated on the rollback window passing — not run on the same day as Phase 3

Anti-Patterns

[ ] Do not combine the expand and contract phases into a single deployment — they must be separated by a deployment cycle
[ ] Do not run DDL changes without first testing on a production-sized data clone
[ ] Do not skip the NOT VALID + VALIDATE pattern for constraint additions on large tables — it causes full table locks
[ ] Do not define a rollback as "restore from backup" — each phase must have an explicit, fast rollback procedure
[ ] Do not omit dual-write logic during the transition period — removing the old column before all writers are updated causes data loss

Database Migration Plan Skill

Required Inputs

Ask for these if not already provided:

Current schema state — the DDL or description of the table(s) as they are now
Target schema state — the DDL or description of what the table(s) should look like after migration
Migration reason — why this change is being made (new feature, performance fix, normalization, compliance)
Database engine — PostgreSQL, MySQL, SQLite, CockroachDB, etc.
Estimated data volume — approximate number of rows in affected tables
Deployment constraints — is any downtime allowed? What is the expected traffic level during migration? Are there multiple app instances running?
Rollback window — how long after deploy can the team roll back before the migration becomes irreversible?

Output Format

Database Migration Plan: [Migration Name]

1. Migration Overview

What is changing: [1–2 sentences: the specific schema change — e.g. "Adding a non-nullable organisation_id column to the users table and backfilling it from the accounts table."]

Why: [1–2 sentences: the business or technical reason driving the change.]

Migration type: [Additive only / Additive + backfill / Column rename / Column type change / Table restructure / Index change]

Zero-downtime: [Yes — using expand/contract / No — requires maintenance window — state duration]

Estimated migration duration:

Expand phase: [~X minutes]
Data backfill: [~X minutes/hours — based on X rows at Y rows/second]
Contract phase: [~X minutes after app version deployed]

2. Backward Compatibility Analysis

Before writing a single line of SQL, assess whether each change is backward compatible with the currently deployed application code.

Summary: [e.g. "This migration requires the expand/contract pattern across 3 deployment phases because steps 3 and 4 are not backward compatible."]

3. Expand/Contract Phases

Phase Overview

Phase 1 — EXPAND
  Deploy migration: add new column (nullable), create new indexes
  Old app: continues to work (ignores new column)
  New app: not yet deployed
  Duration: [~X min] | Rollback: trivial — drop new column

       │
       ▼

Phase 2 — BACKFILL + DUAL-WRITE
  Deploy app update: writes to both old and new columns
  Run backfill: populate new column for existing rows
  Validate: confirm 100% of rows have non-null new column
  Duration: [~X hours depending on data volume]
  Rollback: deploy previous app version; new column is still nullable

       │
       ▼

Phase 3 — ENFORCE + SWITCH
  Deploy migration: add NOT NULL constraint, drop old column/index
  Deploy app update: reads only from new column
  Duration: [~X min] | Rollback: requires forward-fix (constraint must be dropped first)

       │
       ▼

Phase 4 — CONTRACT (optional cleanup)
  Deploy migration: drop deprecated columns, rename if needed
  Final state matches target schema
  Rollback: not recommended — contract changes are destructive

Phase 1 — Expand Schema

-- Migration: 001_add_org_id_to_users.sql
BEGIN;

-- Add nullable column (safe — old app ignores it)
ALTER TABLE users
    ADD COLUMN org_id UUID NULL
        REFERENCES organisations(id) ON DELETE RESTRICT;

-- Add index NOW, not in Phase 3 — building index on large table during Phase 3 is risky
CREATE INDEX CONCURRENTLY users_org_id_idx ON users (org_id);

-- Note: CONCURRENTLY does not lock the table; safe on live traffic
-- Note: Cannot run CONCURRENTLY inside a transaction block; run separately if needed

COMMIT;

Validation after Phase 1:

-- Confirm column exists and is nullable
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'users' AND column_name = 'org_id';
-- Expected: is_nullable = 'YES'

-- Confirm index exists
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users' AND indexname = 'users_org_id_idx';

Rollback (Phase 1 only):

BEGIN;
DROP INDEX CONCURRENTLY IF EXISTS users_org_id_idx;
ALTER TABLE users DROP COLUMN IF EXISTS org_id;
COMMIT;

Phase 2 — Backfill Existing Data

App code change required:

// All INSERT and UPDATE operations must now set BOTH old_column and new_column
// until Phase 3 is complete. This ensures new rows are populated during the backfill window.

Backfill script — batch processing:

-- Run in batches to avoid locking. Adjust batch size based on table size and DB load.
-- Target: no single batch takes more than 5 seconds.

DO $$
DECLARE
    batch_size  INT := 1000;
    affected    INT;
BEGIN
    LOOP
        UPDATE users
        SET    org_id = accounts.organisation_id
        FROM   accounts
        WHERE  users.account_id = accounts.id
          AND  users.org_id IS NULL
        LIMIT  batch_size;

        GET DIAGNOSTICS affected = ROW_COUNT;
        EXIT WHEN affected = 0;

        -- Pause between batches to avoid saturating I/O
        PERFORM pg_sleep(0.1);
    END LOOP;
END $$;

Monitoring during backfill:

-- Check progress — run periodically during backfill
SELECT
    COUNT(*) FILTER (WHERE org_id IS NOT NULL) AS backfilled,
    COUNT(*) FILTER (WHERE org_id IS NULL)     AS remaining,
    COUNT(*)                                   AS total,
    ROUND(
        100.0 * COUNT(*) FILTER (WHERE org_id IS NOT NULL) / COUNT(*), 2
    ) AS pct_complete
FROM users;

Backfill completion validation:

-- Must return 0 before proceeding to Phase 3
SELECT COUNT(*) AS unbackfilled_rows
FROM users
WHERE org_id IS NULL;

-- Confirm no new rows written without org_id (dual-write working)
SELECT COUNT(*) AS recent_missing
FROM users
WHERE org_id IS NULL
  AND created_at > now() - INTERVAL '1 hour';

Rollback (Phase 2 — app only):

Deploy previous app version (single-write to old column)
org_id column remains nullable; no data is lost
Backfilled values remain; harmless

Phase 3 — Enforce Constraints

PostgreSQL — use NOT VALID + VALIDATE for large tables:

-- Step 1: Add constraint as NOT VALID (no full table scan — instant)
ALTER TABLE users
    ADD CONSTRAINT users_org_id_not_null
    CHECK (org_id IS NOT NULL) NOT VALID;

-- Step 2: VALIDATE CONSTRAINT (takes a SHARE UPDATE EXCLUSIVE lock — allows reads and writes)
-- Run this separately, as it can take minutes on large tables
ALTER TABLE users
    VALIDATE CONSTRAINT users_org_id_not_null;

-- Step 3: Once validated, convert to actual NOT NULL
-- (PostgreSQL trusts the validated check constraint — this is instant)
ALTER TABLE users
    ALTER COLUMN org_id SET NOT NULL;

-- Step 4: Drop the now-redundant check constraint
ALTER TABLE users
    DROP CONSTRAINT users_org_id_not_null;

Validation after Phase 3:

-- Confirm NOT NULL is enforced
SELECT column_name, is_nullable
FROM information_schema.columns
WHERE table_name = 'users' AND column_name = 'org_id';
-- Expected: is_nullable = 'NO'

-- Test that insert without org_id fails (run in a transaction and roll back)
BEGIN;
INSERT INTO users (email) VALUES ('[email protected]');
-- Expected: ERROR: null value in column "org_id" violates not-null constraint
ROLLBACK;

Rollback (Phase 3):

-- Drop the NOT NULL constraint (restores nullable state)
ALTER TABLE users ALTER COLUMN org_id DROP NOT NULL;
-- Then deploy previous app version (dual-write)
-- Note: Once app code reading the new column is live, rolling back the constraint
-- without rolling back the app will cause issues — plan this carefully.

Phase 4 — Contract (Remove Old Column)

BEGIN;

-- Drop the old column
ALTER TABLE users DROP COLUMN account_id;

-- Drop any indexes that referenced the old column
DROP INDEX IF EXISTS users_account_id_idx;

COMMIT;

Pre-drop validation:

-- Confirm no application queries still reference the old column
-- (Check this in code review and via a search of the codebase before running)
-- grep -r "account_id" app/

-- Confirm the column is safe to drop
SELECT COUNT(*) FROM users WHERE account_id IS NOT NULL;
-- Should be 0 (or irrelevant once new column is canonical)

Rollback: Not straightforward — dropped column data cannot be recovered. Only proceed to Phase 4 after the rollback window has passed and the change is confirmed stable.

4. Data Validation Plan

Run these queries before and after the full migration to confirm data integrity.

Pre-migration baseline:

-- Record these values before any migration step
SELECT COUNT(*)   AS total_users FROM users;
SELECT COUNT(*)   AS total_orgs  FROM organisations;
SELECT MIN(created_at), MAX(created_at) FROM users;

-- Check for any anomalies in the source data before backfill
SELECT COUNT(*) AS users_without_account
FROM users WHERE account_id IS NULL;

Post-backfill integrity check:

-- All users have an org that exists
SELECT COUNT(*) AS orphaned_org_refs
FROM users u
WHERE u.org_id IS NOT NULL
  AND NOT EXISTS (
      SELECT 1 FROM organisations o WHERE o.id = u.org_id
  );
-- Expected: 0

-- org_id matches expected value from source column
SELECT COUNT(*) AS mismatched_backfill
FROM users u
JOIN accounts a ON u.account_id = a.id
WHERE u.org_id != a.organisation_id;
-- Expected: 0

-- Row count unchanged (no rows created or deleted by migration)
SELECT COUNT(*) AS total_users_after FROM users;
-- Must match pre-migration baseline

Post-contract final check:

-- Old column is gone
SELECT COUNT(*) FROM information_schema.columns
WHERE table_name = 'users' AND column_name = 'account_id';
-- Expected: 0

-- New column is NOT NULL
SELECT is_nullable FROM information_schema.columns
WHERE table_name = 'users' AND column_name = 'org_id';
-- Expected: NO

5. Performance Impact Assessment

Expected load increase during backfill:

DB CPU: [estimated % increase during batch writes]
DB I/O: [estimated increase]
Monitoring threshold to pause backfill: [e.g. DB CPU > 80% for >2 minutes]

Backfill rate estimate:

Table size: [X million rows]
Batch size: [1000 rows]
Pause between batches: [100ms]
Estimated total duration: [X hours at Y rows/second]

6. Deployment Runbook

Follow this checklist on the day of migration. Mark each step as done before proceeding.

Pre-migration (day before):

[ ] DBA / tech lead has reviewed the migration plan
[ ] Performance impact assessed; monitoring dashboards ready
[ ] Backfill script tested on a staging DB with production-scale data
[ ] Rollback procedure tested on staging
[ ] On-call engineer briefed; Slack channel [#db-migrations] set up for coordination
[ ] Maintenance window scheduled (if required)

Phase 1 — Expand (T+0):

[ ] Take a manual DB snapshot / verify automated backup is recent
[ ] Run 001_expand_add_org_id.sql on production
[ ] Run Phase 1 validation queries — confirm pass
[ ] Deploy app version with dual-write
[ ] Monitor error rate for [10 minutes]

Phase 2 — Backfill (T+[X hours]):

[ ] Confirm Phase 1 has been stable for [X hours]
[ ] Start backfill script in a screen/tmux session
[ ] Monitor progress via backfill progress query every [5 minutes]
[ ] Monitor DB CPU and I/O — pause if thresholds exceeded
[ ] Run completion validation — confirm 0 unbackfilled rows
[ ] Run integrity checks — confirm 0 orphaned refs, 0 mismatches

Phase 3 — Enforce (T+[X days]):

[ ] Confirm backfill 100% complete and stable for [X hours]
[ ] Add NOT VALID constraint
[ ] Run VALIDATE CONSTRAINT (monitor duration and lock waits)
[ ] Alter column to NOT NULL
[ ] Run Phase 3 validation queries
[ ] Deploy app version reading only from new column
[ ] Monitor error rate for [30 minutes]

Phase 4 — Contract (T+[X days after rollback window]):

[ ] Confirm rollback window has passed — no incidents, no rollback needed
[ ] Search codebase for references to old column — confirm zero
[ ] Run DROP COLUMN migration
[ ] Run final integrity checks
[ ] Close migration ticket; update schema documentation

Quality Checks

[ ] Every migration phase has an independent rollback procedure — no phase assumes the next one has run
[ ] Batch backfill script includes a pause between batches to avoid saturating I/O
[ ] NOT NULL constraints use the NOT VALID + VALIDATE pattern on tables with >100k rows
[ ] The app dual-write period is explicitly defined — old column writes are not dropped until Phase 3 is deployed
[ ] Data validation queries include a row count check to confirm no data loss
[ ] Lock types are identified for every DDL statement — no "should be fine" assumptions
[ ] The deployment runbook names who runs each step, not just what to run
[ ] Phase 4 (contract) is explicitly gated on the rollback window passing — not run on the same day as Phase 3

Anti-Patterns

[ ] Do not combine the expand and contract phases into a single deployment — they must be separated by a deployment cycle
[ ] Do not run DDL changes without first testing on a production-sized data clone
[ ] Do not skip the NOT VALID + VALIDATE pattern for constraint additions on large tables — it causes full table locks
[ ] Do not define a rollback as "restore from backup" — each phase must have an explicit, fast rollback procedure
[ ] Do not omit dual-write logic during the transition period — removing the old column before all writers are updated causes data loss

Adoption

mohitagw15856/database-migration-plan

$ install --global

Security Scan Results

SKILL.md

Database Migration Plan Skill

Required Inputs

Output Format

Database Migration Plan: [Migration Name]

1. Migration Overview

2. Backward Compatibility Analysis

3. Expand/Contract Phases

Phase Overview

Phase 1 — Expand Schema

Phase 2 — Backfill Existing Data

Phase 3 — Enforce Constraints

Phase 4 — Contract (Remove Old Column)

4. Data Validation Plan

5. Performance Impact Assessment

6. Deployment Runbook

Quality Checks

Anti-Patterns

Related Skills

mohitagw15856/win-loss-analysis

mohitagw15856/which-skill

mohitagw15856/vuln-triage

mohitagw15856/voice-of-customer-program

mohitagw15856/database-migration-plan

$ install --global

Security Scan Results

SKILL.md

Database Migration Plan Skill

Required Inputs

Output Format

Database Migration Plan: [Migration Name]

1. Migration Overview

2. Backward Compatibility Analysis

3. Expand/Contract Phases

Phase Overview

Phase 1 — Expand Schema

Phase 2 — Backfill Existing Data

Phase 3 — Enforce Constraints

Phase 4 — Contract (Remove Old Column)

4. Data Validation Plan

5. Performance Impact Assessment

6. Deployment Runbook

Quality Checks

Anti-Patterns

Related Skills

mohitagw15856/win-loss-analysis

mohitagw15856/which-skill

mohitagw15856/vuln-triage

mohitagw15856/voice-of-customer-program