skills/data-migration-specialist/SKILL.md
Large-scale data migrations with validation, rollback, and zero-downtime strategies. Activate on: data migration, database migration, zero-downtime migration, dual-write, backfill, cutover, data validation, schema migration. NOT for: schema evolution in streams (use schema-evolution-manager), API versioning (use api-versioning-backward-compatibility).
npx skillsauth add curiositech/windags-skills data-migration-specialist
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Plan and execute large-scale data migrations with zero-downtime strategies, comprehensive validation, and reliable rollback plans.
Activate on: "data migration", "database migration", "zero-downtime migration", "dual-write", "backfill", "cutover", "data validation", "schema migration", "platform migration", "cloud migration"
NOT for: Streaming schema evolution → schema-evolution-manager | API backward compatibility → api-versioning-backward-compatibility | Warehouse optimization → data-warehouse-optimizer
| Domain | Technologies |
|--------|-------------|
| Strategies | Dual-write, CDC-based, big bang, strangler fig |
| CDC Tools | Debezium, AWS DMS, GCP Datastream, pglogical |
| Validation | Great Expectations, custom checksums, row-count reconciliation |
| Schema | Flyway, Liquibase, Prisma Migrate, Alembic |
| Orchestration | Airflow, Temporal, custom migration scripts |
Phase 1: Dual-Write (days/weeks)
──────────────────────────────────
App writes → Old DB (primary)
           → New DB (shadow, async)
Reads from:  Old DB only

Phase 2: Shadow Read Validation
──────────────────────────────────
App writes → Old DB + New DB
Reads from:  Old DB (primary)
             New DB (shadow, compare results)

Phase 3: Cutover
──────────────────────────────────
App writes → New DB (primary)
           → Old DB (shadow, for rollback)
Reads from:  New DB

Phase 4: Cleanup
──────────────────────────────────
App writes → New DB only
Remove old DB writes
Decommission old DB (after rollback window expires)
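The four phases above amount to a routing rule for reads and writes. The following is a minimal sketch of that rule, assuming a key-value style `put`/`get` interface. `Phase`, `DictStore`, and `MigrationRouter` are hypothetical names introduced for illustration, and the shadow write is synchronous here for simplicity, whereas the diagram describes it as async.

```python
from enum import Enum


class Phase(Enum):
    DUAL_WRITE = 1
    SHADOW_READ = 2
    CUTOVER = 3
    CLEANUP = 4


class DictStore:
    """Hypothetical in-memory stand-in for a database client."""

    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


class MigrationRouter:
    """Routes reads and writes according to the current migration phase."""

    def __init__(self, old, new, phase=Phase.DUAL_WRITE):
        self.old, self.new, self.phase = old, new, phase
        self.needs_reconcile = []  # keys to re-sync or investigate later

    def write(self, key, value):
        if self.phase is Phase.CLEANUP:
            self.new.put(key, value)  # old DB no longer receives writes
            return
        if self.phase is Phase.CUTOVER:
            primary, shadow = self.new, self.old
        else:
            primary, shadow = self.old, self.new
        primary.put(key, value)  # a primary failure propagates to the caller
        try:
            shadow.put(key, value)
        except Exception:
            # A shadow failure must not fail the request; record it instead.
            self.needs_reconcile.append(key)

    def read(self, key):
        if self.phase in (Phase.CUTOVER, Phase.CLEANUP):
            return self.new.get(key)
        value = self.old.get(key)
        # Phase 2 only: compare against the new DB but still serve the old value.
        if self.phase is Phase.SHADOW_READ and self.new.get(key) != value:
            self.needs_reconcile.append(key)
        return value


old, new = DictStore(), DictStore()
router = MigrationRouter(old, new)
router.write("user:1", {"name": "Ada"})  # lands in both stores
router.phase = Phase.SHADOW_READ
router.read("user:1")                    # served from old, compared with new
```

Keeping the phase as a single switch means a rollback from Phase 3 is a change of `router.phase` rather than a code deploy; in practice that switch would likely live in a feature flag or config service.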
from more_itertools import chunked  # third-party helper: pip install more-itertools


class MigrationValidator:
    """Run after each migration phase to verify data integrity."""

    def __init__(self, source_db, target_db, tables):
        # source_db / target_db are adapters expected to provide count(),
        # sample_ids(), hash_rows(), query(), and check_constraints().
        self.source_db = source_db
        self.target_db = target_db
        self.tables = tables

    def validate_row_counts(self):
        """Source and target row counts must match within tolerance."""
        tolerance = 0.001  # 0.1% tolerance for in-flight writes
        for table in self.tables:
            source = self.source_db.count(table)
            target = self.target_db.count(table)
            # max(source, 1) avoids dividing by zero on an empty source table
            assert abs(source - target) / max(source, 1) < tolerance, \
                f"{table}: source={source}, target={target}"

    def validate_checksums(self):
        """Hash comparison on sampled rows."""
        for table in self.tables:
            sample_ids = self.source_db.sample_ids(table, n=10000)
            for batch in chunked(sample_ids, 1000):
                source_hash = self.source_db.hash_rows(table, batch)
                target_hash = self.target_db.hash_rows(table, batch)
                assert source_hash == target_hash, \
                    f"{table}: checksum mismatch in batch"

    def validate_business_rules(self):
        """Domain-specific invariants."""
        # Example: total revenue must match
        source_rev = self.source_db.query("SELECT SUM(amount) FROM orders")
        target_rev = self.target_db.query("SELECT SUM(amount) FROM orders")
        assert source_rev == target_rev, "Revenue mismatch!"

    def validate_constraints(self):
        """All FKs, unique constraints, and NOT NULLs hold on target."""
        violations = self.target_db.check_constraints()
        assert len(violations) == 0, f"Constraint violations: {violations}"
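The validator relies on a `hash_rows` method without showing one. As a rough sketch of what such a checksum could look like, the function below hashes a batch of rows so that neither row order nor column order changes the result. The standalone `hash_rows(rows)` signature and the dict-shaped rows are assumptions for illustration, not the adapter API used above.

```python
import hashlib
import json


def hash_rows(rows):
    """Return one digest for a batch of rows, independent of row order.

    `rows` is an iterable of dicts. Each row is serialized with sorted keys
    so that column order does not affect the digest; per-row digests are
    sorted before being combined so that row order does not either.
    """
    digests = sorted(
        hashlib.sha256(
            json.dumps(row, sort_keys=True, default=str).encode("utf-8")
        ).hexdigest()
        for row in rows
    )
    return hashlib.sha256("".join(digests).encode("ascii")).hexdigest()


source_batch = [{"id": 1, "amount": "9.99"}, {"id": 2, "amount": "5.00"}]
target_batch = [{"amount": "5.00", "id": 2}, {"id": 1, "amount": "9.99"}]
assert hash_rows(source_batch) == hash_rows(target_batch)
```

Values are compared by their serialized form, so types that differ between engines (for example a float on one side and a decimal string on the other) would need to be normalized before hashing, or identical data will report a mismatch.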
Migration issue detected?
│
├─ Data loss or corruption? → IMMEDIATE ROLLBACK
│   Switch reads/writes back to old DB
│   Replay writes from new DB → old DB (if needed)
│
├─ Performance regression? → EVALUATE
│   ├─ < 2x slower → optimize, do not roll back
│   └─ > 2x slower → roll back, investigate
│
└─ Minor data discrepancy? → FIX FORWARD
    Run reconciliation job to sync
    Do NOT roll back for fixable issues

Rollback window: keep old DB live for 7-14 days post-cutover
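The decision tree above can be encoded as a small function so that runbooks and alerting share one rule. This is a sketch under stated assumptions: `Action`, `decide`, and `rollback_available` are hypothetical names, a slowdown of exactly 2x is treated as a rollback (the tree leaves that boundary open), and the 7-day default is simply the lower end of the 7-14 day window.

```python
from datetime import datetime, timedelta
from enum import Enum


class Action(Enum):
    ROLLBACK = "rollback"
    OPTIMIZE = "optimize"
    FIX_FORWARD = "fix_forward"
    NONE = "none"


def decide(data_loss=False, slowdown=1.0, minor_discrepancy=False):
    """Map an observed issue to an action, checked in the tree's order.

    `slowdown` is new-DB latency divided by old-DB latency (1.0 = no change).
    """
    if data_loss:
        return Action.ROLLBACK
    if slowdown > 1.0:
        # Assumption: exactly 2x counts as a rollback.
        return Action.ROLLBACK if slowdown >= 2.0 else Action.OPTIMIZE
    if minor_discrepancy:
        return Action.FIX_FORWARD
    return Action.NONE


def rollback_available(cutover_at, now, window=timedelta(days=7)):
    """True while the old DB is still expected to be live."""
    return now - cutover_at <= window


cutover = datetime(2024, 1, 1)
print(decide(slowdown=1.4))                                # Action.OPTIMIZE
print(rollback_available(cutover, datetime(2024, 1, 5)))   # True
```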