migration-patterns-plugin/skills/shadow-mode/SKILL.md
Shadow mode / dark-launch for validating new systems under production load. Use when testing replacement services, comparing behavior, or planning traffic mirroring for migrations.
npx skillsauth add laurigates/claude-plugins shadow-modeInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Shadow mode mirrors production traffic to a new system without affecting users. The shadow system's responses are discarded — only the production response reaches the user — but both responses are logged and compared to validate correctness.
| Use this skill when... | Use dual-write instead when... | |------------------------|-------------------------------| | Validating read behavior of a replacement service | Both systems need to persist writes | | Testing performance under real production load | You need the new store to be authoritative | | Comparing response correctness before cutover | Migrating data stores that must stay in sync | | Evaluating a new service version safely | The new system needs to receive and store mutations | | Load testing a new deployment with real traffic | You need strong consistency between systems |
Client Request
│
▼
┌─────────────┐
│ Router / │
│ Proxy │
├──────┬──────┤
│ │ │
▼ │ ▼
Prod │ Shadow
System │ System
│ │ │
▼ │ ▼
Prod │ Shadow
Response│ Response
│ │ │
▼ │ (discard)
Client │ │
│ ▼
│ Compare &
│ Log
▼
| Mode | Description | Use case | |------|------------|----------| | Full mirror | 100% of traffic duplicated | Final validation before cutover | | Sampled mirror | Percentage of traffic (e.g., 10%) | Early validation, capacity-constrained shadow | | Selective mirror | Specific request types or endpoints | Targeted validation of changed behavior | | Replay mirror | Recorded traffic replayed offline | Testing without live shadow infrastructure |
| Component | Responsibility | |-----------|---------------| | Traffic splitter | Duplicates requests to shadow system | | Shadow router | Forwards mirrored requests, manages timeouts | | Response comparator | Compares prod vs shadow responses | | Discrepancy logger | Records differences with full context | | Metrics collector | Tracks match rates, latency, error rates | | Kill switch | Disables shadow traffic instantly if issues arise |
| Topology | How it works | Trade-offs | |----------|-------------|------------| | Proxy-based | Load balancer or API gateway mirrors requests | Simple setup, adds proxy hop | | Application-level | Application code sends async copy of request | Fine-grained control, code coupling | | Infrastructure-level | Service mesh (Istio, Linkerd) mirrors traffic | No code changes, requires mesh | | Log replay | Capture request logs, replay against shadow | No live infrastructure needed, not real-time |
Configure the load balancer or API gateway to:
Compare responses field by field with configurable rules:
| Field type | Comparison approach | |-----------|-------------------| | IDs, timestamps | Ignore (expected to differ) | | Computed values | Compare within tolerance (e.g., floating point) | | Collections | Compare as sets (ignore ordering unless significant) | | Status codes | Exact match required | | Error responses | Categorize and compare error types | | Headers | Compare relevant headers only (Content-Type, Cache-Control) |
Shadow mode works best with read-only requests. For stateful (write) requests:
| Approach | Description | |----------|------------| | Skip writes | Only mirror read requests to shadow | | Isolated state | Shadow has its own database seeded from production | | Dry-run writes | Shadow validates the write but does not persist | | Record-only | Log what shadow would have written, compare intent |
| Phase | Traffic % | Duration | Goal | |-------|-----------|----------|------| | 1. Smoke test | 1% | Hours | Verify shadow receives and processes requests | | 2. Canary | 5-10% | Days | Identify obvious discrepancies | | 3. Validation | 25-50% | Days-weeks | Build confidence in match rate | | 4. Full mirror | 100% | Days-weeks | Final validation before cutover |
| Metric | Target | Description | |--------|--------|-------------| | Response match rate | > 99.9% | Percentage of identical responses | | Shadow latency (P50) | Within 2x of prod | Shadow performance baseline | | Shadow latency (P99) | Monitored | Tail latency under real load | | Shadow error rate | < prod error rate | Shadow should not produce more errors | | Shadow availability | Monitored | Shadow uptime (not a blocker) | | Discrepancy categories | Trending to zero | Known differences resolved over time |
| Pitfall | Mitigation | |---------|-----------| | Shadow affects production performance | Async mirroring, independent timeouts, kill switch | | Shadow writes to shared resources | Isolate shadow databases, queues, and external services | | Non-deterministic responses cause false mismatches | Configure comparison rules to ignore timestamps, IDs, nonces | | Shadow receives stale data | Seed shadow database from recent production snapshot | | Traffic amplification overwhelms shadow | Use sampled mirroring, auto-scaling, or circuit breakers | | Request ordering differs between prod and shadow | Compare request-by-request, not sequence-dependent | | Authentication tokens expire for shadow | Mint shadow-specific tokens or bypass auth in shadow |
Shadow mode and dual write are complementary migration techniques:
| Migration phase | Technique | Purpose | |----------------|-----------|---------| | Early validation | Shadow mode (reads) | Verify the new system returns correct responses | | Data sync | Dual write | Keep both stores authoritative during transition | | Pre-cutover | Both simultaneously | Shadow validates reads, dual write maintains data | | Cutover | Dual write reversal | New system becomes primary, old becomes secondary | | Post-cutover | Shadow mode (reversed) | Mirror to old system to verify nothing broke |
Both patterns are tactics within the broader Strangler Fig migration strategy:
Shadow mode must have an immediate disable mechanism:
| Context | Approach | |---------|----------| | Architecture review | Verify shadow isolation (no shared writes), kill switch exists | | Code review | Check async mirroring does not block production path | | Implementation | Start with proxy-based mirroring at 1%, increase gradually | | Testing | Verify kill switch works, confirm production is unaffected when shadow fails |
| Term | Definition | |------|-----------| | Shadow system | The new system receiving mirrored traffic | | Production system | The live system serving real users | | Traffic splitter | Component that duplicates requests | | Match rate | Percentage of shadow responses matching production | | Kill switch | Mechanism to instantly disable shadow traffic | | Dark launching | Synonym for shadow mode — feature is live but invisible to users | | Canary traffic | Small percentage of mirrored requests for initial validation | | Strangler fig | Broader migration strategy of incrementally replacing components |
testing
Verify accumulated bug claims at upstream HEAD and dedup against trackers before filing issues. Use when filing upstream reports from backlogs, audit docs, or git-history findings.
documentation
Gate outward-bound text (upstream issues, docs, PR bodies) through isolated haiku fresh-reader critique before publishing. Use when an artifact must survive a reader with zero project context.
tools
Suggest improvements to SKILL.md content, descriptions, or tool config from eval results. Use when raising pass rates, fixing triggering, or iterating on a skill after evaluation.
tools
deadbranch CLI for stale-branch cleanup — dry-run preview, TUI or non-interactive delete, protects main/develop/WIP. Use when asked to clean up branches, prune branches, or remove stale branches.