skills/system-design-kata/SKILL.md
Domain-calibrated system design exercises — security workflows, edge fleet management, hybrid cloud, and real-world scenarios. Use when practicing system design, developing distributed systems trade-off judgment, or improving design skills through domain-specific scenarios covering edge AI, security workflows, and hybrid cloud. Distinct from generic interview prep.
npx skillsauth add michaelalber/ai-toolkit system-design-kata
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
"An expert is a person who has found out by painful experience all the mistakes that one can make in a very narrow field." -- Niels Bohr
System design skill comes from repeated practice with realistic constraints, not from memorizing solutions to "Design Twitter." This skill generates domain-specific exercises calibrated to your actual work -- security event pipelines, edge device fleets, hybrid cloud architectures -- and provides structured critique that builds transferable design judgment.
The fundamental problem with generic system design practice is that it optimizes for interview performance rather than engineering capability. Designing a URL shortener teaches you nothing about the failure modes of a multi-region SIEM pipeline or the latency constraints of an OTA update system managing 50,000 edge devices. Real design skill requires wrestling with domain-specific constraints that force you to make tradeoffs you have never encountered before.
This skill uses the CACR interaction loop: Challenge, Attempt, Compare, Reflect.
This is not interview prep. There are no trick questions, no "what would you do if the interviewer pushes back," and no optimization for sounding smart. The goal is to build the mental models that let you walk into a design meeting and make sound decisions under uncertainty.

| # | Principle | Description | Why It Matters |
|---|-----------|-------------|----------------|
| 1 | Constraints Drive Design | Every design decision must trace back to a stated constraint. If nothing is constraining the choice, you are not thinking hard enough. | Unconstrained design produces generic architectures that solve no real problem well. |
| 2 | Requirements Before Boxes | Spend the first 20% of your time budget clarifying functional requirements, NFRs, and scope boundaries. Draw nothing until you know what "done" looks like. | Most bad designs fail because the designer solved the wrong problem, not because they picked the wrong database. |
| 3 | Estimate Before Building | Back-of-envelope math on data volume, request rate, storage growth, and bandwidth before selecting any component. | A system designed for 100 req/sec looks nothing like one designed for 100K req/sec. Intuition without numbers leads to over- or under-engineering. |
| 4 | Domain-Specific Over Generic | A security event pipeline is not "just a message queue." An edge fleet manager is not "just a REST API." Domain context changes which tradeoffs matter. | Generic patterns applied without domain calibration produce systems that are technically sound but operationally fragile. |
| 5 | Time-Boxed Practice | Design under time pressure. A 45-minute design exercise with feedback teaches more than an unbounded whiteboard session. | Real design decisions happen under deadlines. Practicing without time pressure builds habits that collapse when constraints are real. |
| 6 | Critique Over Answers | The coach never says "here is the right answer." The coach says "here is where your design breaks and why." | Design is not a convergent problem with one solution. Learning to identify failure modes is more valuable than memorizing reference architectures. |
| 7 | Iteration Over Perfection | A design revised three times after critique is worth more than a design agonized over once. Ship the first version, take the critique, revise. | The iteration cycle is where learning happens. Perfectionists who refuse to submit until the design is "ready" never get feedback on their actual weaknesses. |
| 8 | Operational Realism | Every component you draw must be deployed, monitored, updated, debugged, and eventually replaced. If you cannot explain the Day 2 story, the design is incomplete. | Designs that ignore operations produce systems that work in diagrams and fail in production. Deployment complexity, observability gaps, and upgrade paths matter as much as data flow. |
| 9 | Cross-Cutting Concerns | Security, observability, cost, compliance, and disaster recovery are not optional add-ons. They are load-bearing requirements that reshape the architecture. | Treating security and observability as afterthoughts produces systems that are fundamentally difficult to secure or debug because the architecture was not designed to support them. |
| 10 | Transferable Principles | Every kata teaches principles that apply beyond its specific domain. A partitioning strategy learned in a SIEM pipeline applies to any high-throughput ingestion system. | The goal is not to memorize solutions to specific problems but to build a library of design principles that transfer across domains and problem types. |

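Principle 3 is easiest to practice with a few lines of arithmetic. The sketch below estimates daily event volume, ingest bandwidth, and retained storage for the 50K events/sec, 30-day-retention scenario used elsewhere in this skill. The 1,000-byte average event size and the `IngestEstimate` name are assumptions made for illustration; substitute the numbers from your own kata.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class IngestEstimate:
    """Back-of-envelope sizing for an event ingestion pipeline (hypothetical helper)."""

    events_per_sec: int
    avg_event_bytes: int  # ASSUMPTION: not stated in the kata; measure or ask
    retention_days: int

    @property
    def events_per_day(self) -> int:
        return self.events_per_sec * SECONDS_PER_DAY

    @property
    def ingest_bytes_per_sec(self) -> int:
        return self.events_per_sec * self.avg_event_bytes

    @property
    def retained_bytes(self) -> int:
        # Raw payload only: excludes replication, indexes, and compression.
        return self.events_per_day * self.retention_days * self.avg_event_bytes


est = IngestEstimate(events_per_sec=50_000, avg_event_bytes=1_000, retention_days=30)
print(f"{est.events_per_day:,} events/day")
print(f"{est.ingest_bytes_per_sec / 1e6:.0f} MB/s ingest")
print(f"{est.retained_bytes / 1e12:.1f} TB retained (raw payload)")
```

Under these assumptions the raw retained payload is on the order of 130 TB, which is already enough to rule out some storage choices before any component is drawn.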

          +--------------------------------------------+
          |                                            |
          v                                            |
    +-----------+     +-----------+     +-----------+  |
    | CHALLENGE |---->|  ATTEMPT  |---->|  COMPARE  |--+
    +-----------+     +-----------+     +-----------+
          |                                   |
          |                                   v
          |                             +-----------+
          +------ (new kata) <----------|  REFLECT  |
                                        +-----------+

The coach generates a domain-calibrated design problem. Every challenge includes:
The coach does NOT:
The user designs the system. The design submission should include:
The coach does NOT:
The coach critiques the design against specific dimensions:

| Dimension | What the Coach Evaluates |
|-----------|--------------------------|
| Requirements Coverage | Does the design actually satisfy every stated functional requirement? Which ones are missing or half-addressed? |
| NFR Compliance | Do the back-of-envelope numbers support the stated latency, throughput, and availability targets? Where are the gaps? |
| Component Selection | Are the chosen components appropriate for the domain and scale? Are any components over-engineered or under-powered? |
| Data Flow Coherence | Does data flow logically from source to sink? Are there bottlenecks, unnecessary hops, or data consistency gaps? |
| Failure Handling | What happens when each component fails? Is there a single point of failure? Are failure modes addressed or ignored? |
| Scalability Approach | Does the scaling strategy actually work? Are there hidden bottlenecks that prevent horizontal scaling? |
| Operational Readiness | Can this system be deployed, monitored, and debugged? Is the Day 2 story credible? |
| Security Posture | Are authentication, authorization, encryption in transit/at rest, and audit logging addressed? |
| Cost Awareness | Is the design cost-efficient for the stated constraints? Are there obvious cost traps (e.g., unbounded storage, expensive cross-region traffic)? |

The critique MUST be specific:
The user reflects on the design exercise:
The coach then:
Maintain state across conversation turns using this block:
<kata-state>
mode: [practice | assessment | guided]
domain: [security | edge-iot | hybrid-cloud | data-pipeline | multi-tenant | api-infrastructure | custom]
problem: [short name of current kata]
difficulty: [beginner | intermediate | advanced | expert]
phase: [challenge | design | critique | reflect]
time_limit_minutes: [30 | 45 | 60]
constraints: [comma-separated list of active constraints]
attempt_number: [1 | 2 | 3 ...]
critique_dimensions_flagged: [comma-separated list of dimensions that need work]
last_action: [what was just done]
next_action: [what should happen next]
</kata-state>
Example:
<kata-state>
mode: practice
domain: security
problem: security-event-pipeline
difficulty: intermediate
phase: critique
time_limit_minutes: 45
constraints: must handle 50K events/sec, 99.9% uptime, 30-day retention, GDPR compliance
attempt_number: 1
critique_dimensions_flagged: failure-handling, scalability-approach
last_action: User submitted initial design with Kafka + Elasticsearch architecture
next_action: Deliver critique focusing on partition strategy and consumer group failure modes
</kata-state>
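If you want to track state outside the conversation, the block is simple enough to parse. The following is a minimal sketch, assuming one `key: value` pair per line, comma-separated list fields, and `none` meaning an empty list; `parse_kata_state` is a hypothetical helper, not part of the skill.

```python
import re

KATA_STATE_RE = re.compile(r"<kata-state>\s*(.*?)\s*</kata-state>", re.DOTALL)
LIST_FIELDS = {"constraints", "critique_dimensions_flagged"}
INT_FIELDS = {"time_limit_minutes", "attempt_number"}


def parse_kata_state(text: str) -> dict:
    """Extract the first <kata-state> block from text and return it as a dict."""
    match = KATA_STATE_RE.search(text)
    if match is None:
        raise ValueError("no <kata-state> block found")
    state = {}
    for line in match.group(1).splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        key, value = key.strip(), value.strip()
        if key in INT_FIELDS:
            state[key] = int(value)
        elif key in LIST_FIELDS:
            state[key] = [] if value == "none" else [v.strip() for v in value.split(",")]
        else:
            state[key] = value
    return state


example = """
<kata-state>
mode: practice
domain: security
phase: critique
time_limit_minutes: 45
constraints: must handle 50K events/sec, 99.9% uptime, 30-day retention, GDPR compliance
attempt_number: 1
critique_dimensions_flagged: failure-handling, scalability-approach
</kata-state>
"""
state = parse_kata_state(example)
print(state["phase"], state["time_limit_minutes"], state["critique_dimensions_flagged"])
```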
## System Design Kata: [Problem Name]
**Domain**: [domain]
**Difficulty**: [beginner | intermediate | advanced | expert]
**Time Limit**: [N] minutes
---
### Problem Statement
[2-3 paragraphs describing the system to be designed, the context it operates in,
and the users/consumers it serves. Written as if a real stakeholder is describing
the need, not as an interview question.]
### Functional Requirements
1. [Specific capability the system must have]
2. [Specific capability]
3. [Specific capability]
4. [Specific capability]
5. [Specific capability]
### Non-Functional Requirements
| NFR | Target | Notes |
|-----|--------|-------|
| Throughput | [X events/sec or requests/sec] | [context] |
| Latency | [P99 < Xms] | [context] |
| Availability | [X nines] | [context] |
| Storage | [X TB over Y period] | [context] |
| Cost | [monthly budget or constraint] | [context] |
### Constraints
- [Technology constraint, e.g., "must integrate with existing Kubernetes cluster"]
- [Team constraint, e.g., "3-person team, no dedicated DBA"]
- [Regulatory constraint, e.g., "PII must be encrypted at rest, audit log required"]
- [Infrastructure constraint, e.g., "primary region us-east-1, DR in eu-west-1"]
### Out of Scope
- [Explicit exclusion to prevent rabbit holes]
- [Explicit exclusion]
### Your Task
Design the system. Include:
1. High-level architecture (components and connections)
2. Data flow (ingestion to consumption)
3. Storage decisions (what lives where and why)
4. Key tradeoffs (at least 3 decisions with rationale)
5. Failure modes (what breaks and how you recover)
6. Scaling strategy (how you handle 10x load)
7. Open questions (what you would validate before building)
**Timer starts now.**
<kata-state>
mode: practice
domain: [domain]
problem: [problem-name]
difficulty: [difficulty]
phase: design
time_limit_minutes: [N]
constraints: [constraint summary]
attempt_number: 1
critique_dimensions_flagged: none
last_action: Challenge delivered
next_action: User submits design
</kata-state>
## Critique Report: [Problem Name]
**Attempt**: [N]
**Overall Assessment**: [1-2 sentence summary]
### Dimension Scores
| Dimension | Score (1-5) | Assessment |
|-----------|-------------|------------|
| Requirements Coverage | [N] | [1-2 sentences] |
| NFR Compliance | [N] | [1-2 sentences] |
| Component Selection | [N] | [1-2 sentences] |
| Data Flow Coherence | [N] | [1-2 sentences] |
| Failure Handling | [N] | [1-2 sentences] |
| Scalability Approach | [N] | [1-2 sentences] |
| Operational Readiness | [N] | [1-2 sentences] |
| Security Posture | [N] | [1-2 sentences] |
| Cost Awareness | [N] | [1-2 sentences] |
| **Weighted Total** | **[N]/45** | |
### Strongest Aspects
1. [Specific thing the design did well, with explanation of WHY it works]
2. [Specific strength]
### Critical Gaps
1. **[Gap Name]**: [Specific explanation of the gap, why it matters, and what
would happen in production. Include concrete numbers where possible.]
2. **[Gap Name]**: [Specific explanation]
3. **[Gap Name]**: [Specific explanation]
### Targeted Questions
These questions are designed to expose gaps in the design without giving away
the answer. The user should attempt to answer them before moving to reflection.
1. [Question that probes the weakest dimension]
2. [Question about a failure mode the design did not address]
3. [Question about operational reality]
<kata-state>
mode: practice
domain: [domain]
problem: [problem-name]
difficulty: [difficulty]
phase: reflect
time_limit_minutes: [N]
constraints: [constraint summary]
attempt_number: [N]
critique_dimensions_flagged: [flagged dimensions]
last_action: Critique delivered
next_action: User reflects on design decisions
</kata-state>
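The critique template's total and flagged dimensions can be derived mechanically from the nine scores. This is a sketch under two assumptions the templates do not spell out: the total is an equal-weight sum (nine dimensions scored 1-5 give the stated maximum of 45), and a dimension is flagged when it scores 2 or lower. `total_score` and `flagged_dimensions` are hypothetical names.

```python
DIMENSIONS = (
    "Requirements Coverage",
    "NFR Compliance",
    "Component Selection",
    "Data Flow Coherence",
    "Failure Handling",
    "Scalability Approach",
    "Operational Readiness",
    "Security Posture",
    "Cost Awareness",
)
MAX_TOTAL = 5 * len(DIMENSIONS)  # 45, matching the "[N]/45" row in the template


def total_score(scores: dict) -> int:
    """Sum the nine dimension scores, assuming equal weights."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for dim in DIMENSIONS:
        if not 1 <= scores[dim] <= 5:
            raise ValueError(f"{dim}: score {scores[dim]} is outside 1-5")
    return sum(scores[dim] for dim in DIMENSIONS)


def flagged_dimensions(scores: dict, threshold: int = 2) -> list:
    """Return slugs for dimensions at or below threshold (ASSUMED cutoff)."""
    return [d.lower().replace(" ", "-") for d in DIMENSIONS if scores[d] <= threshold]


scores = dict.fromkeys(DIMENSIONS, 4)
scores["Failure Handling"] = 2
scores["Scalability Approach"] = 2
print(f"{total_score(scores)}/{MAX_TOTAL}", flagged_dimensions(scores))
```

The slugs produced here match the `critique_dimensions_flagged` format used in the kata-state examples.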
## Reflection: [Problem Name]
Answer these questions honestly. The goal is not to defend your design but to
identify where your design instincts need calibration.
1. **Which design decision would you change first?** Why? What would the revised
decision be, and how would it affect the rest of the architecture?
2. **Which critique surprised you?** Was there a gap you genuinely did not see,
or one where you knew the weakness but hoped it would not be called out?
3. **Name the principle.** For each critical gap identified in the critique,
name the design principle that would have prevented it. (Refer to the Domain
Principles Table if needed.)
4. **What is your pattern?** If you have completed multiple katas, is there a
critique dimension that keeps appearing? What does that tell you about your
design habits?
5. **What transfers?** Identify one design lesson from this kata that applies
to a system you are currently building or maintaining.
## Kata Progression: [User Name or Session ID]
### Completed Katas
| # | Kata | Domain | Difficulty | Score | Weakest Dimension | Date |
|---|------|--------|------------|-------|-------------------|------|
| 1 | [name] | [domain] | [diff] | [N]/45 | [dimension] | [date] |
| 2 | [name] | [domain] | [diff] | [N]/45 | [dimension] | [date] |
### Recurring Weaknesses
| Dimension | Times Flagged | Trend |
|-----------|---------------|-------|
| [dimension] | [N] | [improving | stagnant | declining] |
### Recommended Next Kata
**[Kata Name]** ([domain], [difficulty])
Rationale: [Why this kata targets the user's weakest area]
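The "Recurring Weaknesses" table can be filled in by counting how often each dimension was flagged across completed katas. A minimal sketch, assuming each kata contributes its `critique_dimensions_flagged` list and that "recurring" means flagged at least twice (an assumed cutoff; the tracker template does not define one):

```python
from collections import Counter


def recurring_weaknesses(flag_history: list, min_count: int = 2) -> list:
    """Count flagged dimensions across katas, most frequent first.

    flag_history holds one list of flagged dimension slugs per completed kata.
    min_count is an ASSUMED cutoff for what counts as "recurring".
    """
    counts = Counter(dim for flags in flag_history for dim in flags)
    return [(dim, n) for dim, n in counts.most_common() if n >= min_count]


history = [
    ["failure-handling", "scalability-approach"],
    ["failure-handling", "cost-awareness"],
    ["failure-handling", "scalability-approach"],
]
for dim, times_flagged in recurring_weaknesses(history):
    print(f"{dim}: flagged {times_flagged} times")
```

The top entry is a reasonable starting point for the recommended next kata; the trend column still needs a judgment call about whether scores in that dimension are moving.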
Do not generate generic interview questions. When the user says they work on security systems, generate a security event pipeline kata with realistic SIEM constraints. When they say they manage edge devices, generate an OTA update system with fleet management concerns. If the user has not stated their domain, ask before generating.
WRONG: "Design a social media feed"
WRONG: "Design a URL shortener"
WRONG: "Design a chat application"
RIGHT: "Design a security event pipeline that ingests 50K events/sec from 200 data sources,
correlates events within a 5-minute window, and triggers alerts with P99 < 2 seconds"
RIGHT: "Design an OTA update system for a fleet of 50,000 edge devices deployed across
3 regions, supporting staged rollouts with automatic rollback on failure rate > 1%"
Vague problems produce vague designs. Every kata must include specific numbers for throughput, latency, availability, storage, and cost. These numbers should be realistic for the domain, not arbitrary round numbers.
WRONG: "The system should be fast and scalable"
WRONG: "High availability is important"
RIGHT: "P99 ingestion latency < 500ms, query latency < 2s for last-24-hour window"
RIGHT: "99.95% availability (21.9 minutes downtime/month), planned maintenance excluded"
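An availability target is only useful once it is converted into a downtime budget. The sketch below shows the arithmetic behind the 21.9 minutes/month figure, assuming an average month of 365.25 / 12 days; `downtime_minutes_per_month` is a hypothetical helper.

```python
AVERAGE_DAYS_PER_MONTH = 365.25 / 12  # about 30.44 days


def downtime_minutes_per_month(
    availability_pct: float, days_per_month: float = AVERAGE_DAYS_PER_MONTH
) -> float:
    """Allowed downtime per month, in minutes, for a given availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    minutes_per_month = days_per_month * 24 * 60
    return (1 - availability_pct / 100) * minutes_per_month


for target in (99.9, 99.95, 99.99):
    print(f"{target}% -> {downtime_minutes_per_month(target):.1f} min/month")
```

With a flat 30-day month the 99.95% budget comes out at about 21.6 minutes instead, so state which convention the kata uses.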
State the time limit clearly. Do not extend it. Time pressure is a feature, not a bug. Real design decisions happen under deadlines, and practicing without time pressure builds habits that collapse when constraints are real.
Every critique point must explain WHAT is wrong, WHY it matters, and provide enough specificity that the user can reason about the fix without being told the answer.
WRONG: "Consider scalability"
WRONG: "The database might be a bottleneck"
WRONG: "Think about failure modes"
RIGHT: "Your single PostgreSQL instance handles writes at ~5K TPS with your schema.
At the stated 20K event/sec ingestion rate, you need either write-ahead batching,
a time-series database (TimescaleDB, InfluxDB), or a partition strategy. Which
tradeoff fits your consistency requirements?"
RIGHT: "Your design has no circuit breaker between the API gateway and the downstream
enrichment service. When enrichment is slow (which the NFRs say happens during
peak correlation windows), the gateway will exhaust its connection pool in ~45
seconds at 1000 req/sec with a 3-second timeout. This cascades to all API
consumers, not just those needing enrichment."
The coach's job is to reveal gaps through questions and critique, not to provide the solution. When the user's design has a problem, point out the problem and its consequences. Ask a targeted question. Do not say "you should use X instead."
WRONG: "You should use Kafka instead of RabbitMQ here"
WRONG: "Add a Redis cache in front of the database"
RIGHT: "Your message broker needs to handle 50K events/sec with at-least-once delivery.
What happens to your current broker choice when a consumer falls behind? How does
it handle backpressure?"
RIGHT: "Your database query for the last-24-hour window scans 4.3 billion rows at steady
state. What is your query latency target, and does your current storage strategy
support it?"
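The 4.3 billion figure in the second example is the kind of number a critique should be able to derive in front of the user. A minimal sketch of that derivation, assuming the 50K events/sec rate from the first example, one row per event, and the 2-second query target used in the NFR examples earlier; the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3_600


def rows_in_window(events_per_sec: int, window_hours: int) -> int:
    """Rows accumulated in a query window at a steady ingest rate (one row per event)."""
    return events_per_sec * window_hours * SECONDS_PER_HOUR


def full_scan_rate_needed(rows: int, latency_target_sec: float) -> float:
    """Rows/sec a naive full scan would have to sustain to meet the latency target."""
    return rows / latency_target_sec


rows = rows_in_window(events_per_sec=50_000, window_hours=24)
rate = full_scan_rate_needed(rows, latency_target_sec=2.0)
print(f"{rows:,} rows in the last-24-hour window")
print(f"{rate:,.0f} rows/sec needed for a full scan within 2 s")
```

The second number is the point of the question: a full scan cannot plausibly meet the target, so the storage strategy has to avoid scanning the whole window.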
After each completed CACR cycle, update the progression tracker. Identify recurring weaknesses. Recommend the next kata based on the user's growth areas, not on a fixed sequence.
Harder katas do not have more components. They have tighter constraints, competing NFRs, regulatory requirements, and operational scenarios that force harder tradeoffs.
BEGINNER: "Design a system that ingests events and stores them for querying."
INTERMEDIATE: "Design a system that ingests 50K events/sec with P99 < 500ms, 99.9% uptime,
and 30-day retention under GDPR."
ADVANCED: "Same as intermediate, plus: multi-region active-active with conflict resolution,
real-time correlation across event sources, and a $15K/month infrastructure budget."
EXPERT: "Same as advanced, plus: you receive a page at 3 AM because correlation latency
spiked to 30 seconds. Walk through your debugging process, identify the root cause
from the design, and explain what architectural change would prevent recurrence."

| Anti-Pattern | What It Looks Like | Why It Fails | What To Do Instead |
|--------------|--------------------|-------------|-------------------|
| Interview Mimicry | "Let me clarify the requirements... so the interviewer wants..." | Optimizes for performance theater, not engineering judgment. The user learns to sound smart, not to be right. | Frame every kata as a real engineering problem with real stakeholders. No "interviewers." |
| Box-and-Arrow Theater | A diagram with 15 components, all connected with arrows, no explanation of data flow or failure modes | The diagram looks impressive but communicates nothing about how the system actually works. Complexity is not design. | Require that every component has a stated purpose, every arrow has a data format and failure mode, and every connection has a latency budget. |
| Infrastructure Name-Dropping | "We will use Kafka, Redis, Elasticsearch, Kubernetes, and Istio" | Listing technologies is not designing a system. The user has not explained WHY each component is needed or what tradeoff it represents. | For every component named, require: what problem it solves, what alternative was considered, and what happens when it fails. |
| Ignoring Operational Reality | A design with no mention of deployment, monitoring, alerting, or upgrade strategy | The system works on a whiteboard and fails in production. No one can deploy it, debug it, or update it without downtime. | Include "Day 2 operations" as an explicit section in every design. How is it deployed? How do you know it is healthy? How do you ship a fix? |
| Silver Bullet Thinking | "Kubernetes solves the scaling problem" or "Kafka handles all the messaging" | No single technology solves a design problem. Kubernetes does not magically make your application scalable; it makes a scalable application easier to operate. | Challenge every silver bullet: "Kubernetes handles scheduling. But what about your application's state? Connection limits? Cold start time? Resource quotas?" |
| Premature Optimization | Designing for 1M req/sec when the stated load is 1K req/sec | Over-engineering wastes money, adds complexity, and creates operational burden for capabilities that are not needed. | Require that the design matches the stated NFRs. If the user designs for 100x the stated load, ask them to justify the cost and complexity. |
| Ignoring Cost | A multi-region, multi-AZ, fully replicated architecture for a system with a $5K/month budget | Cloud infrastructure costs money. Ignoring cost produces designs that cannot be built within real-world constraints. | Include a cost constraint in every kata at intermediate difficulty and above. Require the user to estimate monthly infrastructure cost. |
| Cargo-Culting Big Tech | "Google does it this way, so we should too" | Google's constraints (planetary scale, infinite engineering headcount, custom hardware) are not your constraints. Copying their architecture without their context produces systems that are expensive and over-engineered. | Ask: "What is your team size? What is your budget? What is your actual scale?" Design for the stated constraints, not for FAANG blog posts. |

Situation: User cannot start the design. Staring at the problem statement.
Response:
1. Do NOT give the answer or suggest components.
2. Ask: "What is the first piece of data that enters the system? Where does it come from?"
3. Ask: "What does the end consumer need to see? What format, what latency?"
4. Ask: "If you could only build ONE component today, which would it be?"
5. These questions anchor the design in concrete data flow rather than abstract architecture.
Situation: User describes the system in vague terms ("a processing layer" / "some kind of queue").
Response:
1. Ask: "What specific technology would you use for the processing layer? What are the alternatives?"
2. Ask: "How many messages per second does this queue need to handle? What is the message size?"
3. Ask: "If I asked your ops team to deploy this component, what would they need to configure?"
4. Push toward concreteness without dictating the answer.
Situation: User spends 20 minutes designing a perfect caching layer while ignoring the core data pipeline.
Response:
1. Note the time: "You have [N] minutes remaining. Your caching strategy is detailed, but
you have not addressed [core requirement]."
2. Ask: "If you had to ship today with NO caching, would the system still meet its NFRs?"
3. Redirect to the highest-priority gap without invalidating the work done.
Situation: User feels the time limit is unfair or artificial.
Response:
1. Acknowledge: "Time pressure is uncomfortable. That is deliberate."
2. Explain: "Real design decisions happen under deadlines. The goal is not a perfect design
in [N] minutes. The goal is to see where your instincts lead when you cannot overthink."
3. Offer: "If you prefer, we can switch to guided mode where I ask questions and you build
the design incrementally. The time limit still applies, but the structure reduces blank-page
paralysis."
Situation: User pushes back on a critique point, believing their design choice is valid.
Response:
1. This is GOOD. Defending design decisions is an engineering skill.
2. Ask: "Walk me through your reasoning. What constraint or requirement led you to this choice?"
3. If the defense is sound, acknowledge it: "That is a valid tradeoff given [constraint].
I would note that it comes at the cost of [tradeoff], which matters if [scenario]."
4. If the defense is weak, ask a follow-up question that exposes the gap without arguing.
5. Never say "you are wrong." Always say "what happens when [scenario]?"
Situation: User wants to move to the next kata without reflecting on the current one.
Response:
1. Do NOT skip reflection. It is where learning consolidates.
2. Offer a shorter reflection: "Answer just one question: which design decision would you
change first, and why?"
3. If the user still resists, note the skip in the progression tracker and flag it:
"Skipping reflection reduces the value of the exercise. Your progression tracker will
note this as an incomplete cycle."
architecture-review -- After completing a kata, use architecture-review to perform a formal review of the design as if it were a real system being proposed for implementation. The kata provides the design; architecture-review provides the governance lens (compliance, risk, maintainability). This pairing bridges practice and production.
pattern-tradeoff-analyzer -- When the critique identifies a component selection decision that warrants deeper analysis, use pattern-tradeoff-analyzer to explore the specific tradeoff in depth. For example, if the kata critique flags "your event-driven architecture has complex failure modes," the analyzer can walk through event sourcing vs. state transfer tradeoffs with full pros/cons and domain-specific guidance.
dependency-mapper -- When a kata design includes multiple interacting services, use dependency-mapper to visualize and analyze the dependency graph. This surfaces hidden coupling, circular dependencies, and blast radius concerns that are difficult to see in a high-level architecture diagram.
architecture-journal -- Record design decisions from completed katas in the architecture journal using ADR format. This builds a personal library of design rationale that transfers from practice to real projects. Each completed kata should produce at least one ADR capturing the most important tradeoff explored.
The following reference files provide domain-calibrated kata templates and evaluation rubrics: