skills/nasa-2017-systems-engineering-handbook/SKILL.md
Apply NASA's battle-tested framework for building complex, mission-critical systems where failure has catastrophic consequences
npx skillsauth add curiositech/windags-skills nasa-2017-systems-engineering-handbookInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
license: Apache-2.0
nasa-systems-engineeringLoad this skill when facing:
Not appropriate for: Simple projects with single owners, purely exploratory R&D with no integration requirements, or contexts where "move fast and break things" is actually the right strategy.
Systems engineering is nested problem-solving loops, not linear stages. Each subsystem undergoes the full cycle:
Stakeholder Expectations → Requirements → Design → Verification
↓
Baselined derived requirements become high-level requirements for next level down
↓
Repeat until atomic components, then integrate upward with validation at each seam
Why this matters: Prevents "correct parts, broken system" failures. Each decomposition level must prove it satisfies its parent's requirements before integration. Failures at Level N-1 force re-verification at Level N.
Key insight: The engine is hierarchical and iterative — you're never "done" with requirements or design, you're managing their evolution across levels.
Tailoring is not "skipping steps" — it's documented flexibility based on 8 contextual factors:
| Factor | Examples | |--------|----------| | Mission type | Class A (flagship) → Class D (acceptable mission risk) | | Acceptable risk | Loss of life vs. loss of mission vs. degraded performance | | National significance | International treaty obligations, national security | | Complexity | Component count, novel technology, interface density | | Mission lifetime | 90 days vs. 15 years in space | | Cost constraint | $10M CubeSat vs. $10B telescope | | Launch constraints | No refurbishment possible (deep space) vs. ISS servicing | | Human rating | Crew safety requirements |
Process: Use the Compliance Matrix to document what's eliminated, what's scaled, and why. This makes deviations traceable and auditable.
Anti-dogma: "Maximum rigor" is not "best practice" — over-engineering a CubeSat wastes resources; under-engineering a crewed mission kills people.
CM is not administrative tracking — it's the control system that prevents divergence between:
The failure mode CM prevents: Subsystems drift from their baseline specifications through undocumented changes, then fail at integration because they no longer satisfy the interface contracts they were verified against.
Implementation: Configuration Items (CIs), baselines, change control boards, version-controlled interface control documents (ICDs).
The paradox: Every subsystem can be independently correct yet the system fails when integrated.
Root cause: Interfaces have three failure modes:
Solution framework:
Decisions under uncertainty require structured comparison methods, not intuition:
Key discipline: Document assumptions, weightings, and decision rationale before committing resources. Enables post-decision learning when assumptions prove wrong.
IF the mission is:
THEN consider:
BUT never eliminate:
Document in Compliance Matrix: What you eliminated, why these factors justified it, what alternative controls remain.
IF you can't answer these 5 questions about a requirement, it's not ready:
THEN rewrite using the requirement discipline:
Example transformation:
IF any of these conditions exist:
THEN immediately:
Why urgency matters: Interface defects compound exponentially — catching at integration costs 10x more than at design, catching at operations costs 100x more.
Use Analytic Hierarchy Process (AHP) when:
3 competing architectures
Process:
Decision rule: If rank order is stable across sensitivity ranges, choose top-ranked. If unstable, either:
Establish baseline when:
Between baselines: All changes flow through Configuration Control Board (CCB).
CCB approval required for changes affecting:
Fast-track allowed for: Documentation corrections, non-interface internal changes, pre-approved modifications
| File | When to Load | Contents |
|------|--------------|----------|
| references/recursive-decomposition-engine.md | When structuring problem breakdown, setting up integration strategy, or explaining why requirements keep changing | The 4-phase SE Engine architecture, how decomposition levels interact, validation at integration seams, common breakdown patterns |
| references/tailoring-as-governance.md | When deciding process rigor for a new project, justifying deviation from standard practices, or creating a Compliance Matrix | 8 tailoring factors, 3 tailoring modes, mission classification system, documentation requirements for justified deviations |
| references/configuration-management-as-cognitive-infrastructure.md | When setting up CM for a multi-team project, investigating version drift issues, or deciding what requires CCB approval | Configuration Items, baseline types, change control process, relationship to interface management, failure modes CM prevents |
| references/interface-management-prevents-integration-failures.md | When defining subsystem boundaries, writing ICDs, setting up ICWGs, or debugging integration failures | Interface failure taxonomy, ICD requirements, version control procedures, integration testing strategy, ICWG operating model |
| references/risk-informed-decision-making.md | When choosing between options under uncertainty, assessing technical risks, or defending a difficult decision | PRA methodology, AHP process, trade study structure, assumption documentation, sensitivity analysis techniques |
Loading strategy: Start with recursive-decomposition-engine.md for structural problems, tailoring-as-governance.md for scoping decisions, or interface-management-prevents-integration-failures.md for integration issues.
Failure mode: Both teams make incompatible assumptions, discover at integration, require re-design/re-verification.
Cost multiplier: 10-100x vs. defining interfaces at decomposition.
Correct approach: ICDs signed off by both sides before detailed design begins.
Failure mode: Obvious to Team A ≠ obvious to Team B. Unwritten assumptions propagate as hidden dependencies.
Example: "Data will be transmitted" — obvious to software team means TCP/IP packets, obvious to hardware team means RS-422 serial.
Correct approach: If it affects interfaces or verification, it's a requirement. Write it, trace it, verify it.
False dichotomy: Recursive decomposition ≠ waterfall. You iterate, but each level stabilizes before spawning the next.
The trap: Changing Level N requirements after Level N-1 design begins forces rework that cascades downward.
Correct approach: Agility within levels (sprint on design options), stability across levels (freeze requirements before decomposition).
Reality inversion: CM prevents the slowdown of debugging why Component A mysteriously stopped working (someone changed the interface without documenting it).
Cost: Tracking changes in real-time < investigating failures from uncontrolled changes after integration.
Correct approach: Lightweight CM (version control + change log) for small teams, formal CCB for multi-team programs.
Governance failure: Undocumented tailoring = untraceability. When something fails, you can't reconstruct what controls were bypassed.
Audit risk: "Why didn't you do X?" "It seemed unnecessary" is not a defensible answer for a mishap investigation.
Correct approach: Compliance Matrix documents what you're eliminating, why the 8 factors justify it, what alternative controls remain.
Conceptual error:
Failure scenario: System perfectly verifies against requirements but fails validation because requirements misunderstood stakeholder needs.
Correct approach: Validate requirements with stakeholders before design begins. Verify design against requirements before integration.
"We do systems engineering — we have requirements documents."
"Our requirements are verified against parent requirements and traced to verification methods in our V&V matrix. When a derived requirement changes, we assess re-verification impact on the parent."
Distinguisher: Understands requirements exist in a hierarchy with bidirectional traceability.
"We tailor our process to move faster."
"We tailored by eliminating peer reviews for Class D subsystems with <$5M cost and <1 year lifetime, documented in our Compliance Matrix, but kept interface reviews because we're integrating with partner hardware."
Distinguisher: Can articulate which factors justified which tailoring decisions and what controls remain.
"We use version control for our code."
"We have four configuration baselines: functional (requirements), allocated (design), product (as-built), and operational (as-flown). Our CCB approves changes affecting external interfaces or verified requirements."
Distinguisher: Understands CM spans requirements, design, hardware, and operations — not just source code.
"We write interface specifications."
"Our ICDs specify message formats, timing tolerances, error handling, and environmental assumptions. They're version-controlled, both sides sign off before detailed design, and our ICWG meets biweekly to process change requests."
Distinguisher: Treats interfaces as contracts requiring bilateral agreement and change management.
"We do risk management."
"We ran PRA with fault trees for the propulsion system, AHP for architecture selection weighted toward operability over cost, and tracked via risk matrices with retire-by dates tied to verification events."
Distinguisher: Can name specific methods used for specific risk types and how they tied to decisions.
"We verify our design meets requirements."
"We validated requirements with stakeholders before PDR, verified subsystems against allocated requirements before integration, then validated integrated system performance against Concept of Operations in operational environment."
Distinguisher: Distinguishes verification (requirements compliance) from validation (stakeholder needs satisfaction) and sequences them correctly.
Starting a new complex project?
tailoring-as-governance.md → classify mission, fill out Compliance Matrixrecursive-decomposition-engine.md → map decomposition levels, identify integration pointsinterface-management-prevents-integration-failures.md → define subsystem boundaries, start ICDsDebugging an integration failure?
interface-management-prevents-integration-failures.md → work through failure taxonomyFacing a difficult decision under uncertainty?
risk-informed-decision-making.md → select appropriate method (PRA/AHP/trade study)Requirements keep changing and breaking downstream work?
recursive-decomposition-engine.md → review decomposition timing and baseline disciplineThis skill synthesizes 60+ years of NASA's operational experience building systems where failure means catastrophe. Apply with appropriate tailoring based on your mission context.
tools
Building resilient distributed systems with circuit breakers, retries with full-jitter exponential backoff, retry budgets (per-request 3-attempt + per-client 10% ratio per Google SRE), deadline propagation, and the cascading-failure math (4 layers × 3 retries = 64x amplification). Grounded in Resilience4j, Microsoft Cloud Patterns, AWS Architecture Blog (Marc Brooker), and Google SRE Book.
testing
Designing HTTP cache headers that work correctly across browsers, CDNs, and shared proxies — `Cache-Control` directives per RFC 9111, `stale-while-revalidate` and `stale-if-error` per RFC 5861, the Vary header for varying responses, and surrogate keys for tag-based purging. Grounded in IETF RFCs and Cloudflare/Fastly docs.
development
Use when designing or fixing a Content Security Policy on a real site, choosing between nonce-based and hash-based CSP, adding strict-dynamic, debugging "Refused to execute inline script" errors, deploying CSP in report-only mode first, configuring report-to / report-uri, or auditing an existing policy for unsafe-inline / unsafe-eval / wildcards. Triggers: "CSP blocks legitimate inline script", strict-dynamic, nonce-{RANDOM}, sha256-{HASH}, object-src none, base-uri none, frame-ancestors, Trusted Types, X-Content-Security-Policy obsolete, report-only vs enforced. NOT for general HTTP security headers (HSTS, COOP/COEP), Trusted Types deep dive, CORS configuration, or building a WAF.
tools
Choosing and operating an HTTP API versioning strategy that doesn't break clients — Stripe's date-based pinned versions, the Deprecation/Sunset header pair (RFC 9745 + RFC 8594), URI vs header vs media-type approaches, and the version-transformer pattern. Grounded in Stripe's published architecture and IETF RFCs.