plugins/axiom-sdlc-engineering/skills/quality-assurance/SKILL.md
Use when deciding test strategy, struggling with code reviews, shipping without tests, or conflating verification with validation
npx skillsauth add tachyon-beep/skillpacks quality-assuranceInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
This skill implements the Verification (VER) and Validation (VAL) process areas from the CMMI-based SDLC prescription.
Core principle: Verification ≠ Validation. Tests prove you built it correctly (VER). Users prove you built the right thing (VAL). Both required at Level 3.
Critical distinction:
Use this skill when:
Do NOT use for:
| Situation | Primary Reference Sheet | Key Decision | |-----------|------------------------|--------------| | "Skip tests to ship faster?" | Testing Practices | Level 3: Tests required before merge. Exception protocol for emergencies only. | | "Reviews catching nothing" | Peer Reviews | Social dynamics issue, not technical. Psychological safety + reviewer accountability. | | "Tests pass, customers unhappy" | Validation with Stakeholders | VER without VAL. Both required at Level 3. UAT process needed. | | "Same bugs recurring" | Defect Management | Requires RCA (5 Whys, fishbone). Pattern = systemic issue needing process fix. | | "Manual tests take 2 days" | Testing Practices | Ice cream cone anti-pattern. Migrate to test pyramid with economics. |
What: Ensuring product meets specifications and requirements
How: Testing, code review, static analysis, inspections
Who: Development team (internal)
When: Throughout development, before release
Level 3 Requirements:
Example: Unit tests pass, integration tests pass, code reviewed
What: Ensuring product meets user needs and solves actual problems
How: User acceptance testing (UAT), stakeholder demos, beta testing
Who: End users, stakeholders (external to dev team)
When: End of iteration, before production release
Level 3 Requirements:
Example: Users confirm feature solves their problem, stakeholders approve for release
| Scenario | VER | VAL | Outcome | |----------|-----|-----|---------| | Tests pass, users happy | ✅ | ✅ | SUCCESS - Built correctly AND right thing | | Tests pass, users unhappy | ✅ | ❌ | FAILURE - Wrong feature, wrong UX, wrong problem solved | | Tests fail, users would have been happy | ❌ | ✅ | FAILURE - Right idea, poor execution, bugs prevent use | | Tests fail, users would be unhappy | ❌ | ❌ | DISASTER - Wrong thing built poorly |
Level 3 mandate: Both VER and VAL required before production release.
VER Requirements:
VAL Requirements:
Work Products:
Quality Criteria:
Audit Trail:
VER Requirements:
VAL Requirements:
Work Products:
Quality Criteria:
Audit Trail:
Statistical Practices:
Quantitative Work Products:
Quality Criteria:
Audit Trail:
CRITICAL: "Tests later" = tests never (documented historical pattern)
Level 3 projects - Absolute requirements:
Rationale: Risk too high, rework too expensive, reputation damage too severe
When: Production outage, immediate fix needed, no time for full test suite
Level 3 Requirements:
Frequency Limit: >5 TEST-HOTFIXes per month = systemic problem requiring process audit
Violation: Skipping retrospective tests = QA failure, escalate to engineering manager
If absolutely must ship without full coverage:
Retrospective required: Within 7 days, answer "Why no tests?" and address root cause
Enforcement: Violations escalate to engineering manager, process audit if patterns emerge
Detection: Tests written after code (or not at all), "We'll add tests later"
Red Flags:
Why it fails: "Later" never comes, test debt accumulates, bugs reach production, rework costs 10-100x more
Counter: TDD requirement (Level 3 can waive, but must justify). Tests = part of "done", not optional.
Detection: Code reviews <5 minutes, "LGTM" without specific feedback, defects escaping to production
Red Flags:
Why it fails: Social pressure not to block > quality, reviewers fear being "difficult", no accountability
Counter: Review metrics (finding rate should be 20-40%), reviewer accountability, psychological safety
Detection: Mostly manual E2E tests, few unit tests, regression testing takes days
Red Flags:
50% of testing time is manual
Why it fails: Doesn't scale, slow feedback, expensive to maintain, brittle tests
Counter: Test pyramid economics, migration to unit-heavy strategy, ROI calculation
Detection: Same bugs recurring in different places, no pattern analysis, firefighting constantly
Red Flags:
Why it fails: Treats symptoms not causes, waste effort on recurring issues, no learning
Counter: RCA requirement (Level 3 mandatory for recurring defects), defect pattern analysis
Detection: Stakeholders "approve" without actually using system, checkbox exercise
Red Flags:
Why it fails: False confidence, issues found in production, customer dissatisfaction
Counter:
The following reference sheets provide detailed guidance for specific QA domains. Load them on-demand when needed.
When to use: Deciding test strategy, coverage requirements, test pyramid, TDD
→ See testing-practices.md
Covers:
When to use: Code reviews ineffective, rubber-stamp approvals, unclear reviewer responsibilities
→ See peer-reviews.md
Covers:
When to use: Planning UAT, stakeholder acceptance, beta testing, demo preparation
→ See validation-with-stakeholders.md
Covers:
When to use: Bugs recurring, defect triage, root cause analysis, prevention
→ See defect-management.md
Covers:
When to use: Measuring quality effectiveness, tracking improvement, justifying QA investment
→ See qa-metrics.md
Covers:
When to use: Understanding appropriate QA rigor for project tier
→ See level-scaling.md
Covers:
| Mistake | Why It Fails | Better Approach | |---------|--------------|-----------------| | "Tests later" | Later never comes, debt accumulates | Tests = part of "done". Level 3: required before merge. | | "Tests pass = done" | Conflates VER with VAL, skips user acceptance | Both required at Level 3. Tests AND stakeholder approval. | | "LGTM rubber stamps" | Social pressure > quality, reviewers fear blocking | Reviewer accountability, metrics (20-40% finding rate), psychological safety. | | "Automate everything" | Automation has costs (setup, maintenance), not always ROI-positive | Test pyramid economics. Unit tests cheap, E2E expensive. Choose wisely. | | "Manual testing is bad" | Some testing should be manual (exploratory, one-time, usability) | Strategic automation. Critical paths automated, exploratory manual. | | "Skip RCA, fix it quick" | Same bugs recur, waste effort whack-a-mole | Level 3: RCA required for recurring defects. Fix root cause, not symptom. | | "Stakeholder approved" (without using system) | Validation theater, issues found in production | Hands-on UAT required. Stakeholder must actually use feature, not just demo. |
| When You're Doing | Also Use | For |
|-------------------|----------|-----|
| Writing tests for Python code | axiom-python-engineering | pytest-specific patterns and idioms |
| E2E/performance/chaos testing | ordis-quality-engineering | Specialized test strategies |
| Implementing code review process | design-and-build | Code review checklist, CI integration |
| Designing acceptance criteria | requirements-lifecycle | INVEST criteria, user story format |
| Setting up CI for testing | design-and-build | CI/CD pipeline configuration |
Without this skill: Teams experience:
With this skill: Teams achieve:
Remember: Verification proves you built it correctly. Validation proves you built the right thing. You need BOTH.
development
Use when **managing the delivery of work** rather than building it — running a project or a program, not writing its code. Use when a team is busy but outcomes are not landing, when "when will it be done" has no defensible answer, when status is green every week until it is suddenly red, when dependencies surprise you, when a RAID log is a graveyard, or when several projects must be coordinated toward one outcome (a program). Lean/agile-leaning, honest about where program scale needs predictive structure. Pairs with `/axiom-planning` (turning one workstream into an implementation plan) and `/axiom-sdlc-engineering` (process maturity, requirements traceability, formal governance). Do not load for writing code, picking an architecture, or designing a single feature.
tools
--- name: using-product-management description: Use when a Claude is taking **standing ownership** of a software product and driving it end-to-end across many sessions — discovery, strategy, specs, delivery orchestration, and value validation — deciding *what to build, why, for whom,* and *whether it worked*, with continuity, decision provenance, and an authority boundary that escalates anything irreversible or outward-facing to the human owner. Owns the product disciplines: opportunity assessme
tools
Use when designing, implementing, or auditing an MCP (Model Context Protocol) server — tool API design, idempotency under agent retry, structured error envelopes agents can recover from, schema versioning across model drift, transport reliability (stdio / HTTP), output-shape and pagination discipline, and choosing between tools / resources / prompts / sampling. Also use when an MCP server's tools confuse agents, return unstructured errors, deadlock under concurrent calls, double-execute under retry, or lose state across reconnects. Do not use for general REST/GraphQL API design (use `/web-backend`), for client-side prompt engineering or tool-loop design (use `/llm-specialist`), for general in-process plugin architecture (use `/system-architect`), or for cryptographic-provenance audit trails (use `/audit-pipelines`).
development
Use when running **SQLite or DuckDB inside an application process** as the durable store — not as a development convenience but as the production database. Use when scaling an SQLite layer that worked at low concurrency and is now hitting SQLITE_BUSY, WAL bloat, lock contention, schema-migration ceremony, or correctness gaps under multi-process writers. Use when introducing DuckDB as an OLAP complement to an OLTP SQLite store, or when picking between the two for a new component. Pairs with `/web-backend` (the API surface above the DB) and `/audit-pipelines` (when the DB is also the audit trail). Do not load for server databases (Postgres, MySQL), key-value stores, or ORM choice in isolation.