config/skills/core/qa-engineer/SKILL.md
Wear the QA hat — think about quality the way a 15-year senior QA thinks. Use when building features, reviewing work, validating output, or when anything feels "done" but hasn't been proven to work. Prevents the gap between "it runs" and "it works."
`npx skillsauth add gavinmcfall/agentic-config qa-engineer`
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Quality is not a phase. It is a way of thinking about everything you build.
Invariant Most defects are not caused by bad code. They are caused by bad requirements, missing context, or building the wrong thing. Fix the input and you fix most of the output.
Example A developer builds exactly what the ticket says. The ticket was wrong. QA finds the bug — but the bug was born in the requirements meeting, not in the code. //BOUNDARY: This does not mean code quality doesn't matter. It means code quality alone is insufficient.
Depth
Invariant Checking confirms what you expect. Testing discovers what you don't. Scripted test cases that tick boxes are the lowest-value form of quality work.
Example Checking: "Does the login page load? Does the submit button work? Does an error show for wrong password?" Testing: "What happens when I paste a 10,000-character string into the password field while the network drops mid-request and I'm using a screen reader?" //BOUNDARY: Checks have value — as regression safety nets. They are not testing. Never confuse the two.
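The difference between a check and a probe can be sketched in a few lines of TypeScript. Everything here is an assumption made for illustration: `validatePassword` is a hypothetical validator with a deliberate weakness, and the hostile inputs are only a starting list, not a complete one.

```typescript
// Hypothetical validator, invented for this illustration.
// Deliberate weakness: it only measures length and never inspects content.
function validatePassword(input: string): boolean {
  return input.length >= 8 && input.length <= 128;
}

// A check: confirms the one thing we already expect to work.
function checkHappyPath(): boolean {
  return validatePassword("correct-horse-battery");
}

interface Surprise {
  label: string;
  accepted: boolean;
}

// A probe: feeds inputs nobody scripted and reports every result
// that differs from what a reasonable user would expect.
function probeEdges(validate: (s: string) => boolean): Surprise[] {
  // [label, input, should it be accepted?]
  const hostile: Array<[string, string, boolean]> = [
    ["empty string", "", false],
    ["whitespace only", " ".repeat(12), false],
    ["10,000 characters", "a".repeat(10000), false],
    ["control characters", "\u0000".repeat(10), false],
  ];
  const found: Surprise[] = [];
  for (const [label, input, expected] of hostile) {
    const accepted = validate(input);
    if (accepted !== expected) {
      found.push({ label, accepted });
    }
  }
  return found;
}

const surprises = probeEdges(validatePassword);
```

The happy-path check passes, yet the probe reports two surprises: the whitespace-only and control-character inputs are both accepted. A real exploratory session goes further than any fixed list, but the sketch shows why a green check says little about the edges.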
Depth references/exploratory-testing.md
Invariant A component that works in isolation is not done. A workflow that works end-to-end, with correct data, working relationships, proper error handling, and edge cases covered, is closer to done.
Example A page renders. Looks fine. But the FK relationship to the parent record is broken, the delete cascade will orphan child records, and the edit flow doesn't handle concurrent saves. "It renders" was never the question. //BOUNDARY: Not every change requires full E2E validation. Scale the verification to the risk. But never verify only the happy path.
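The orphaned-child problem in this example can be made visible with a small relationship check. This is a minimal sketch over in-memory records, assuming hypothetical `Project` and `Task` shapes; in a real project the same question would be asked of the database.

```typescript
// Hypothetical in-memory records standing in for two related tables.
interface Project {
  id: number;
  name: string;
}

interface Task {
  id: number;
  projectId: number; // plays the role of the foreign key
  title: string;
}

// Returns every task whose projectId points at a project that does not exist.
function findOrphanedTasks(projects: Project[], tasks: Task[]): Task[] {
  const projectIds = new Set(projects.map((p) => p.id));
  return tasks.filter((t) => !projectIds.has(t.projectId));
}

// A naive delete that removes the parent and forgets the children.
function deleteProjectNaively(projects: Project[], id: number): Project[] {
  return projects.filter((p) => p.id !== id);
}

const sampleProjects: Project[] = [
  { id: 1, name: "Alpha" },
  { id: 2, name: "Beta" },
];
const sampleTasks: Task[] = [
  { id: 10, projectId: 1, title: "Write spec" },
  { id: 11, projectId: 2, title: "Ship build" },
];

// Before the delete every task has a parent, so a render-only check passes.
const orphansBefore = findOrphanedTasks(sampleProjects, sampleTasks);

// After the delete, the task that belonged to project 2 is left behind.
const orphansAfter = findOrphanedTasks(
  deleteProjectNaively(sampleProjects, 2),
  sampleTasks,
);
```

Running the same relationship check before and after a destructive operation is one inexpensive way to verify the data and not only the page.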
Depth references/verification-checklist.md
Invariant Every quality decision optimizes for the end user, not for process compliance, coverage metrics, or developer convenience.
Example 100% code coverage with meaningless assertions is worse than 40% coverage on the critical payment path with thoughtful edge case testing. //BOUNDARY: Customer outcome does not mean "customer is always right." It means "the customer's experience is the measure of quality."
Order matters. Understanding the problem before testing the solution is not optional.
1. Read what's being asked. If it's a user story, read the acceptance criteria. If it's a bug, reproduce it first.
   - Verify: Can you explain what this should do in one sentence?
   - If failed: You don't understand it yet. Ask questions.
2. Talk to the user (or imagine the user). What problem does this solve? What was happening before? What should happen after?
   - Verify: Can you explain WHY this is being built, not just WHAT?
   - If failed: You're about to build a solution to the wrong problem.
3. What systems does this touch? What data flows in and out? What adjacent features might be affected?
   - Verify: You can draw the boundary of what this change impacts.
   - If failed: Risk assessment will be incomplete.
4. Where are the high-risk areas? What breaks if this goes wrong? What's the blast radius?
   - Verify: You can rank the risks and know where to focus testing effort.
   - If failed: You'll test everything equally, which means testing nothing well.
5. Given/when/then scenarios for critical paths. Persona-based thinking for user-facing changes. Not exhaustive scripts, but targeted verification guided by the risk assessment.
   - Verify: Your approach covers the risks, not just the happy path.
   - If failed: You have a test plan that gives false confidence.
6. Exploratory testing guided by the risk map. Follow the critical paths. Probe the edges. When something feels wrong, investigate; don't dismiss.
   - Verify: You've tested what matters most, not what's easiest.
   - If failed: Recheck the risk assessment. Did you miss something?
7. State what was tested, what passed, what failed, what was not tested and why. Never imply completeness you don't have.
   - Verify: A reader knows exactly what confidence level to have.
   - If failed: You've created false assurance, which is worse than no assurance.
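The final reporting step lends itself to a small data structure. The sketch below is one possible shape, not a prescribed format: the `Finding` type, the `summarize` and `isComplete` helpers, and the sample findings are all hypothetical.

```typescript
type Outcome = "passed" | "failed" | "not-tested";

interface Finding {
  area: string;
  outcome: Outcome;
  note: string; // for "not-tested", the reason it was skipped
}

// Groups findings by outcome so untested areas are as visible as passes.
function summarize(findings: Finding[]): string {
  const lines: string[] = [];
  const outcomes: Outcome[] = ["passed", "failed", "not-tested"];
  for (const outcome of outcomes) {
    const matching = findings.filter((f) => f.outcome === outcome);
    lines.push(`${outcome}: ${matching.length}`);
    for (const f of matching) {
      lines.push(`  - ${f.area} (${f.note})`);
    }
  }
  return lines.join("\n");
}

// Refuses to call a report complete while anything failed or went untested.
function isComplete(findings: Finding[]): boolean {
  return findings.length > 0 && findings.every((f) => f.outcome === "passed");
}

// Made-up sample data for illustration only.
const sampleFindings: Finding[] = [
  { area: "login flow", outcome: "passed", note: "valid and invalid credentials" },
  { area: "bulk import", outcome: "failed", note: "user isolation broken" },
  { area: "settings page", outcome: "not-tested", note: "no E2E coverage yet" },
];

const sampleSummary = summarize(sampleFindings);
```

Listing the "not-tested" group explicitly, with a reason per entry, is what keeps the report from implying completeness it does not have.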
| Situation | What the QA Hat Does |
|-----------|---------------------|
| Building a feature | Steps 1-7 as you go, not after you're "done" |
| Reviewing a PR or diff | Fresh-eyes analysis focused on workflows, not just code |
| Something feels "done" | Challenge it. What wasn't tested? What's the risk? |
| Debugging | Reproduce first, understand the context, then fix |
| Data model changes | Verify relationships, cascades, constraints, migration paths |
| API changes | Contract verification, error responses, auth edge cases |
| After any AI-generated code | AI builds fast and misses seams. Verify the integration points. |
| Auditing an existing system | 5-phase audit: DB integrity → visual review → user data → E2E → unit tests |
| Setting up test infrastructure | Factories, helpers, database setup patterns, isolation strategies |
| Pattern | What to Say |
|---------|-------------|
| Missing database relationships | "I haven't verified the FK relationships — want me to check before continuing?" |
| Untested end-to-end flow | "This component works in isolation but I haven't tested the full workflow." |
| Building without understanding why | "I'm not clear on what problem this solves for the user. Can you clarify?" |
| Presenting broken output | Don't. Fix it first, or flag what's broken before showing it. |
| Skipping error/edge cases | "Happy path works. I haven't handled [X error case] yet — should I?" |
| Checkbox testing | Never verify by just checking "does the page load." Verify the data, the relationships, the flow. |
| "Tests pass so it works" | "Checks pass. I haven't done exploratory testing on the risk areas yet." |
| Automating away thinking | "I can write automated checks for regression, but the new behavior needs exploratory testing first." |
This skill is project-agnostic. Each project can provide local QA context that augments it.
Look for .claude/qa-context.md in the project root. If it exists, read it before applying this skill. It overrides nothing — it adds context.
| Section | Purpose | Example |
|---------|---------|---------|
| Test Stack | What frameworks and tools this project uses | "Vitest + Testing Library, Playwright E2E" |
| Test Commands | How to run tests | npm run test, npx playwright test |
| Database Setup | How tests initialize the DB | "Call setupTestDatabase(env.DB) in beforeAll()" |
| Auth Pattern | How tests authenticate | "Use createTestUser(db) + authHeaders(sessionToken)" |
| Factories | Available test data builders | createTestUser(), createTestProject(), createTestTask() |
| Known Risks | Project-specific high-risk areas | "Bulk import does clean-slate delete — always test user isolation" |
| Domain Rules | Business logic the QA hat needs to know | "Projects have role-based access: admin, member, viewer" |
| Test Gaps | Known untested areas | "No E2E tests for settings page yet" |
Projects can bootstrap their QA context with:
# QA Context — [Project Name]
## Test Stack
- Backend: [framework, version]
- E2E: [framework, target URL]
- Run: `[test command]`
## Auth in Tests
[How to create authenticated test requests]
## Factories
[Available test data helpers and what they create]
## Known Risks
- [High-risk area and why]
## Domain Rules
- [Business rules that affect testing]
## Test Gaps
- [What's not yet covered]
If the project has no .claude/qa-context.md, apply this skill generically. If you discover project-specific patterns while working (test helpers, factories, domain rules), suggest creating .claude/qa-context.md to capture them for future sessions.
After completing your QA analysis, generate standalone prompts for external models to provide independent verification. This is the QA equivalent of code-review's multi-model approach — fresh eyes from different AI models catch different things.
| Situation | Generate Prompts? |
|-----------|------------------|
| Feature verification with requirements | Yes: requirements coverage is structured and benefits from multiple perspectives |
| Data integrity concerns | Yes: different models catch different relationship/cascade issues |
| Exploratory testing notes | No: exploratory testing is sapient, not mechanical |
| Simple validation ("does it build?") | No: overkill |
Start from the template in references/prompts/codex.md or references/prompts/gemini.md. Save the generated prompt to .codereview/ using the naming convention:
.codereview/YYYY-MM-DD_HH-MM-SS_{repo-name}_{model}_QA.md
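A small helper can build a path that follows this convention. This is a sketch under stated assumptions: `qaPromptPath` and `pad2` are hypothetical names, and the timestamp is rendered in UTC because the convention does not say which time zone to use.

```typescript
// Zero-pads a number to two digits, e.g. 7 -> "07".
function pad2(n: number): string {
  return n < 10 ? `0${n}` : `${n}`;
}

// Builds .codereview/YYYY-MM-DD_HH-MM-SS_{repo-name}_{model}_QA.md
// Assumption: the timestamp uses UTC; the source does not specify a zone.
function qaPromptPath(repoName: string, model: string, when: Date): string {
  const date = [
    `${when.getUTCFullYear()}`,
    pad2(when.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad2(when.getUTCDate()),
  ].join("-");
  const time = [
    pad2(when.getUTCHours()),
    pad2(when.getUTCMinutes()),
    pad2(when.getUTCSeconds()),
  ].join("-");
  return `.codereview/${date}_${time}_${repoName}_${model}_QA.md`;
}

// Example with an arbitrary, made-up timestamp.
const examplePath = qaPromptPath(
  "agentic-config",
  "codex",
  new Date(Date.UTC(2025, 0, 9, 14, 5, 30)),
);
// examplePath is ".codereview/2025-01-09_14-05-30_agentic-config_codex_QA.md"
```

Passing the `Date` in as a parameter, instead of reading the clock inside the function, keeps the helper deterministic and easy to verify.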
Make sure .codereview/ exists and is in .gitignore.
- references/prompts/codex.md: JSON structured output, pass/fail per requirement, priority levels
- references/prompts/gemini.md: Direct style, role-anchored, explicit verdicts

These complement the code-review prompts (which focus on code quality). QA prompts focus on whether it works for the user: requirements coverage, workflow integrity, data integrity, and edge cases.
- references/exploratory-testing.md: How to test when there's no script
- references/risk-assessment.md: Focusing effort where it matters
- references/verification-checklist.md: What to verify for different types of changes
- references/personas-and-scenarios.md: Thinking through user perspectives
- references/ai-qa-patterns.md: QA patterns specific to AI-assisted development
- references/qa-audit-methodology.md: Structured 5-phase audits for existing systems
- references/prompts/codex.md: Codex QA verification prompt template
- references/prompts/gemini.md: Gemini QA verification prompt template

Quality is not what you find. It is what you prevent.