aops-core/skills/qa/SKILL.md
QA verification, qualitative assessment, criteria design, and test planning
Every feature exists for a reason. That reason is expressed practically as user stories: someone needs something, and the feature is supposed to deliver it. QA answers one question: is this feature actually achieving its goals and serving the people it was built for?
This applies whether the feature is a UI dashboard, a gate in a hook pipeline, a batch processing script, an API endpoint, or a skill definition. The evidence might come from:
QA is not a checklist. It is a judgment call: does this work serve the people it was made for? The agent's job is to figure out what evidence is needed, gather it, and evaluate honestly.
/qa # Quick verification of current work
/qa Verify the authentication feature # Specific feature verification
/qa Analyze enforcer gate effectiveness # Operational effectiveness analysis
/qa Design QA criteria for the new epic # Upstream criteria design
When verifying completed work, apply this protocol before declaring anything done.
Default assumption: IT'S BROKEN. You must PROVE it works, not confirm it works.
Triple-Check Protocol (for every claim):
Dimension 1 — Output Quality: Does the result match what was specified?
| Check         | Question                               |
| ------------- | -------------------------------------- |
| Completeness  | Are all required elements present?     |
| Correctness   | Do outputs match spec requirements?    |
| Format        | Does output follow expected structure? |
| Working state | Does code run without errors?          |
Dimension 2 — Process Compliance: Did the work follow required workflow?
| Check           | Question                               |
| --------------- | -------------------------------------- |
| Workflow used   | Was the correct workflow applied?      |
| Steps completed | Were all TodoWrite items addressed?    |
| Tests run       | If code changed, were tests executed?  |
| No scope drift  | Did work stay within original request? |
Dimension 3 — Semantic Correctness: Does the result make sense for its purpose?
| Check | Question |
| ------------------ | ---------------------------------------------- |
| Content sensible | Does the output make logical sense? |
| No placeholders | No {variable}, TODO, FIXME in production |
| No garbage data | Content is real, not template artifacts |
| Useful to consumer | Would the intended user find this useful? |
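The "No placeholders" check in the table above lends itself to a mechanical first pass. The following is a minimal sketch, not part of the skill itself: `find_placeholders` and `PLACEHOLDER_PATTERNS` are hypothetical names, and the regular expressions are an assumption about what counts as a `{variable}`, `TODO`, or `FIXME` marker.

```python
import re

# Hypothetical patterns for the placeholder kinds named in the table:
# unfilled {variable} template slots and TODO / FIXME markers.
PLACEHOLDER_PATTERNS = {
    "template_variable": re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}"),
    "todo_marker": re.compile(r"\b(?:TODO|FIXME)\b"),
}


def find_placeholders(text: str) -> list:
    """Return (line_number, kind, matched_text) for each suspected placeholder."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PLACEHOLDER_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((line_number, kind, match.group(0)))
    return findings
```

A scan like this only surfaces candidates. Legitimate braces (for example in format strings or set literals) will also match, so each hit still needs the judgment the protocol asks for.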
Any of these require immediate investigation:
- Unfilled placeholders ({variable}, TODO, FIXME)

## QA Verification Report
**Verdict**: VERIFIED / ISSUES
### Verification Summary
- Output Quality: PASS / FAIL
- Process Compliance: PASS / FAIL
- Semantic Correctness: PASS / FAIL
[If ISSUES: list each finding with Dimension, Severity (Critical/Major/Minor), and Fix]
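The report template above can be filled in from a list of findings. This is a rough sketch under stated assumptions: `Finding` and `render_report` are hypothetical helpers, and the rule that a dimension is marked FAIL as soon as it has one finding of any severity is an assumption, not something the skill specifies.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    dimension: str    # one of DIMENSIONS below
    severity: str     # "Critical", "Major", or "Minor"
    description: str
    fix: str


DIMENSIONS = ("Output Quality", "Process Compliance", "Semantic Correctness")


def render_report(findings: list) -> str:
    """Render the QA Verification Report template as Markdown text."""
    failed = {finding.dimension for finding in findings}
    # Assumption: any finding at all turns the verdict into ISSUES.
    verdict = "ISSUES" if findings else "VERIFIED"
    lines = [
        "## QA Verification Report",
        f"**Verdict**: {verdict}",
        "### Verification Summary",
    ]
    for dimension in DIMENSIONS:
        status = "FAIL" if dimension in failed else "PASS"
        lines.append(f"- {dimension}: {status}")
    for finding in findings:
        lines.append(
            f"- [{finding.dimension}] {finding.severity}: "
            f"{finding.description} (Fix: {finding.fix})"
        )
    return "\n".join(lines)
```

The helper only formats a conclusion that has already been reached. Deciding what counts as a finding remains a judgment call.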
These references provide detailed guidance for specific QA activities. Read the ones relevant to your task — you don't need all of them for every QA invocation.
| Reference                                | When useful                                                  |
| ---------------------------------------- | ------------------------------------------------------------ |
| [[references/qa-planning.md]]            | Designing acceptance criteria or QA plans before development |
| [[references/qualitative-assessment.md]] | Evaluating fitness-for-purpose after development             |
| [[references/acceptance-testing.md]]     | Running structured test plans, tracking failures             |
| [[references/quick-verification.md]]     | Pre-completion sanity checks                                 |
| [[references/integration-validation.md]] | Verifying structural/framework changes                       |
| [[references/system-design-qa.md]]       | Designing QA infrastructure for a project                    |
| [[references/visual-analysis.md]]        | UI changes or visual artifacts                               |
| [[../eval/references/dimensions.md]]     | Agent session performance evaluation                         |
When delegating to a QA subagent:
Agent(subagent_type="aops-core:qa", model="opus", prompt="
[What you need evaluated and why]
**User story / goal**: [What this feature is supposed to achieve]
**Evidence available**: [Where to find data — logs, transcripts, browser, tests, etc.]
**Acceptance criteria**: [If known — extract from task or spec]
Evaluate fitness-for-purpose. Cite specific evidence. Report honestly.
")
Preserve qualitative framing. The delegation prompt determines output quality. Never reframe QA as pass/fail or checklist compliance — this causes the agent to regress to mechanical evaluation. The prompt must ask for judgment, not tallying.
Anti-pattern: "Check each user story and report pass/fail" → produces DOM element counting, loses all interpretive value.
Good pattern: "Evaluate fitness-for-purpose. Is this serving the user it was built for? Cite evidence." → produces genuine qualitative assessment.
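If delegation prompts are assembled in code, one way to keep the qualitative framing intact is to build them from a single helper that always ends with the fitness-for-purpose instruction. The sketch below is illustrative only: `build_qa_prompt` is a hypothetical function, and it produces just the prompt text, not the `Agent(...)` call itself.

```python
from typing import Optional

# Closing instruction copied from the delegation template above.
QUALITATIVE_CLOSING = (
    "Evaluate fitness-for-purpose. Cite specific evidence. Report honestly."
)


def build_qa_prompt(
    request: str,
    user_story: str,
    evidence: str,
    acceptance_criteria: Optional[str] = None,
) -> str:
    """Assemble a delegation prompt that asks for judgment, not tallying."""
    parts = [
        request,
        f"**User story / goal**: {user_story}",
        f"**Evidence available**: {evidence}",
    ]
    # Acceptance criteria are optional in the template ("If known").
    if acceptance_criteria:
        parts.append(f"**Acceptance criteria**: {acceptance_criteria}")
    parts.append(QUALITATIVE_CLOSING)
    return "\n".join(parts)
```

Because the closing line is fixed, a caller cannot accidentally replace it with a pass/fail instruction.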
For features with data pipelines (dashboards, transcripts, reports, generated artifacts), explicitly instruct the agent to trace the pipeline, not just inspect output:
Agent(subagent_type="aops-core:qa", model="opus", prompt="
Qualitative assessment of [FEATURE] against user stories in [SPEC].
For each section: trace the data pipeline from source to output.
1. Verify data freshness, not just existence. Check updates over time for real-time displays.
2. Explicitly test fallback chains. Disable them and verify the primary source works independently.
3. Verify during an active session (real runtime state).
4. Identify design-level findings: if data is misleading or UX doesn't serve its purpose, report it.
Evaluate fitness-for-purpose. Cite specific evidence. Report honestly.
")
For agent session evaluation, extract sessions first:
cd "$AOPS"
PYTHONPATH=aops-core uv run python \
aops-core/skills/eval/scripts/prepare_evaluation.py \
--recent 10 --pretty
Evidence storage for evaluations:
$ACA_DATA/eval/
├── YYYY-MM-DD-<session-id>.md # Individual session evaluations
├── trends/
│ └── YYYY-MM-DD-batch.md # Batch trend reports
└── insights/
└── YYYY-MM-DD-<topic>.md # Cross-cutting quality insights
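The layout above can be expressed as small path helpers so that evaluations land in consistent locations. This is a sketch only: the function names are hypothetical, and reading the root from the `ACA_DATA` environment variable when no root is passed is an assumption based on the `$ACA_DATA` prefix shown in the tree.

```python
import os
from datetime import date
from pathlib import Path
from typing import Optional


def eval_root(root: Optional[str] = None) -> Path:
    """Return the eval/ directory under the given root or $ACA_DATA."""
    base = root if root is not None else os.environ["ACA_DATA"]
    return Path(base) / "eval"


def session_evaluation_path(day: date, session_id: str, root: Optional[str] = None) -> Path:
    # eval/YYYY-MM-DD-<session-id>.md
    return eval_root(root) / f"{day.isoformat()}-{session_id}.md"


def trend_report_path(day: date, root: Optional[str] = None) -> Path:
    # eval/trends/YYYY-MM-DD-batch.md
    return eval_root(root) / "trends" / f"{day.isoformat()}-batch.md"


def insight_path(day: date, topic: str, root: Optional[str] = None) -> Path:
    # eval/insights/YYYY-MM-DD-<topic>.md
    return eval_root(root) / "insights" / f"{day.isoformat()}-{topic}.md"
```

The helpers only compute paths. Creating the directories and writing the files is left to the caller.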
When invoked as /qa with no arguments, do a quick verification of the current session's work:
complete_task()
post_qa_trigger() detects QA invocation