archived/skills/eval/SKILL.md
Agent session quality assessment — merged into /qa as Agent Session Evaluation mode
npx skillsauth add nicsuzor/academicops evalInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
This skill is merged into /qa. Use the Agent Session Evaluation mode there.
See: [[../../skills/qa/SKILL.md]]
All evaluation logic, quality dimensions, and the session extraction script are documented in /qa.
The dimensions reference and extraction script remain at their current paths:
skills/eval/scripts/prepare_evaluation.pytools
Program / portfolio supervision — the autonomous top loop above /supervisor. "Ready the release" → discover and decompose the constituent epics → run /supervisor on each → surface only escalations + merge-ready PRs. Stateless tick driven by /loop; all cross-tick state lives in the program task body.
development
Mirror PKB tasks onto the Cowork native task list at claim time and sync completion back to PKB. Cowork-only; ships only in the cowork build of aops-core.
testing
Instruction quality gate — reviews agent instructions (task bodies, workflow steps, skill procedures, self-test protocols) for shallow-execution vulnerabilities before deployment. Two modes: author (pre-hoc review) and audit (trace a failure back to the instruction gap). The bar is excellence, not compliance.
content-media
Design-stage fitness rubric — persona immersion, scenario design, dimensions that define what excellence looks like for the people a feature serves. Two modes — author (produce a rubric for a new spec) and critique (red-team an existing spec). Output lives on the spec, not in the verification brief. Owned by pauli.