skills/self-eval/SKILL.md
Run the Self-Evaluation Checklist against your last response or recent code changes. Use when the user says "self-eval", "self-evaluate", "check my response", "quality check", "evaluate response", or expresses doubt ("did you actually test that?", "are you sure?", "that seems incomplete"), or proactively before finishing code edits. Cross-references /de-slop and /tdd-assertions for items with dedicated tooling.
npx skillsauth add paulnsorensen/dotfiles self-evalInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Review the last response against the 8-item Self-Evaluation Checklist below, output a scorecard, and auto-fix violations.
<certain>, <speculative>, or <don't know> (wrapped in backticks so the tag renders as literal text)./de-slop on changed files./tdd-assertions on test code.Determine what to evaluate:
Evaluate all 8 checklist items. Output a compact scorecard:
## Self-Evaluation
| # | Check | Result | Notes |
|---|-------------------|--------|-------|
| 1 | Sycophancy | PASS | |
| 2 | Premature complete | FAIL | Left TODO on line 42 |
| 3 | Dismissing failures| PASS | |
| 4 | Hedging | PASS | |
| 5 | Scope reduction | WARN | Dropped retry logic, acknowledged |
| 6 | False confidence | PASS | |
| 7 | AI slop | DEFER | Running /de-slop |
| 8 | Weak assertions | DEFER | Running /tdd-assertions |
Use PASS, FAIL, WARN (acknowledged deviation), or DEFER (delegating to specialized skill).
/de-slop on the changed files. Mark DEFER until results return, then update to PASS/FAIL./tdd-assertions on test files. Mark DEFER until results return, then update to PASS/FAIL.Only invoke these if the item is relevant — no code changes means items 7-8 are automatic PASS.
For each FAIL:
After fixes, output the updated scorecard with a one-line summary:
tools
Reconstruct what a past coding-agent session was doing so you can resume it — goal, files touched, last verified state, and the next step — by querying the session logs. Use when the user says "what was I working on", "recover that session", "reconstruct where I left off", "resume my last session", "what did that session change", "rebuild context from logs", or invokes /work-recovery. Report-only — it never scores or judges. Do NOT use for usage scoring (that is /skill-improver, /tool-efficiency, /prompt-analytics) or one-off interactive log queries (that is /session-analytics).
development
Curate this repo's hallouminate wiki (.hallouminate/wiki/, the repo:dotfiles:wiki corpus) — add or update architecture pages, per-harness docs, and gotchas. Use when the user says "update the wiki", "document this in the wiki", "refresh the harness docs", "add a wiki page", "curate the wiki", "the wiki is stale", or invokes /wiki-curator. Also use at session end to write back a non-obvious decision or gotcha worth preserving. Grounds the existing wiki first, follows one-topic-per-file conventions, verifies every external doc URL before writing, and reindexes. Do NOT use for general code search (that is cheez-search) or for editing AGENTS.md command reference.
tools
Audit how a tool, command, or MCP server is actually used across coding-agent sessions and produce calibrated recommendations — tool-vs-task fit, error forensics, fix recommendations, permission friction, MCP health, and token economics. Use when the user says "tool efficiency", "am I using X efficiently", "audit tool usage", "why does X keep failing", "how do I fix this error", "what should I change", "permission friction", "is this MCP worth it", "tool error rate", "fix recommendations", or invokes /tool-efficiency. Do NOT use for auditing a skill or agent definition (that is /skill-improver) or for one-off interactive log queries (that is /session-analytics).
tools
Analyze how prompts and skill routing behave across coding-agent sessions and produce calibrated recommendations — prompt-pattern analysis, routing accuracy, and knowledge gaps. Use when the user says "analyze my prompts", "prompt patterns", "is routing working", "which skill should have fired", "knowledge gaps", "what do I keep asking", or invokes /prompt-analytics. Do NOT use for auditing a single skill/agent definition (that is /skill-improver), tool/MCP efficiency (that is /tool-efficiency), or one-off interactive log queries (that is /session-analytics).