skills/agent-reviewer/SKILL.md
After an agentic task completes, perform a retrospective analysis across 6 dimensions (goal alignment, efficiency, decision quality, error handling, communication, reusability). Score performance, identify inefficiency patterns, evaluate skill usage, and produce actionable improvement recommendations. Triggers on 'how did it go', 'retrospective', 'review performance', 'what could be better', or after any long agentic task completes.
npx skillsauth add fatih-developer/fth-skills agent-reviewer

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Task is done — now look back. What went well, what went wrong, what should be different next time? Goal: never repeat the same mistake and continuously improve skills and processes.
Core principle: Retrospectives are painful but necessary. A good agent evaluates itself.
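1. Goal Alignment: Did the result match the original intent?
2. Efficiency: Did the task take longer than necessary?
3. Decision Quality: Were decisions well-reasoned?
4. Error Handling: How were errors addressed?
5. Communication: How good was the interaction with the user?
6. Reusability: Can lessons from this task transfer to the next?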
| Severity | Meaning | Action |
|----------|---------|--------|
| CRITICAL | Endangered the task or significantly reduced quality | Must fix |
| MODERATE | Created inefficiency but didn't break the result | Improve |
| POSITIVE | Something that went better than expected | Repeat, standardize |
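The severity levels above map naturally onto a small data structure. The following is a minimal sketch, not part of the skill itself: the `Finding` record and the `summarize` helper are hypothetical names introduced for illustration, and the output shape mirrors the `Findings` line of the retrospective template.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    MODERATE = "moderate"
    POSITIVE = "positive"


# Actions copied from the severity table.
ACTIONS = {
    Severity.CRITICAL: "Must fix",
    Severity.MODERATE: "Improve",
    Severity.POSITIVE: "Repeat, standardize",
}


@dataclass
class Finding:
    """Hypothetical record for one retrospective finding."""
    severity: Severity
    description: str


def summarize(findings):
    """Build a counts line shaped like 'N critical | N moderate | N positive'."""
    counts = Counter(f.severity for f in findings)
    # Counter returns 0 for severities that never occurred.
    return " | ".join(f"{counts[s]} {s.value}" for s in Severity)
```

Keeping the action text next to the severity level means a report can print the required follow-up for each finding without restating the table.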
AGENT REVIEWER — Task Retrospective
Task : [task name]
Score : X/10
Findings : N critical | N moderate | N positive
## Dimension Scores
| Dimension | Score | Summary |
|-----------|-------|---------|
| Goal Alignment | X/10 | ... |
| Efficiency | X/10 | ... |
| Decision Quality | X/10 | ... |
| Error Handling | X/10 | ... |
| Communication | X/10 | ... |
| Reusability | X/10 | ... |
| **Overall** | **X/10** | |
## Critical Findings
[If any — what happened, why critical, how to prevent]
## Improvement Areas
[Inefficiencies, missed opportunities]
## What Went Well
[Decisions and approaches worth repeating]
## Action Items
### For Next Task
1. [Concrete change — what to do]
2. [Concrete change]
### Skill / Process Improvement
1. [Which skill should be updated / added]
2. [Which pattern should be standardized]
## Lessons Learned
[Items a future agent instance should know — candidates for memory-ledger]
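The template does not state how the overall score is derived from the six dimension scores. One simple option, assumed here purely for illustration, is an unweighted mean rounded to one decimal place. The function name `overall_score` is hypothetical.

```python
from statistics import mean

# Dimension names as they appear in the Dimension Scores table.
DIMENSIONS = [
    "Goal Alignment",
    "Efficiency",
    "Decision Quality",
    "Error Handling",
    "Communication",
    "Reusability",
]


def overall_score(scores):
    """Average six 0-10 dimension scores (assumption: equal weights)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= 10:
            raise ValueError(f"{name} score must be between 0 and 10")
    return round(mean(scores[name] for name in DIMENSIONS), 1)
```

A weighted mean (for example, weighting Goal Alignment more heavily) would be an equally reasonable choice; the skill leaves this open.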
Scan the task history for these patterns:
| Pattern | Symptom | Fix |
|---------|---------|-----|
| Repeated tool call | Same file/API read 2+ times | Cache it |
| Unnecessary confirmation | Low-risk step triggered approval | Adjust checkpoint-guardian threshold |
| Late assumption discovery | "Actually it should be..." after error | Trigger assumption-checker earlier |
| Sequential parallel steps | Independent steps ran sequentially | Use parallel-planner |
| Blind retry | Logic error treated as transient | Fix error-recovery categorization |
| Context loss | Previous step info forgotten | Memory-ledger not updated |
| Over-decomposition | 2-step task split into 8 | Adjust task-decomposer granularity |
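As an illustration of the first pattern, repeated tool calls can be found by counting identical calls in the task history. This is a rough sketch under the assumption that the history is available as a list of `(tool, target)` tuples; that format and the `find_repeated_calls` helper are hypothetical, not something the skill defines.

```python
from collections import Counter


def find_repeated_calls(history, threshold=2):
    """Return calls that occur at least `threshold` times.

    `history` is assumed to be a list of (tool, target) tuples,
    e.g. ("read_file", "config.yaml").
    """
    counts = Counter(history)
    return {call: n for call, n in counts.items() if n >= threshold}


history = [
    ("read_file", "a.py"),
    ("read_file", "b.py"),
    ("read_file", "a.py"),
    ("http_get", "/status"),
]
repeats = find_repeated_calls(history)
```

Here `repeats` contains only `("read_file", "a.py")` with a count of 2, which matches the "same file/API read 2+ times" symptom and is a candidate for caching.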
Evaluate skills used during the task:
## Skills Used
| Skill | Used? | Effective? | Notes |
|-------|-------|------------|-------|
| task-decomposer | Yes/No | Good/Fair/Poor | ... |
| checkpoint-guardian | Yes/No | Good/Fair/Poor | ... |
| assumption-checker | Yes/No | Good/Fair/Poor | ... |
| tool-selector | Yes/No | Good/Fair/Poor | ... |
| parallel-planner | Yes/No | Good/Fair/Poor | ... |
| error-recovery | Yes/No | Good/Fair/Poor | ... |
| memory-ledger | Yes/No | Good/Fair/Poor | ... |
| output-critic | Yes/No | Good/Fair/Poor | ... |
Missing / untriggered skills and why?