skills/output-critic/SKILL.md
Evaluate every produced output (code, report, plan, data, API response) against type-specific quality criteria, score 1-10, make accept/reject decisions, and provide actionable improvement suggestions. Triggers on 'evaluate', 'check', 'review', 'quality control', 'is this good enough', 'score it', or before passing output to the next step in an agentic workflow.
npx skillsauth add fatih-developer/fth-skills output-criticInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Evaluate the output by type, score each criterion, make an accept/reject decision, and suggest concrete improvements. Goal: prevent weak output from reaching the next step.
1. Detect output type
2. Apply type-specific criteria
3. Score each criterion
4. Calculate overall score
5. Make accept / conditional / reject decision
6. Suggest improvements
| Overall Score | Decision | Action | |---|---|---| | 8-10 | ACCEPT | Proceed | | 6-7 | CONDITIONAL | Apply minor fixes, then proceed | | 0-5 | REJECT | Apply improvements, re-evaluate |
| Criterion | Weight | Question | |-----------|--------|----------| | Correctness | 30% | Produces expected output? Handles edge cases? | | Readability | 20% | Meaningful names? Clean indentation? | | Security | 20% | SQL injection? Hardcoded secrets? Unsafe input? | | Performance | 15% | Unnecessary loops? N+1 queries? Memory leaks? | | Testability | 15% | Functions independently testable? |
| Criterion | Weight | Question | |-----------|--------|----------| | Accuracy | 30% | Claims supported? Misleading statements? | | Coverage | 25% | All requested topics addressed? Missing sections? | | Clarity | 20% | Target audience can understand? Jargon explained? | | Structure | 15% | Logical flow? Consistent headings? | | Actionability | 10% | Reader knows what to do next? |
| Criterion | Weight | Question | |-----------|--------|----------| | Completeness | 30% | All necessary steps present? Critical step missing? | | Atomicity | 25% | Each step does one thing? Overly broad steps? | | Dependency accuracy | 20% | Order makes sense? Circular dependencies? | | Verifiability | 15% | Each step has clear "done" criteria? | | Realism | 10% | Steps are achievable? Overly optimistic estimates? |
| Criterion | Weight | Question | |-----------|--------|----------| | Accuracy | 35% | Numbers consistent? Calculations correct? | | Completeness | 25% | Missing rows/columns? Nulls explained? | | Format consistency | 20% | Units, date formats, currency consistent? | | Readability | 20% | Meaningful headers? Proper sorting? |
OUTPUT CRITIC
Type : [output type]
Decision : ACCEPT / CONDITIONAL / REJECT
Score : [X/10]
## Criterion Scores
| Criterion | Score | Note |
|-----------|-------|------|
| [Criterion 1] | X/10 | [short note] |
| [Criterion 2] | X/10 | [short note] |
| **Overall** | **X/10** | |
## Strengths
- [What was done well — specific]
## Weaknesses
- [What is missing / wrong — specific]
## Improvement Suggestions
1. [Concrete action — what to do, where]
2. [Concrete action]
## Next Step
[Accept -> proceed | Conditional -> fix X | Reject -> apply suggestions, resubmit]
When improved output is resubmitted:
RE-EVALUATION
Previous score: X/10
New score : Y/10
Change : +N points
Improved : [which criteria]
Still open : [remaining issues if any]
task-decomposer (validates plan quality), output-critic is the quality gate before task completion.tools
Create, optimize, critique, and structure prompts for AI systems. Use this skill whenever the user is designing or improving a prompt, system prompt, coding prompt, image prompt, evaluation rubric, agent prompt, workflow prompt, or MCP-oriented prompt package. Also use it when the user asks to turn vague AI behavior into a precise instruction set, tool policy, agent spec, or prompt architecture.
testing
Assumption-first architecture review skill to stress-test project plans and expose hidden risks.
testing
Enforce and manage DESIGN.md specifications, extract design systems from URLs, and combine design reasoning with token roles to prevent drift.
testing
Forces the agent to act with a Claude-like product mindset, prioritizing user journey, UX states, and visual quality before coding.