skills/prompt-evaluator/SKILL.md
Evaluate and score user-written LLM prompts on a 100-point scale across 5 axes (Clarity, Structure, Information Content, Specificity, Context), providing specific improvement suggestions and a revised prompt. Make sure to use this skill whenever the user asks to evaluate, review, score, or improve a prompt, or when they say things like 'このプロンプトどう?', 'プロンプトを評価して', 'rate my prompt', 'review this prompt', or 'is this prompt good enough?'. This skill focuses on scoring existing prompts, not writing new ones from scratch.
`npx skillsauth add hrdtbs/agent-skills prompt-evaluator`

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Evaluate LLM prompts on a 100-point scale based on research findings from Thorgeirsson et al. (2026), which demonstrated that writing quality—specifically coherence, instructional clarity, and information content—significantly predicts LLM-assisted programming performance.
| # | Axis | Points | Focus |
|---|------|--------|-------|
| 1 | Clarity (明確性) | 20 | Unambiguous intent, no unclear references |
| 2 | Structure (構造) | 20 | Logical organization, appropriate segmentation |
| 3 | Information Content (情報量) | 20 | Sufficient detail for task completion |
| 4 | Specificity (特定性) | 20 | Concrete requirements, constraints, formats |
| 5 | Context (文脈提供) | 20 | Background, audience, purpose clearly stated |
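The arithmetic behind the rubric can be sketched in a few lines of Python. This is only an illustration, not part of the skill itself: the names `AXES`, `MAX_PER_AXIS`, and `total_score` are hypothetical, and the sketch assumes each axis receives an integer score from 0 to 20.

```python
# Hypothetical helper: sums five per-axis scores into a 100-point total.
AXES = ("Clarity", "Structure", "Information Content", "Specificity", "Context")
MAX_PER_AXIS = 20  # each axis is worth 20 points, so 5 axes give 100


def total_score(scores: dict[str, int]) -> int:
    """Validate per-axis scores and return their sum (0-100)."""
    missing = [axis for axis in AXES if axis not in scores]
    if missing:
        raise ValueError(f"missing axes: {', '.join(missing)}")
    for axis in AXES:
        if not 0 <= scores[axis] <= MAX_PER_AXIS:
            raise ValueError(f"{axis} must be between 0 and {MAX_PER_AXIS}")
    return sum(scores[axis] for axis in AXES)


example = {
    "Clarity": 15,
    "Structure": 12,
    "Information Content": 18,
    "Specificity": 10,
    "Context": 15,
}
print(total_score(example))
```

Because every axis carries the same weight, a weak score on one axis costs as much as a weak score on any other.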
Use this exact template because a consistent format helps the user easily compare evaluations and understand the scoring breakdown:
## プロンプト評価結果 / Prompt Evaluation
### 対象プロンプト
> [quote the evaluated prompt here]
### スコア
| 軸 / Axis | スコア | 主な所見 |
|-----------|--------|----------|
| 明確性 (Clarity) | __/20 | ... |
| 構造 (Structure) | __/20 | ... |
| 情報量 (Info Content) | __/20 | ... |
| 特定性 (Specificity) | __/20 | ... |
| 文脈提供 (Context) | __/20 | ... |
| **合計 / Total** | **__/100** | |
### 評価の概要
[1-2 paragraph summary of strengths and weaknesses]
### 改善提案
1. [specific, actionable suggestion]
2. [specific, actionable suggestion]
...
### 改善版プロンプト
[the improved prompt in a code block]
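As a rough illustration of how the score table in the template above could be filled in programmatically, here is a minimal Python sketch. The function name `render_score_table` and the short axis keys are assumptions made for this example; the skill itself only specifies the output format.

```python
# Hypothetical renderer for the score table; row labels follow the template above.
ROWS = [
    ("明確性 (Clarity)", "clarity"),
    ("構造 (Structure)", "structure"),
    ("情報量 (Info Content)", "info_content"),
    ("特定性 (Specificity)", "specificity"),
    ("文脈提供 (Context)", "context"),
]


def render_score_table(scores: dict[str, int], findings: dict[str, str]) -> str:
    """Return the markdown score table with a computed total row."""
    lines = [
        "| 軸 / Axis | スコア | 主な所見 |",
        "|-----------|--------|----------|",
    ]
    for label, key in ROWS:
        # Axes without a recorded finding get an empty cell.
        lines.append(f"| {label} | {scores[key]}/20 | {findings.get(key, '')} |")
    total = sum(scores[key] for _, key in ROWS)
    lines.append(f"| **合計 / Total** | **{total}/100** | |")
    return "\n".join(lines)


table = render_score_table(
    {"clarity": 15, "structure": 12, "info_content": 18, "specificity": 10, "context": 15},
    {"clarity": "Intent is clear", "specificity": "No output format given"},
)
print(table)
```

Computing the total from the per-axis rows, rather than entering it separately, keeps the last row consistent with the rest of the table.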