hrdtbs/prompt-evaluator

Name: prompt-evaluator
Author: hrdtbs

skills/prompt-evaluator/SKILL.md

npx skillsauth add hrdtbs/agent-skills prompt-evaluator

Prompt Evaluator

Evaluate LLM prompts on a 100-point scale based on research findings from Thorgeirsson et al. (2026), which demonstrated that writing quality (specifically coherence, instructional clarity, and information content) significantly predicts LLM-assisted programming performance.

Key Research Insights

Information content > vocabulary: Adding missing information improves results; rewording without adding information rarely helps (Lucchetti et al.)
Structure matters: Unorganized, vague prompts lead to failure cycles
Declarative > interrogative: Declarative statements outperform questions (Chen et al.)
Ambiguity kills: Unclear pronouns, implicit assumptions, and missing constraints are top failure causes

Evaluation Workflow

1. Receive the user's prompt
2. Read references/evaluation-rubric.md for detailed scoring criteria
3. Score each of the 5 axes (4 sub-items × 5pt = 20pt per axis, 100pt total)
4. For common issues, consult references/improvement-patterns.md for Before/After examples
5. Output the evaluation result using the template below
6. Provide a revised prompt

5 Evaluation Axes

| # | Axis | Points | Focus |
|---|------|--------|-------|
| 1 | Clarity (明確性) | 20 | Unambiguous intent, no unclear references |
| 2 | Structure (構造) | 20 | Logical organization, appropriate segmentation |
| 3 | Information Content (情報量) | 20 | Sufficient detail for task completion |
| 4 | Specificity (特定性) | 20 | Concrete requirements, constraints, formats |
| 5 | Context (文脈提供) | 20 | Background, audience, purpose clearly stated |
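The arithmetic behind the axes (4 sub-items × 5 points per axis, 5 axes, 100 points total) can be sketched in Python. This is a minimal illustration, not part of the skill itself: the function names `axis_score` and `total_score` are hypothetical, and sub-items are identified only by position because their names live in references/evaluation-rubric.md, which is not shown here.

```python
from typing import Dict, List

# Axis names as listed in the table above.
AXES = ["Clarity", "Structure", "Information Content", "Specificity", "Context"]
SUB_ITEMS_PER_AXIS = 4   # 4 sub-items per axis
MAX_SUB_ITEM_POINTS = 5  # each sub-item is worth 0-5 points


def axis_score(sub_item_points: List[int]) -> int:
    """Sum one axis's sub-item scores after checking count and range."""
    if len(sub_item_points) != SUB_ITEMS_PER_AXIS:
        raise ValueError(f"expected {SUB_ITEMS_PER_AXIS} sub-item scores")
    for points in sub_item_points:
        if not 0 <= points <= MAX_SUB_ITEM_POINTS:
            raise ValueError(f"sub-item score {points} is outside 0-{MAX_SUB_ITEM_POINTS}")
    return sum(sub_item_points)


def total_score(scores: Dict[str, List[int]]) -> int:
    """Sum all five axis scores into the 100-point total."""
    missing = [axis for axis in AXES if axis not in scores]
    if missing:
        raise ValueError(f"missing axes: {missing}")
    return sum(axis_score(scores[axis]) for axis in AXES)


# Hypothetical sub-item scores for one evaluated prompt.
example = {
    "Clarity": [4, 3, 5, 4],              # 16/20
    "Structure": [2, 3, 3, 2],            # 10/20
    "Information Content": [5, 4, 4, 3],  # 16/20
    "Specificity": [1, 2, 2, 1],          # 6/20
    "Context": [3, 3, 4, 2],              # 12/20
}
print(total_score(example))  # 60
```

Validating the count and range of sub-item scores before summing keeps an axis from silently exceeding its 20-point cap.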

Output Template

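Use this exact template; a consistent format makes it easy for the user to compare evaluations and understand the scoring breakdown. Its Japanese headings mean, in order: prompt evaluation result, target prompt, score (table columns: axis, score, main findings), evaluation summary, improvement suggestions, and improved prompt.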

## プロンプト評価結果 / Prompt Evaluation

### 対象プロンプト
> [quote the evaluated prompt here]

### スコア

| 軸 / Axis | スコア | 主な所見 |
|-----------|--------|----------|
| 明確性 (Clarity) | __/20 | ... |
| 構造 (Structure) | __/20 | ... |
| 情報量 (Info Content) | __/20 | ... |
| 特定性 (Specificity) | __/20 | ... |
| 文脈提供 (Context) | __/20 | ... |
| **合計 / Total** | **__/100** | |

### 評価の概要
[1-2 paragraph summary of strengths and weaknesses]

### 改善提案
1. [specific, actionable suggestion]
2. [specific, actionable suggestion]
...

### 改善版プロンプト
[the improved prompt in a code block]
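To illustrate how the score table in the template could be filled in, here is a small Python sketch. It assumes per-axis scores have already been computed; the function name `render_score_table` and the example findings are hypothetical and serve only to show the row format.

```python
from typing import Dict, Tuple

# (Japanese label, English label) pairs in the order used by the template.
AXIS_LABELS = [
    ("明確性", "Clarity"),
    ("構造", "Structure"),
    ("情報量", "Info Content"),
    ("特定性", "Specificity"),
    ("文脈提供", "Context"),
]


def render_score_table(results: Dict[str, Tuple[int, str]]) -> str:
    """Render the template's score table from {axis: (score, finding)}."""
    lines = [
        "| 軸 / Axis | スコア | 主な所見 |",
        "|-----------|--------|----------|",
    ]
    total = 0
    for ja, en in AXIS_LABELS:
        score, finding = results[en]
        if not 0 <= score <= 20:
            raise ValueError(f"{en} score {score} is outside 0-20")
        total += score
        lines.append(f"| {ja} ({en}) | {score}/20 | {finding} |")
    # The total row leaves the findings column empty, as in the template.
    lines.append(f"| **合計 / Total** | **{total}/100** | |")
    return "\n".join(lines)


# Hypothetical scores and placeholder findings.
example_results = {
    "Clarity": (16, "hypothetical finding"),
    "Structure": (10, "hypothetical finding"),
    "Info Content": (16, "hypothetical finding"),
    "Specificity": (6, "hypothetical finding"),
    "Context": (12, "hypothetical finding"),
}
print(render_score_table(example_results))
```

Deriving the total row from the axis rows, rather than entering it separately, keeps the table internally consistent.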

Scoring Guidelines

0pt: Sub-item is completely absent or counterproductive
1pt: Minimal attempt, mostly insufficient
2-3pt: Partially addressed, room for improvement
4pt: Well addressed with minor gaps
5pt: Excellent, no meaningful improvement needed
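The bands above amount to a lookup from a sub-item score to its description. A minimal Python sketch follows, assuming sub-item scores are whole numbers from 0 to 5 (the guidelines list only integer bands); the names `SCORE_BANDS` and `describe` are hypothetical.

```python
# Descriptions copied from the scoring guidelines; 2 and 3 share one band.
SCORE_BANDS = {
    0: "Sub-item is completely absent or counterproductive",
    1: "Minimal attempt, mostly insufficient",
    2: "Partially addressed, room for improvement",
    3: "Partially addressed, room for improvement",
    4: "Well addressed with minor gaps",
    5: "Excellent, no meaningful improvement needed",
}


def describe(points: int) -> str:
    """Return the guideline description for a sub-item score of 0-5."""
    if points not in SCORE_BANDS:
        raise ValueError(f"sub-item score must be an integer from 0 to 5, got {points!r}")
    return SCORE_BANDS[points]


print(describe(4))  # Well addressed with minor gaps
```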

Language Handling

Evaluate prompts in any language (Japanese, English, etc.)
Output the evaluation in the same language as the user's prompt
Scoring criteria apply universally regardless of language

Evaluate and score user-written LLM prompts on a 100-point scale across 5 axes (Clarity, Structure, Information Content, Specificity, Context), providing specific improvement suggestions and a revised prompt. Make sure to use this skill whenever the user asks to evaluate, review, score, or improve a prompt, or when they say things like 'このプロンプトどう？', 'プロンプトを評価して', 'rate my prompt', 'review this prompt', or 'is this prompt good enough?'. This skill focuses on scoring existing prompts, not writing new ones from scratch.

Category: development

Updated May 14, 2026

npx skillsauth add hrdtbs/agent-skills prompt-evaluator

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage shown |
|---------|-------------|--------|------------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 14, 2026, 2:36 AM (208.2s, 3 files scanned)

Related Skills

hrdtbs/skill-judge

testing

Verified, Trusted, Community

Evaluate Agent Skill design quality against official specifications and best practices. Use when reviewing, auditing, or improving SKILL.md files and skill packages. Provides multi-dimensional scoring and actionable improvement suggestions.

SKILL.md, updated May 14, 2026

hrdtbs/skill-creator

testing

Verified, Trusted, Community

Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, edit, or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.

SKILL.md, updated May 14, 2026

hrdtbs/prompt-engineering-expert

testing

Verified, Trusted, Community

Apply prompt engineering best practices to write, refine, and optimize system prompts, user prompts, and agent instructions. Use this skill whenever the user wants to write a prompt, optimize an existing prompt for better results, fix a prompt that is hallucinating or underperforming, or structure prompts for Large Language Models (LLMs). Even if the user just says "help me write instructions for my agent", trigger this skill.

SKILL.md, updated May 14, 2026

hrdtbs/plan-self-review

testing

Verified, Trusted, Community

Self-evaluate a plan on a 100-point scale after it is created or updated. Make sure to use this skill immediately whenever you create a plan or update a plan, even if the user does not explicitly ask for a review. This skill ensures that the plan is clear, comprehensive, feasible, and consistent before execution.

SKILL.md, updated May 14, 2026

Install manually

# Clone the repo
git clone https://github.com/hrdtbs/agent-skills.git

# Copy into Claude Code skills folder (global)
cp -r agent-skills/skills/prompt-evaluator ~/.claude/skills/

See the official Claude Code Skills docs for the skills path.

Repository: hrdtbs/agent-skills
Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT