skills/skill-evaluator/SKILL.md
Use when asked to review a skill's quality, test whether a skill works correctly, find why a skill behaves inconsistently or fails to trigger, check if a skill is ready to publish, harden a skill against known failure modes, or turn an observed failure into a repeatable test case.
npx skillsauth add aiden0z/skills skill-evaluator
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
IRON LAW: Evaluate a skill against real-use failure modes, not against how convincing its instructions sound.
Assess whether a skill is discoverable, concise, executable, verifiable, and robust under realistic agent behavior. Turn observed failures into concrete workflow gates, scripts, references, or eval cases.
Scope the Evaluation ⚠️ REQUIRED
SKILL.md, relevant references/scripts, prior agent transcript, output artifacts, or failing behavior.

Run Mechanical Checks ⛔ BLOCKING
Run scripts/check_skill_quality.py <skill-path> and capture the full output (status, error count, warning count, metrics).
not-ready.

Evaluate Design Fit ⚠️ REQUIRED
references/best-practices.md.
references/skill-prompt-quality.md to review SKILL.md as executable agent instructions; treat it as a front gate, not proof of runtime compliance.
references/evaluation-rubric.md to score trigger quality, workflow control, progressive disclosure, deterministic resources, validation integrity, and output evidence.

Design or Review Eval Cases ⚠️ REQUIRED
references/eval-case-design.md.
references/harness-engineering.md.
references/failure-regression.md first and convert the failure into a no-leak regression case.
Run scripts/check_eval_cases.py <evals.json> when eval cases exist.

Forward-Test When Useful
references/forward-testing.md.
references/agent-runtime-discovery.md and optionally scripts/discover_agent_runtime.py to discover available agent runtimes and capture channels.
references/harness-engineering.md → "Evidence Capture Discovery" to record the chosen evidence level. Codex JSONL is a reference implementation, not a hard dependency.
Use $<skill-name> at <path> to handle: <realistic user request>.

Recommend or Apply Changes
Keep SKILL.md under 500 lines; move detailed standards into directly linked reference files.

For a review-only task, return these five sections in order. Omit any section only when the evaluation did not reach that phase:
scripts/check_skill_quality.py (status, error count, warning count, key metrics). If this section is absent, the evaluation is incomplete and the verdict must be not-ready.
ready, usable-with-gaps, or not-ready.

For an edit task, also include changed files and verification commands.
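The gating rule above can be sketched in Python. This is only an illustration under stated assumptions, not the skill's own implementation: the `Report` dataclass, its field names, and the `final_verdict` helper are hypothetical. Only the three verdict labels, and the rule that a missing mechanical-check section forces not-ready, come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# The three verdict labels named in the text.
VERDICTS = ("ready", "usable-with-gaps", "not-ready")


@dataclass
class Report:
    """Hypothetical container for a review; field names are illustrative."""
    # Captured output of scripts/check_skill_quality.py, or None if it never ran.
    mechanical_checks: Optional[str]
    # The verdict the reviewer would otherwise give.
    proposed_verdict: str


def final_verdict(report: Report) -> str:
    """Apply the gate: no mechanical-check evidence means not-ready."""
    if report.proposed_verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {report.proposed_verdict!r}")
    if not report.mechanical_checks:
        # Missing section: the evaluation is incomplete.
        return "not-ready"
    return report.proposed_verdict
```

With this shape, a review that proposes `ready` but carries no mechanical-check output is downgraded to `not-ready`, while a review with captured output keeps its proposed verdict.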
scripts/check_skill_quality.py: deterministic skill structure and hygiene checks. Pass --receipt <path> to write a timestamped JSON receipt after a passing run; use this as evidence that mechanical checks completed before a verdict.
scripts/check_eval_cases.py: validate portable eval case JSON files.
scripts/discover_agent_runtime.py: read-only discovery of local agent CLIs and likely evidence capture methods.
references/best-practices.md: distilled guidance from OpenAI, Claude, Trae, and local skill-forge/skill-creator principles.
references/skill-prompt-quality.md: static gate for whether SKILL.md can reliably steer agent behavior.
references/evaluation-rubric.md: scoring rubric and pass/fail gates.
references/eval-case-design.md: how to create regression and forward-test cases.
references/agent-runtime-discovery.md: how to discover capture channels for Codex, Claude Code, GitHub Copilot, Kimi, and custom agents.
references/harness-engineering.md: design repeatable eval harnesses: datasets, traces, graders, aggregation, and gates.
references/forward-testing.md: skill-creator-style fresh-agent testing protocol.
references/failure-regression.md: convert observed failures into no-leak regression cases and skill changes.
templates/eval-case.json: starter schema for skill eval cases.
templates/harness-plan.md: lightweight plan for repeatable skill eval suites.
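As a rough sketch of how the mechanical check and its receipt might be invoked from Python: the script path and the `--receipt` flag are taken from the list above, while the helper name `build_check_command`, the choice to run the script through the current Python interpreter, the argument order, and the example paths are all assumptions made for illustration.

```python
import sys
from typing import List, Optional


def build_check_command(skill_path: str, receipt_path: Optional[str] = None) -> List[str]:
    """Build an argv list for scripts/check_skill_quality.py (hypothetical helper)."""
    # Assumption: the script is run with the current Python interpreter.
    cmd = [sys.executable, "scripts/check_skill_quality.py", skill_path]
    if receipt_path is not None:
        # --receipt <path> is described in the text; placing it after the
        # skill path is an assumption.
        cmd += ["--receipt", receipt_path]
    return cmd


if __name__ == "__main__":
    # Example paths are placeholders, not real project locations.
    print(build_check_command("path/to/skill", "receipt.json"))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` so that the full output is captured alongside the verdict.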