Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

openai/plugin-eval

Name: plugin-eval
Author: openai

plugins/plugin-eval/skills/plugin-eval/SKILL.md

npx skillsauth add openai/plugins plugin-eval

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Plugin Eval

Use this as the beginner-friendly umbrella entrypoint for local Codex skill and plugin evaluation.

Start Here

Resolve whether the target path is a skill, a plugin, or another local folder.
Prefer the chat-first router when the user speaks naturally or is not sure which command they need:

plugin-eval start <path> --request "<user request>" --format markdown

Route natural chat requests to the matching workflow:
- "Give me an analysis of the game dev skill." -> resolve the named skill path, run plugin-eval analyze <path> --format markdown, then initialize a benchmark and show the setup questions needed to tailor benchmark.json
- "Evaluate this skill." -> plugin-eval analyze <path> --format markdown
- "Why did this score that way?" -> plugin-eval analyze <path> --format markdown
- "What should I fix first?" -> plugin-eval analyze <path> --format markdown
- "Explain the token budget for this skill." -> plugin-eval explain-budget <path> --format markdown
- "Measure the real token usage of this skill." -> benchmark flow, then plugin-eval measurement-plan
- "Help me benchmark this plugin." -> starter benchmark flow
- "What should I run next?" -> plugin-eval start <path> --request "What should I run next?" --format markdown
If the user wants rewrite help after evaluation, route to ../improve-skill/SKILL.md.
If the user wants a custom rubric, route to ../metric-pack-designer/SKILL.md.
If the user names a skill instead of giving a path, resolve it locally before running commands:
- check ~/.codex/skills/<skill-name> first
- then check any repo-local skills/<skill-name> directory
- if the name is still ambiguous, ask one short clarifying question before continuing
When the request sounds like "analysis" rather than just "evaluate", do the fuller path:
- run the report
- initialize .plugin-eval/benchmark.json
- surface the setup questions that will refine the starter scenarios
- preview the dry-run command the user can execute next

Chat Requests To Recognize

Give me an analysis of the game dev skill.
Evaluate this skill.
Evaluate this plugin.
Why did this score that way?
What should I fix first?
Explain the token budget for this skill.
Measure the real token usage of this skill.
Help me benchmark this plugin.
What should I run next?

Matching Commands

plugin-eval start <path> --request "Evaluate this skill." --format markdown
plugin-eval start <path> --request "Give me a full analysis of this skill, including benchmark setup." --format markdown
plugin-eval analyze <path> --format markdown
plugin-eval explain-budget <path> --format markdown
plugin-eval measurement-plan <path> --format markdown
plugin-eval init-benchmark <path>
plugin-eval benchmark <path> --dry-run
plugin-eval benchmark <path>

Output Expectations

Prefer the JSON result as the source of truth.
Lead with At a Glance, Why It Matters, Fix First, and Recommended Next Step.
Keep the why content terse and easy to skim.
Call out whether budget numbers are static estimates or measured harness results.
Show the user the exact chat phrase they can reuse next, the plugin-eval start command that routes it, and the first local workflow command behind it.
When the user asks for an analysis of a named skill, do not stop at the report if benchmark setup is still missing.
When the user is asking about a skill specifically, hand off to ../evaluate-skill/SKILL.md.
When the user is asking about a plugin bundle, hand off to ../evaluate-plugin/SKILL.md.

References

../../references/chat-first-workflows.md
../../references/technical-design.md
../../references/evaluation-result-schema.md

openai/plugin-eval

plugins/plugin-eval/skills/plugin-eval/SKILL.md

Help engineers evaluate a local skill or plugin, explain why it scored that way, show what to fix first, measure real token usage, benchmark starter scenarios, or decide what to run next. Use when the user says things like "evaluate this skill", "give me an analysis of the game dev skill", "why did this score that way", "what should I fix first", "measure the real token usage of this skill", or "what should I run next?".

994 stars

tools

Updated May 6, 2026

$ install --global

skillsauth

npx skillsauth add openai/plugins plugin-eval

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: May 6, 2026, 6:10 AM135.7s2 files scanned

SKILL.md

name:: plugin-eval
description:: Help engineers evaluate a local skill or plugin, explain why it scored that way, show what to fix first, measure real token usage, benchmark starter scenarios, or decide what to run next. Use when the user says things like "evaluate this skill", "give me an analysis of the game dev skill", "why did this score that way", "what should I fix first", "measure the real token usage of this skill", or "what should I run next?".

Plugin Eval

Use this as the beginner-friendly umbrella entrypoint for local Codex skill and plugin evaluation.

Start Here

Resolve whether the target path is a skill, a plugin, or another local folder.
Prefer the chat-first router when the user speaks naturally or is not sure which command they need:

plugin-eval start <path> --request "<user request>" --format markdown

Route natural chat requests to the matching workflow:
- "Give me an analysis of the game dev skill." -> resolve the named skill path, run plugin-eval analyze <path> --format markdown, then initialize a benchmark and show the setup questions needed to tailor benchmark.json
- "Evaluate this skill." -> plugin-eval analyze <path> --format markdown
- "Why did this score that way?" -> plugin-eval analyze <path> --format markdown
- "What should I fix first?" -> plugin-eval analyze <path> --format markdown
- "Explain the token budget for this skill." -> plugin-eval explain-budget <path> --format markdown
- "Measure the real token usage of this skill." -> benchmark flow, then plugin-eval measurement-plan
- "Help me benchmark this plugin." -> starter benchmark flow
- "What should I run next?" -> plugin-eval start <path> --request "What should I run next?" --format markdown
If the user wants rewrite help after evaluation, route to ../improve-skill/SKILL.md.
If the user wants a custom rubric, route to ../metric-pack-designer/SKILL.md.
If the user names a skill instead of giving a path, resolve it locally before running commands:
- check ~/.codex/skills/<skill-name> first
- then check any repo-local skills/<skill-name> directory
- if the name is still ambiguous, ask one short clarifying question before continuing
When the request sounds like "analysis" rather than just "evaluate", do the fuller path:
- run the report
- initialize .plugin-eval/benchmark.json
- surface the setup questions that will refine the starter scenarios
- preview the dry-run command the user can execute next

Chat Requests To Recognize

Give me an analysis of the game dev skill.
Evaluate this skill.
Evaluate this plugin.
Why did this score that way?
What should I fix first?
Explain the token budget for this skill.
Measure the real token usage of this skill.
Help me benchmark this plugin.
What should I run next?

Matching Commands

plugin-eval start <path> --request "Evaluate this skill." --format markdown
plugin-eval start <path> --request "Give me a full analysis of this skill, including benchmark setup." --format markdown
plugin-eval analyze <path> --format markdown
plugin-eval explain-budget <path> --format markdown
plugin-eval measurement-plan <path> --format markdown
plugin-eval init-benchmark <path>
plugin-eval benchmark <path> --dry-run
plugin-eval benchmark <path>

Output Expectations

Prefer the JSON result as the source of truth.
Lead with At a Glance, Why It Matters, Fix First, and Recommended Next Step.
Keep the why content terse and easy to skim.
Call out whether budget numbers are static estimates or measured harness results.
Show the user the exact chat phrase they can reuse next, the plugin-eval start command that routes it, and the first local workflow command behind it.
When the user asks for an analysis of a named skill, do not stop at the report if benchmark setup is still missing.
When the user is asking about a skill specifically, hand off to ../evaluate-skill/SKILL.md.
When the user is asking about a plugin bundle, hand off to ../evaluate-plugin/SKILL.md.

References

../../references/chat-first-workflows.md
../../references/technical-design.md
../../references/evaluation-result-schema.md

Related Skills

openai/provision-droplet

development

VerifiedTrustedCommunity

Use when the user wants to spin up / create / launch / provision a DigitalOcean droplet (or "a remote dev box on DO") and connect to it from Codex as a remote SSH workspace.

3,575SKILL.mdUpdated Jun 26, 2026

openai/provision-droplet

openai/teams

data-ai

VerifiedTrustedCommunity

Search through Microsoft Teams chats or channels, triage unread or recent activity, draft follow-ups, and manage Planner tasks through connected Teams data.

3,575SKILL.mdUpdated May 9, 2026

openai/figma-use-motion

tools

VerifiedTrustedCommunity

Motion / animation context for the `use_figma` MCP tool — animating Figma nodes via manual keyframes, animation styles, easing, and timeline duration. Load alongside figma-use whenever a task involves adding, editing, or inspecting animation on a node.

3,511SKILL.mdUpdated Jun 25, 2026

openai/figma-use-motion

openai/figma-swiftui

development

VerifiedTrustedCommunity

SwiftUI ↔ Figma translation. Use whenever the user mentions Swift, SwiftUI, iOS, iPhone, or iPad — in EITHER direction — translating a Figma design into SwiftUI (design → code), or pushing SwiftUI views / screens / tokens back into a Figma file (code → design). Triggers on phrases like 'implement this Figma design in SwiftUI', 'build this screen in Swift', 'push this SwiftUI view to Figma', 'mirror my Swift code in a Figma file', or whenever a Figma URL appears alongside `.swift` files / an `.xcodeproj`. Routes to a direction-specific reference doc; loads alongside `figma-use` for the code → design path.

3,511SKILL.mdUpdated Jun 25, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/openai/plugins.git

# Copy into Claude Code skills folder (global)
cp -r plugins/plugins/plugin-eval/skills/plugin-eval ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

openai/plugins

994 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT