skills/evaluate-diagram/SKILL.md
Use this skill when scoring or comparing a generated diagram against a human reference. Triggers on "score this diagram", "evaluate my diagram", "compare to reference", or "how accurate is this". Applies when both a generated diagram and a reference image exist and quality assessment is needed. Do NOT use for creating new diagrams (use generate-diagram) or plotting data (use generate-plot).
Evaluate a generated diagram against a human reference using PaperBanana's VLM-as-Judge scoring.
Arguments:
- $ARGUMENTS[0]: path to the generated image
- $ARGUMENTS[1]: path to the human reference image

Before using $ARGUMENTS[0], $ARGUMENTS[1], or any user-provided context path, reject any path that contains ../, null bytes, or shell metacharacters (; | & $ `).

Call paperbanana:evaluate_diagram with:
- generated_path: the generated image path
- reference_path: the reference image path
- context: the methodology text content
- caption: the figure caption

Present the scores in a summary table covering the four dimensions (Faithfulness, Conciseness, Readability, Aesthetics). Give each dimension its numeric score and a brief rationale.
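The path check above could be sketched as a small Bash helper. This is an illustration and not part of the skill or of PaperBanana. The function name validate_path is hypothetical. The helper assumes the rule means two things: reject any ../ traversal segment (including a trailing /..), and reject any of the five listed metacharacters. Bash variables cannot hold NUL bytes, so the null-byte part of the rule cannot be tested here. It has to be enforced before the path ever reaches a shell variable.

```bash
#!/usr/bin/env bash
# Hypothetical helper sketching the path checks the skill requires.
# Returns 0 if the path passes the checks, 1 otherwise.
validate_path() {
  local p=$1

  # An empty argument is never a usable image path.
  if [[ -z $p ]]; then
    echo "rejected (empty path)" >&2
    return 1
  fi

  # Reject parent-directory traversal: "../x", "a/../b", "a/..", "..".
  case $p in
    *../*|..|*/..)
      echo "rejected (path traversal): $p" >&2
      return 1
      ;;
  esac

  # Reject the shell metacharacters listed in the skill: ; | & $ `
  local c
  for c in ';' '|' '&' '$' '`'; do
    if [[ $p == *"$c"* ]]; then
      echo "rejected (shell metacharacter): $p" >&2
      return 1
    fi
  done

  # NUL bytes: a Bash variable cannot contain one, so this function
  # cannot see them; raw input must be screened before this point.
  return 0
}
```

For example, a caller could run validate_path "$1" && validate_path "$2" and stop if either check fails. Names such as file..v2.png still pass, because only a .. segment followed or preceded by a slash is treated as traversal.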
If the MCP tool is not available, fall back to the CLI:
paperbanana evaluate --generated <generated-img> --reference <reference-img> --context <context-file> --caption "<caption>"
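The CLI fallback above could be wrapped in a Bash function like the following sketch. The function name run_cli_evaluation is hypothetical. It uses only the flags shown in the command above. It also assumes the context is passed as a file path, as the <context-file> placeholder suggests. The wrapper quotes every argument so that a caption containing spaces stays a single --caption value.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the documented CLI fallback. It only uses
# the flags shown in the skill: --generated, --reference, --context, --caption.
run_cli_evaluation() {
  local generated=$1 reference=$2 context_file=$3 caption=$4

  # Fail early if any input file is missing.
  local f
  for f in "$generated" "$reference" "$context_file"; do
    if [[ ! -f $f ]]; then
      echo "missing file: $f" >&2
      return 1
    fi
  done

  # The CLI is only the fallback, so confirm it is actually installed.
  if ! command -v paperbanana >/dev/null 2>&1; then
    echo "paperbanana CLI not found on PATH" >&2
    return 127
  fi

  # Quote every argument so a caption with spaces stays one argument.
  paperbanana evaluate \
    --generated "$generated" \
    --reference "$reference" \
    --context "$context_file" \
    --caption "$caption"
}
```

Checking the input files before looking up the CLI means a bad path is reported the same way whether or not paperbanana is installed. Returning 127 for a missing command mirrors the shell's own convention.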
Example:
/evaluate-diagram output.png reference.png