eval/SKILL.md
Evaluate the quality of any AI-generated artifact — visualizations, code, documents, conversations, or any skill output. Works in 3 phases: (1) Generate evaluation specs tailored to the artifact type, (2) Run comprehensive evaluation against those specs, (3) Produce a beautiful visual report using the /visualize skill. Use after any skill produces output, or invoke directly with /evaluate <file-or-context>. Supports evaluating: HTML visualizations, code projects, documents, agent conversations, slide decks, dashboards, or any artifact with quality dimensions.
npx skillsauth add careerhackeralex/visualize evaluate
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Comprehensive quality evaluation for any AI-generated artifact. Produces its report as a visualization.
┌──────────────────────────────────────────────┐
│                                              │
│  Phase 1: SPEC GENERATION                    │
│    Analyze the artifact type                 │
│    Generate tailored evaluation criteria     │
│    Define scoring dimensions + weights       │
│    Set quality gates                         │
│        │                                     │
│        ▼                                     │
│  Phase 2: EVALUATION                         │
│    Run automated checks (when possible)      │
│    Visual/manual inspection                  │
│    Score each dimension with evidence        │
│    Identify systemic vs local issues         │
│        │                                     │
│        ▼                                     │
│  Phase 3: REPORT (via /visualize)            │
│    Generate a beautiful HTML eval report     │
│    Scores, charts, screenshots, fix list     │
│    Radar chart of dimensions                 │
│    Before/after tracking                     │
│                                              │
└──────────────────────────────────────────────┘
For any artifact, generate evaluation specs by analyzing:
For each artifact type, produce 6-10 evaluation dimensions. Each dimension needs:
Define gates based on the artifact's purpose:
| Gate | Criteria | Meaning |
|------|----------|---------|
| 🚀 EXCEPTIONAL | Overall ≥ 9.5, all ≥ 9 | Best-in-class. Share everywhere. |
| ✅ SHIP | Overall ≥ 9.0, all ≥ 8 | Production-ready. |
| ⚠️ ACCEPTABLE | Overall ≥ 8.0, all ≥ 7 | Usable but not impressive. |
| 🔧 NEEDS WORK | Overall ≥ 7.0 or any < 7 | Fix before releasing. |
| ❌ FAIL | Overall < 7.0 or any < 5 | Major rework. |
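The gate criteria overlap at the lower end (a dimension scoring below 5 also satisfies "any < 7"), so the order of checks matters. The sketch below is one possible reading, not part of the skill itself: it assumes FAIL is checked first, then the remaining gates from strictest to loosest, with NEEDS WORK as the fallback. The function name `classifyGate` and its parameters are hypothetical.

```javascript
// Hypothetical helper: maps an overall score and per-dimension scores to a gate.
// Assumption: FAIL wins over every other gate; the rest are tried strictest first.
function classifyGate(overall, dimensionScores) {
  const lowest = Math.min(...dimensionScores);

  if (overall < 7.0 || lowest < 5) return 'FAIL';
  if (overall >= 9.5 && lowest >= 9) return 'EXCEPTIONAL';
  if (overall >= 9.0 && lowest >= 8) return 'SHIP';
  if (overall >= 8.0 && lowest >= 7) return 'ACCEPTABLE';
  // Everything left has overall >= 7.0 but misses a higher gate.
  return 'NEEDS WORK';
}

console.log(classifyGate(9.2, [8, 9, 10])); // SHIP
console.log(classifyGate(8.5, [4, 10]));    // FAIL (one dimension below 5)
```

Under this reading, a high overall score cannot rescue an artifact with one very weak dimension.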
Write the spec to eval-spec-[artifact-name].md for reference and reuse.
Open in browser at 3 viewports (1280×720, 768×1024, 375×667).
Automated audit (run in browser console):
(function () {
  const audit = {};
  // Concatenate all inline <style> blocks so the CSS checks can run as regex tests.
  const style = [...document.querySelectorAll('style')].map(s => s.textContent).join(' ');
  // Serialized markup of the <html> element (this does not include the doctype).
  const html = document.documentElement.outerHTML;

  // Structure
  // outerHTML never contains <!doctype ...>, so check the DOM's doctype node instead.
  audit.hasDoctype = !!document.doctype && document.doctype.name.toLowerCase() === 'html';
  audit.hasLangAttr = !!document.documentElement.lang;
  audit.hasCharset = !!document.querySelector('meta[charset]');
  audit.hasViewport = !!document.querySelector('meta[name="viewport"]');
  audit.hasTitle = document.title.length > 0;

  // Menu system
  audit.menuExists = !!document.querySelector('.viz-menu');
  audit.menuHasTheme = !!html.match(/cycleTheme|themeLabel/i);
  audit.menuHasDownload = !!html.match(/htmlToImage|html-to-image/i);
  audit.menuHasPrint = !!html.match(/window\.print/i);

  // Theme system (string heuristics, not a full CSS parse)
  audit.hasCSSVars = !!style.match(/--bg\s*:/);
  audit.hasDarkTheme = !!style.match(/(\.theme-dark|:root)[\s\S]*?--bg/);
  audit.hasLightTheme = !!style.match(/\.theme-light/);
  audit.themePersistedToStorage = !!html.match(/localStorage.*theme/i);

  // Typography
  audit.hasInterFont = !!html.match(/fonts\.googleapis.*Inter|font-family.*Inter/i);
  audit.hasFontFallback = !!style.match(/-apple-system|system-ui/);
  audit.bodyFontSize = parseFloat(getComputedStyle(document.body).fontSize);
  audit.bodyFontOK = audit.bodyFontSize >= 14;

  // Layout
  audit.usesFlexOrGrid = !!style.match(/display\s*:\s*(flex|grid)/);
  audit.hasMaxWidth = !!style.match(/max-width/);
  audit.hasResponsiveBreakpoints = !!style.match(/@media.*max-width|@media.*min-width|sm:|md:|lg:/);

  // Print & Accessibility
  audit.hasPrintStyles = !!style.match(/@media\s*print/);
  audit.hasPrintColorAdjust = !!style.match(/print-color-adjust/);
  audit.hasReducedMotion = !!style.match(/prefers-reduced-motion/);
  audit.hasAriaLabels = !!html.match(/aria-label/);
  audit.hasSemanticHTML = !!html.match(/<(header|main|nav|section|article|footer)/);

  // Animations
  audit.hasKeyframes = !!style.match(/@keyframes/);
  audit.hasTransitions = !!style.match(/transition\s*:/);

  // Performance
  audit.fileSizeKB = Math.round(new Blob([html]).size / 1024);
  audit.fileSizeOK = audit.fileSizeKB < 200;
  audit.noExternalImages = document.querySelectorAll('img[src^="http"]').length === 0;
  audit.htmlToImageLoaded = typeof htmlToImage !== 'undefined';

  // Summary: only boolean entries count as pass/fail checks.
  const bools = Object.entries(audit).filter(([, v]) => typeof v === 'boolean');
  const passed = bools.filter(([, v]) => v).length;
  audit._passed = passed;
  audit._total = bools.length;
  audit._percent = Math.round(passed / bools.length * 100);
  audit._failures = bools.filter(([, v]) => !v).map(([k]) => k);

  console.table(audit);
  return audit;
})();
Visual scoring — 8 dimensions for visualizations:
| # | Dimension | Weight | 10 = | 6 = |
|---|-----------|--------|------|------|
| D1 | First Impression | 15% | Apple keynote quality | Generic template feel |
| D2 | Typography | 15% | Perfect hierarchy, Inter font, fluid sizing | All same size, no hierarchy |
| D3 | Color & Contrast | 10% | Harmonious, WCAG AA, both themes beautiful | Clashing, low contrast |
| D4 | Layout & Spacing | 15% | Consistent rhythm, responsive, generous space | Cramped, broken at mobile |
| D5 | Content Quality | 15% | Clear message in 5 seconds, zero filler | Confusing, placeholder text |
| D6 | Interactivity | 10% | Menu + theme + download + print all flawless | Missing features, broken |
| D7 | Technical | 10% | Zero errors, semantic, accessible, print-ready | Console errors, broken layout |
| D8 | Shareability | 10% | Would tweet this unprompted | Worse than Canva |
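The weights in the table sum to 100%, so the overall score can be taken as a weighted average of the eight dimension scores. The following is a minimal sketch of that arithmetic. The names `WEIGHTS` and `overallScore`, and the rounding to one decimal place, are assumptions made for illustration; the skill does not specify a rounding rule.

```javascript
// Weights in percent, copied from the D1-D8 table above.
const WEIGHTS = { D1: 15, D2: 15, D3: 10, D4: 15, D5: 15, D6: 10, D7: 10, D8: 10 };

// Hypothetical helper: weighted average of dimension scores on a 0-10 scale.
// Assumption: the result is rounded to one decimal place.
function overallScore(scores) {
  let total = 0;
  for (const [id, weight] of Object.entries(WEIGHTS)) {
    if (typeof scores[id] !== 'number') {
      throw new Error(`Missing score for ${id}`);
    }
    total += scores[id] * weight;
  }
  // total is on a 0-1000 scale; divide by 100 and keep one decimal.
  return Math.round(total / 10) / 10;
}

const example = { D1: 10, D2: 8, D3: 8, D4: 8, D5: 8, D6: 8, D7: 8, D8: 8 };
console.log(overallScore(example)); // 8.3
```

Keeping the weights as whole percentages avoids most floating-point drift before the final rounding step.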
Code dimensions: Correctness, Readability, Architecture, Error Handling, Performance, Testing, Documentation, Security
Document dimensions: Clarity, Structure, Accuracy, Completeness, Tone, Formatting, Actionability, Brevity
Agent conversation dimensions: Helpfulness, Accuracy, Tone, Efficiency, Safety, Context Awareness, Tool Usage, Follow-through
After scoring, generate the eval report as a beautiful HTML dashboard using the visualize skill:
eval-report-[artifact-name]-[date].html
This is the ultimate dogfood test — our evaluation tool produces evaluations using our visualization tool.
Generate artifact (any skill)
↓
/evaluate → Spec + Score + Visual Report
↓
Review report → identify fixes
↓
Fix (systemic → SKILL.md, local → artifact)
↓
/evaluate again → compare scores
↓
Ship when gate = SHIP or EXCEPTIONAL
Max 3 loops per artifact. If it can't reach SHIP in 3 loops, the problem is in the skill — update the skill's instructions, not the artifact.
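The loop rule above can be expressed as a small decision function. This is a sketch under stated assumptions: `nextStep`, `MAX_LOOPS`, and the returned action strings are hypothetical names chosen for illustration, and `loop` is taken to be the 1-based number of the evaluation pass that just finished.

```javascript
const MAX_LOOPS = 3;

// Hypothetical helper: decides what to do after an evaluation pass.
// gate: the gate name from the report; loop: 1-based count of passes so far.
function nextStep(gate, loop) {
  if (gate === 'SHIP' || gate === 'EXCEPTIONAL') {
    return 'ship';
  }
  if (loop >= MAX_LOOPS) {
    // Three passes without reaching SHIP: treat it as a skill-level problem.
    return 'update-skill';
  }
  return 'fix-and-reevaluate';
}

console.log(nextStep('NEEDS WORK', 1)); // fix-and-reevaluate
console.log(nextStep('ACCEPTABLE', 3)); // update-skill
```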
# Evaluate a visualization
/evaluate path/to/visualization.html
# Evaluate with custom context
/evaluate path/to/code-project --type code
# Re-evaluate after fixes (tracks improvement)
/evaluate path/to/visualization.html --loop 2
# Generate specs only (no scoring)
/evaluate --specs-only --type dashboard