public/claude/skills/quality/benchmark/SKILL.md
Agent benchmark workflow for comparing token usage, latency, cost, cache behavior, model choice, prompt variants, tool availability, and configuration changes with fixed tasks and measured evidence.
npx skillsauth add jungho-git/jllm benchmark
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use this rule when:
Benchmarking should isolate one variable and report measured evidence.
Without a fixed task, token and latency deltas are not meaningful.
Report raw usage before derived cost.
Prefer a small reliable benchmark over broad anecdotal comparison.
Cost claims must be reproducible from reported numbers.
Keep conclusions tied to measured data.
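The rules above can be sketched as a small comparison script. This is a minimal illustration, not the skill's own implementation: the `Run` fields, the config keys, the sample measurements, and the per-million-token prices are all hypothetical placeholders, and it assumes cached input tokens are counted as a subset of input tokens.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Run:
    """Raw measurements from one execution of the fixed task (hypothetical fields)."""
    input_tokens: int
    output_tokens: int
    cached_input_tokens: int  # assumed to be a subset of input_tokens
    latency_s: float


def changed_keys(baseline: dict, variant: dict) -> list[str]:
    """Return the config keys whose values differ between two configurations."""
    keys = set(baseline) | set(variant)
    return sorted(k for k in keys if baseline.get(k) != variant.get(k))


def summarize(runs: list[Run]) -> dict:
    """Median raw usage across repeated runs of the same fixed task."""
    return {
        "input_tokens": median(r.input_tokens for r in runs),
        "output_tokens": median(r.output_tokens for r in runs),
        "cached_input_tokens": median(r.cached_input_tokens for r in runs),
        "latency_s": median(r.latency_s for r in runs),
    }


def derived_cost(usage: dict, price_per_mtok: dict) -> float:
    """Cost derived only from reported usage and explicitly stated prices."""
    uncached = usage["input_tokens"] - usage["cached_input_tokens"]
    total = (
        uncached * price_per_mtok["input"]
        + usage["cached_input_tokens"] * price_per_mtok["cached_input"]
        + usage["output_tokens"] * price_per_mtok["output"]
    )
    return total / 1_000_000


# Hypothetical configurations: only the prompt variant differs.
baseline_cfg = {"model": "model-a", "prompt": "v1", "tools": True}
variant_cfg = {"model": "model-a", "prompt": "v2", "tools": True}

# Refuse to compare unless exactly one variable changed.
diff = changed_keys(baseline_cfg, variant_cfg)
if len(diff) != 1:
    raise ValueError(f"expected exactly one changed variable, got {diff}")

# Made-up sample measurements, three repeats of the same task per configuration.
baseline_runs = [Run(1200, 300, 0, 2.1), Run(1180, 310, 0, 2.4), Run(1210, 290, 0, 2.0)]
variant_runs = [Run(900, 280, 400, 1.7), Run(910, 300, 400, 1.9), Run(890, 290, 400, 1.6)]

# Placeholder prices per million tokens; substitute the rates that actually apply.
prices = {"input": 3.0, "cached_input": 0.3, "output": 15.0}

baseline_usage = summarize(baseline_runs)
variant_usage = summarize(variant_runs)

# Raw usage is reported first; cost is derived from it afterwards.
print("changed variable:", diff[0])
print("baseline usage:", baseline_usage)
print("variant usage:", variant_usage)
print("baseline cost:", derived_cost(baseline_usage, prices))
print("variant cost:", derived_cost(variant_usage, prices))
```

Printing the usage dictionaries and the price table alongside the cost lets a reader recompute every cost figure by hand, and the single-variable check keeps the comparison from silently mixing a prompt change with a model or tool change.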
testing
Required phase order for non-trivial tasks: Plan, Explore, Implement, Verify, Finalize. Use for multi-step work, scoped exploration, re-planning, validation, and final synthesis.
development
Final response format: Korean-first, concise Process / Checks / Issues / Updates, optional Usage, with only actual changes, actual validation, real blockers, changed files, and measured token data when available.
development
Smallest complete change rule: preserve local code shape, extend existing patterns, avoid speculative extraction or cleanup, and include required coupled updates for correctness.
development
Code comment policy: numbered one-line `―` dividers for touched declarations and logical sections, paired outer blocks only for long regions, concise purpose comments, and no comment churn.