skills/bkit-evals/SKILL.md
Run skill evals via evals/runner.js — wrapper validates skill names, captures stdout/stderr, persists JSON results. Triggers: bkit evals, evals run, skill quality, eval runner, 스킬 평가, 評価実行, 评估运行, evaluación, évaluation.
Install this skill globally with one command: `npx skillsauth add popup-studio-ai/bkit-claude-code bkit-evals`. Works with Claude Code, Cursor, and Windsurf.
v2.1.11 Sprint β FR-β2. Wraps `evals/runner.js` with input validation, result persistence, and structured reporting. Replaces the bare `node evals/runner.js <skill>` invocation that previously required users to remember argv structure and ignored timeout / sandbox concerns.
| Argument | Description | Example |
|----------|-------------|---------|
| run <skill> | Execute the eval suite for one skill | /bkit-evals run gap-detector |
| list | List all skills that have an eval.yaml definition | /bkit-evals list |
If no argument is provided, render the same output as list.

`run <skill>`

1. Validate `skill` against `/^[a-z][a-z0-9-]{0,63}$/`. Reject anything else (no shell metacharacters, no slashes, no spaces); see Security below.
2. Run `node evals/runner.js --skill <skill>` via `child_process.spawnSync` (argv form, no shell). Default timeout 30 s, max 120 s. The `--skill` flag form is mandated by the runner CLI and locked by the L3 contract test.
3. If `parsed === null` and stdout includes `Usage:`, return `reason: 'argv_format_mismatch'`; if `parsed === null` otherwise, return `reason: 'parsed_null'`. Exit code 0 alone NEVER implies success: the parsed JSON must be present.
4. Persist the result to `.bkit/runtime/evals-{skill}-{ISO timestamp}.json` with stdout/stderr tails (2000 chars each), parsed payload, and reason field.

`list`

1. Read `evals/config.json` to enumerate skill classifications.
2. For each classification (`workflow`, `capability`, `hybrid`), list skills that have `evals/{classification}/{skill}/eval.yaml`.
3. Show each skill (with its `description` field if present).

Security

- A skill name that does not match `[a-z][a-z0-9-]{0,63}` is rejected with `reason: invalid_skill_name`.

| Module | Function | Usage |
|--------|----------|-------|
| lib/evals/runner-wrapper.js | invokeEvals(skill, opts) | Validate + spawn + persist |
| lib/evals/runner-wrapper.js | isValidSkillName(name) | Regex pre-check shared with list |
| evals/runner.js | (subprocess) | Existing eval execution engine |
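
The `run <skill>` flow described above (validate the name, spawn the runner in argv form, then classify the outcome) can be sketched roughly as follows. This is a minimal illustration, not the actual `lib/evals/runner-wrapper.js` source: `runEvalSketch`, `tryParseJson`, and `classifyResult` are hypothetical names, the assumption that the runner prints a single JSON document on stdout is ours, and `reason: null` on success is an assumed convention.

```javascript
const { spawnSync } = require('node:child_process');

const SKILL_NAME_RE = /^[a-z][a-z0-9-]{0,63}$/;
const DEFAULT_TIMEOUT_MS = 30_000; // default timeout: 30 s
const MAX_TIMEOUT_MS = 120_000;    // upper bound: 120 s

// Regex pre-check: lowercase letter first, then up to 63 of [a-z0-9-].
function isValidSkillName(name) {
  return typeof name === 'string' && SKILL_NAME_RE.test(name);
}

// Assumption: stdout is one JSON document. Returns null when it is not.
function tryParseJson(text) {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

// Map a parse outcome to a reason, following the rules in the text.
// Returning null for the success case is an assumption of this sketch.
function classifyResult(parsed, stdout) {
  if (parsed !== null) return null;
  return stdout.includes('Usage:') ? 'argv_format_mismatch' : 'parsed_null';
}

// Hypothetical wrapper: validate, spawn without a shell, classify.
function runEvalSketch(skill, { timeoutMs = DEFAULT_TIMEOUT_MS } = {}) {
  if (!isValidSkillName(skill)) {
    return { ok: false, reason: 'invalid_skill_name' };
  }
  const res = spawnSync('node', ['evals/runner.js', '--skill', skill], {
    encoding: 'utf8',
    timeout: Math.min(timeoutMs, MAX_TIMEOUT_MS),
    shell: false, // argv array form, so the skill name is never shell-parsed
  });
  const stdout = res.stdout ?? '';
  const parsed = tryParseJson(stdout.trim());
  return {
    ok: parsed !== null, // exit code 0 alone is not treated as success
    reason: classifyResult(parsed, stdout),
    exitCode: res.status,
    timedOut: res.error?.code === 'ETIMEDOUT',
    parsed,
  };
}
```

Because the name is checked before anything is spawned, a call such as `runEvalSketch('bad name')` returns `reason: 'invalid_skill_name'` without starting a subprocess.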
.bkit/runtime/evals-{skill}-{timestamp}.json:
{
  "skill": "gap-detector",
  "invokedAt": "<ISO 8601>",
  "exitCode": 0,
  "timedOut": false,
  "stdoutTail": "...",
  "stderrTail": "...",
  "parsed": { /* whatever runner.js prints as JSON, or null */ }
}
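
A record of this shape could be assembled and written along the following lines. This is a sketch under assumptions: `tail`, `buildRecord`, and `persistRecord` are hypothetical helper names, the `reason` field is included because the behavior description mentions it even though the sample above omits it, and replacing `:` and `.` in the timestamp for the file name is our own choice for filesystem portability.

```javascript
const fs = require('node:fs');
const path = require('node:path');

const TAIL_CHARS = 2000; // stdout/stderr tails are capped at 2000 chars each

// Keep only the last `max` characters of a (possibly null) string.
function tail(text, max = TAIL_CHARS) {
  const s = text ?? '';
  return s.length > max ? s.slice(-max) : s;
}

// Build the persisted payload from a finished run.
function buildRecord(run, now = new Date()) {
  return {
    skill: run.skill,
    invokedAt: now.toISOString(),
    exitCode: run.exitCode,
    timedOut: run.timedOut,
    stdoutTail: tail(run.stdout),
    stderrTail: tail(run.stderr),
    parsed: run.parsed,
    reason: run.reason,
  };
}

// Write the record under the runtime directory and return the file path.
function persistRecord(record, runtimeDir = path.join('.bkit', 'runtime')) {
  fs.mkdirSync(runtimeDir, { recursive: true });
  // Assumption: make the ISO timestamp safe for file names on all platforms.
  const stamp = record.invokedAt.replace(/[:.]/g, '-');
  const file = path.join(runtimeDir, `evals-${record.skill}-${stamp}.json`);
  fs.writeFileSync(file, JSON.stringify(record, null, 2));
  return file;
}
```

Keeping only the tails bounds the size of each result file even when the runner is very chatty.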
# Single eval
/bkit-evals run gap-detector
# Discovery
/bkit-evals list
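
For the `list` command, the discovery step (finding skills that have an `evals/{classification}/{skill}/eval.yaml` file) might look roughly like this. It is a sketch only: `listSkillsWithEvals` is a hypothetical name, the classification list is hard-coded from the three values named in this document rather than read from `evals/config.json` (whose schema is not shown here), and the sort order is our own choice.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Classifications named in this document; the real list comes from config.
const CLASSIFICATIONS = ['workflow', 'capability', 'hybrid'];
const SKILL_NAME_RE = /^[a-z][a-z0-9-]{0,63}$/;

// Return [{ classification, skill }] for every skill directory that
// contains an eval.yaml, skipping names that fail the shared regex.
function listSkillsWithEvals(evalsRoot = 'evals', classifications = CLASSIFICATIONS) {
  const found = [];
  for (const classification of classifications) {
    const dir = path.join(evalsRoot, classification);
    if (!fs.existsSync(dir)) continue;
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      if (!entry.isDirectory() || !SKILL_NAME_RE.test(entry.name)) continue;
      if (fs.existsSync(path.join(dir, entry.name, 'eval.yaml'))) {
        found.push({ classification, skill: entry.name });
      }
    }
  }
  // Sort by skill name so the output is stable across filesystems.
  return found.sort((a, b) => a.skill.localeCompare(b.skill));
}
```

Reusing the same name regex as `run <skill>` means that anything `list` prints is also a name `run` will accept.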
- `/control trust`: eval results contribute to trust score
- `/code-review`: uses eval data when assessing skills
- `/bkit explore` (FR-β1): explore evals as a category