popup-studio-ai/bkit-evals

Name: bkit-evals
Author: popup-studio-ai

skills/bkit-evals/SKILL.md

npx skillsauth add popup-studio-ai/bkit-claude-code bkit-evals

bkit Evals — Skill Quality Evaluation Runner

v2.1.11 Sprint β FR-β2. Wraps evals/runner.js with input validation, result persistence, and structured reporting. Replaces the bare node evals/runner.js <skill> invocation that previously required users to remember argv structure and ignored timeout / sandbox concerns.

Arguments

| Argument | Description | Example |
|----------|-------------|---------|
| run <skill> | Execute the eval suite for one skill | /bkit-evals run gap-detector |
| list | List all skills that have an eval.yaml definition | /bkit-evals list |

If no argument is provided, render the same output as list.
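The argument handling described above can be sketched as a small dispatcher. This is an illustration only, not the skill's actual code: `parseCommand` and the `error` action are hypothetical names, and only `run <skill>`, `list`, and the no-argument fallback come from the table.

```javascript
'use strict';

// Hypothetical dispatcher for the two documented arguments.
// An empty or missing argument string falls back to `list`.
function parseCommand(argString) {
  const tokens = String(argString ?? '').trim().split(/\s+/).filter(Boolean);

  if (tokens.length === 0 || tokens[0] === 'list') {
    return { action: 'list' };
  }
  if (tokens[0] === 'run' && tokens.length === 2) {
    // The skill name is NOT validated here; validation is a separate step.
    return { action: 'run', skill: tokens[1] };
  }
  // Assumption: anything else is reported as a usage error.
  return { action: 'error', message: 'expected: run <skill> | list' };
}
```

Keeping parsing separate from validation means the name check can stay in one place and be reused by both actions.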

Behavior

`run <skill>`

1. Validate `<skill>` against `/^[a-z][a-z0-9-]{0,63}$/`. Reject anything else (no shell metacharacters, no slashes, no spaces); see Security below.
2. Spawn `node evals/runner.js --skill <skill>` via `child_process.spawnSync` (argv form, no shell). Default timeout 30 s, max 120 s. The `--skill` flag form is mandated by the runner CLI and locked by the L3 contract test.
3. Capture stdout / stderr. Parse the trailing JSON block via balanced-brace fallback (string-aware).
4. Apply fail-closed defense: if `parsed === null` and stdout includes `Usage:`, return `reason: 'argv_format_mismatch'`; if `parsed === null` otherwise, return `reason: 'parsed_null'`. Exit code 0 alone NEVER implies success: the parsed JSON must be present.
5. Persist the structured result to `.bkit/runtime/evals-{skill}-{ISO timestamp}.json` with stdout/stderr tails (2000 chars each), parsed payload, and reason field.
6. Render a one-line summary in the chat:
   - exit code
   - parsed pass/fail counts (if available)
   - path of the persisted result file
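The parsing and fail-closed steps can be sketched as follows. This is one possible reading of "balanced-brace fallback (string-aware)", not the wrapper's actual source: `matchBrace`, `parseTrailingJson`, and `classifyReason` are hypothetical names, and the rule that only whitespace may follow the JSON block is an assumption drawn from the word "trailing".

```javascript
'use strict';

// Return the index of the '}' that closes the '{' at `start`, or -1.
// Braces inside double-quoted strings are ignored (string-aware scan).
function matchBrace(text, start) {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}' && --depth === 0) return i;
  }
  return -1;
}

// Try each '{' from left to right; accept the first balanced block that
// runs to the end of the output (whitespace aside) and parses as JSON.
function parseTrailingJson(stdout) {
  for (let s = stdout.indexOf('{'); s !== -1; s = stdout.indexOf('{', s + 1)) {
    const end = matchBrace(stdout, s);
    if (end === -1 || stdout.slice(end + 1).trim() !== '') continue;
    try {
      return JSON.parse(stdout.slice(s, end + 1));
    } catch {
      // Not valid JSON from this start position; try the next '{'.
    }
  }
  return null;
}

// Fail-closed classification: a null payload is never a success,
// whatever the exit code was. Returns null when a payload is present.
function classifyReason(stdout, parsed) {
  if (parsed !== null) return null;
  return stdout.includes('Usage:') ? 'argv_format_mismatch' : 'parsed_null';
}
```

Scanning from every candidate `{` is quadratic in the worst case, which seems acceptable for short runner output; a production parser might bound the search to the last few kilobytes.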

`list`

1. Read `evals/config.json` to enumerate skill classifications.
2. For each classification (workflow, capability, hybrid), list skills that have `evals/{classification}/{skill}/eval.yaml`.
3. Render a category-grouped table with skill name + a one-line note from the eval YAML (description field if present).
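The discovery step can be approximated with a directory walk. The sketch below makes two simplifying assumptions: it hardcodes the three classifications instead of reading `evals/config.json` (whose schema is not shown here), and it skips the description lookup. `listEvalSkills` is a hypothetical name.

```javascript
'use strict';
const fs = require('node:fs');
const path = require('node:path');

// Assumption: these names mirror the classifications listed above.
const CLASSIFICATIONS = ['workflow', 'capability', 'hybrid'];

// Return { classification: [skill, ...] } for every skill directory
// that contains an eval.yaml file under evalsRoot/{classification}/.
function listEvalSkills(evalsRoot) {
  const result = {};
  for (const classification of CLASSIFICATIONS) {
    const dir = path.join(evalsRoot, classification);
    let entries = [];
    try {
      entries = fs.readdirSync(dir, { withFileTypes: true });
    } catch {
      // A missing classification directory is treated as empty.
    }
    result[classification] = entries
      .filter((entry) => entry.isDirectory())
      .map((entry) => entry.name)
      .filter((name) => fs.existsSync(path.join(dir, name, 'eval.yaml')))
      .sort();
  }
  return result;
}
```

Calling `listEvalSkills('evals')` from the repository root would return the grouped names that the table is rendered from.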

Security

- Skill name regex prevents argument injection. Anything outside `[a-z][a-z0-9-]{0,63}` is rejected with `reason: invalid_skill_name`.
- argv-array spawn (no shell). No template-string concatenation into command lines.
- Result file path is composed from a hardcoded base + sanitized skill name + timestamp; no traversal possible.
- Subprocess timeout enforced (default 30 s, hard cap 120 s) so a buggy eval cannot block the session indefinitely.
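The four safeguards above can be sketched together. This is an illustration, not the contents of `lib/evals/runner-wrapper.js`: `clampTimeout`, `resultPath`, and `runEvalSketch` are hypothetical names, replacing `:` and `.` in the timestamp is an assumption made for filesystem portability, and `process.execPath` stands in for `node`.

```javascript
'use strict';
const path = require('node:path');
const { spawnSync } = require('node:child_process');

const SKILL_NAME_RE = /^[a-z][a-z0-9-]{0,63}$/;
const DEFAULT_TIMEOUT_MS = 30000;
const MAX_TIMEOUT_MS = 120000;

// Only lowercase letters, digits, and hyphens, starting with a letter,
// at most 64 characters in total.
function isValidSkillName(name) {
  return typeof name === 'string' && SKILL_NAME_RE.test(name);
}

// Missing or nonsensical values fall back to the default; large values are capped.
function clampTimeout(ms) {
  if (!Number.isFinite(ms) || ms <= 0) return DEFAULT_TIMEOUT_MS;
  return Math.min(ms, MAX_TIMEOUT_MS);
}

// Hardcoded base + validated name + timestamp. The caller must validate
// `skill` first, so the name cannot contain '/' or '..'.
function resultPath(skill, date = new Date()) {
  const stamp = date.toISOString().replace(/[:.]/g, '-'); // assumption
  return path.join('.bkit', 'runtime', `evals-${skill}-${stamp}.json`);
}

function runEvalSketch(skill, timeoutMs) {
  if (!isValidSkillName(skill)) return { reason: 'invalid_skill_name' };
  // Arguments are passed as an array, so no shell ever interprets them.
  const child = spawnSync(process.execPath, ['evals/runner.js', '--skill', skill], {
    encoding: 'utf8',
    timeout: clampTimeout(timeoutMs),
    shell: false,
  });
  const timedOut = child.error != null && child.error.code === 'ETIMEDOUT';
  return { exitCode: child.status, timedOut, stdout: child.stdout, stderr: child.stderr };
}
```

Because validation happens before anything is spawned or written, a rejected name never reaches the command line or the filesystem.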

Module Dependencies

| Module | Function | Usage |
|--------|----------|-------|
| lib/evals/runner-wrapper.js | invokeEvals(skill, opts) | Validate + spawn + persist |
| lib/evals/runner-wrapper.js | isValidSkillName(name) | Regex pre-check shared with list |
| evals/runner.js | (subprocess) | Existing eval execution engine |

Result Schema

.bkit/runtime/evals-{skill}-{timestamp}.json:

{
  "skill": "gap-detector",
  "invokedAt": "<ISO 8601>",
  "exitCode": 0,
  "timedOut": false,
  "stdoutTail": "...",
  "stderrTail": "...",
  "parsed": { /* whatever runner.js prints as JSON, or null */ }
}
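A record of this shape could be built and written as sketched below. `buildResult` and `persistResult` are hypothetical names. The `reason` key follows the Behavior section, which says a reason field is persisted; its exact placement and its `null` value on success are assumptions, since the schema above does not list it.

```javascript
'use strict';
const fs = require('node:fs');
const path = require('node:path');

const TAIL_CHARS = 2000;

// Keep only the last 2000 characters of a stream; tolerate null/undefined.
const tail = (text) => String(text ?? '').slice(-TAIL_CHARS);

function buildResult({ skill, exitCode, timedOut, stdout, stderr, parsed, reason }) {
  return {
    skill,
    invokedAt: new Date().toISOString(), // set at build time in this sketch
    exitCode,
    timedOut,
    stdoutTail: tail(stdout),
    stderrTail: tail(stderr),
    parsed: parsed ?? null,
    reason: reason ?? null, // assumption: null when a payload was parsed
  };
}

// Create the target directory if needed, then write pretty-printed JSON.
function persistResult(filePath, result) {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, JSON.stringify(result, null, 2) + '\n', 'utf8');
  return filePath;
}
```

Storing tails rather than full streams keeps result files small while preserving the end of the output, where the JSON payload and most error messages are expected to appear.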

Examples

# Single eval
/bkit-evals run gap-detector

# Discovery
/bkit-evals list

- `/control trust`: eval results contribute to trust score
- `/code-review`: uses eval data when assessing skills
- `/bkit explore` (FR-β1): explore evals as a category

Run skill evals via evals/runner.js — wrapper validates skill names, captures stdout/stderr, persists JSON results. Triggers: bkit evals, evals run, skill quality, eval runner, 스킬 평가, 評価実行, 评估运行, evaluación, évaluation.

Stars: 519
Category: testing
Updated: Apr 29, 2026

npx skillsauth add popup-studio-ai/bkit-claude-code bkit-evals

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 29, 2026, 9:49 AM · 14.0s · 1 file scanned

SKILL.md

name: bkit-evals
classification: capability
classification-reason: Eval runner is a development-time quality tool, not a workflow phase
deprecation-risk: none
effort: low
description: |
  Triggers: bkit evals, evals run, skill quality, eval runner, 스킬 평가, 評価実行, 评估运行, evaluación, évaluation.
argument-hint: run <skill> | list
user-invocable: true
imports: []
next-skill: null
pdca-phase: null
task-template: [Evals] {action}

Related Skills

popup-studio-ai/sprint

testing

Verified · Trusted · Community

Sprint Management — generic sprint capability for ANY bkit user. 16 sub-actions: init, start, status, watch, phase, iterate, qa, report, archive, list, feature, pause, resume, fork, help, master-plan. Triggers: sprint, sprint start, sprint init, sprint status, sprint list, 스프린트, 스프린트 시작, 스프린트 상태, スプリント, スプリント開始, スプリント状態, 冲刺, 冲刺开始, 冲刺状态, sprint, iniciar sprint, estado sprint, sprint, demarrer sprint, statut sprint, Sprint, Sprint starten, Sprint Status, sprint, avviare sprint, stato sprint, master plan, multi-sprint plan, sprint master plan, 마스터 플랜, 멀티 스프린트 계획, 스프린트 마스터 플랜, マスタープラン, マルチスプリント計画, スプリントマスタープラン, 主计划, 多冲刺计划, 冲刺主计划, plan maestro, plan multi-sprint, plan maestro sprint, plan maître, plan multi-sprint, plan maître sprint, Masterplan, Multi-Sprint-Plan, Sprint-Masterplan, piano principale, piano multi-sprint, piano principale sprint.

548 · SKILL.md · Updated May 13, 2026

popup-studio-ai/cc-version-analysis

tools

Verified · Trusted · Community

CC CLI version upgrade impact analysis — research changes, analyze bkit impact, generate report. Triggers: cc-version-analysis, CC upgrade, version analysis, CC 버전 분석, 버전 영향.

545 · SKILL.md · Updated Apr 18, 2026

popup-studio-ai/rollback

testing

Verified · Trusted · Community

Manage PDCA checkpoints and rollback — create, list, restore for safe recovery. Rollback events are recorded via lib/audit/audit-logger ACTION_TYPES.rollback_executed. For sprint-level recovery, individual feature rollbacks may be triggered from within sprint phases (sprint itself is forward-only — terminal state is `archived`, not rolled back; v2.1.13). Triggers: rollback, checkpoint, restore, undo, 롤백, 체크포인트, 복원.

539 · SKILL.md · Updated Apr 18, 2026

popup-studio-ai/qa-phase

testing

Verified · Trusted · Community

QA Phase execution — L1-L5 test planning, generation, execution, and reporting for a single feature. For sprint-level QA (7-Layer dataFlowIntegrity / S1 gate across multiple features) use /sprint qa <sprintId> which delegates to sprint-qa-flow agent (v2.1.13). Triggers: qa phase, QA test, qa run, QA 실행, QAフェーズ, QA阶段, fase QA, phase QA, QA-Phase, fase QA.

539 · SKILL.md · Updated Apr 18, 2026

Install manually

# Clone the repo
git clone https://github.com/popup-studio-ai/bkit-claude-code.git

# Copy into Claude Code skills folder (global)
cp -r bkit-claude-code/skills/bkit-evals ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

popup-studio-ai/bkit-claude-code

519 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT