kumewata/waza-eval

Name: waza-eval
Author: kumewata

config/agents/skills/waza-eval/SKILL.md

npx skillsauth add kumewata/dotfiles waza-eval

Waza Eval

Guide for adding or updating Waza-based executable evals for skills.

Scope

waza-eval owns the execution layer:

- whether Waza should be added for a skill change
- how evals.json maps to Waza tasks
- when to add Waza-only fixtures
- how to choose simple graders first
- how to run the eval locally

It does not own:

- skill design in general
- description tuning in general
- Waza CLI installation

Use skill-creator for the skill itself. Use waza-eval when the question becomes "how do we make this executable in Waza?"

Preconditions

- waza CLI is already installed and available in PATH
- the skill has a directory under config/agents/skills/<skill>/
- config/agents/skills/<skill>/evals/evals.json exists, or should be created as the source of truth

If waza is not in PATH, stop treating this as a waza-eval task and resolve the environment first.

Source Of Truth

Treat config/agents/skills/<skill>/evals/evals.json as the source layer.

Treat Waza files as the execution layer.

Keep this contract:

- skill_name matches the skill directory and Waza eval target
- id becomes the stable task identity
- type becomes task tags such as positive, boundary, negative
- prompt stays aligned exactly unless there is a deliberate reason to diverge
- expected_output informs grader design
- files carries supporting files when they are part of the source eval case

Waza-only fixtures are allowed when the prompt is intentionally short for trigger testing but not rich enough to produce a concrete output. Record that exception in the Waza README or pilot memo.
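
The contract above can be checked mechanically before any Waza files are touched. The following is a minimal Python sketch, not part of Waza. It assumes evals.json is an object with a top-level skill_name and a list of cases that each carry id, type, and prompt. The "evals" list key, the allowed type set, and the function name check_contract are assumptions made for illustration; adjust them to the real file layout.

```python
from typing import Any

# Assumed tag vocabulary, taken from the examples named in the contract.
ALLOWED_TYPES = {"positive", "boundary", "negative"}


def check_contract(doc: dict[str, Any], skill_dir_name: str) -> list[str]:
    """Return human-readable contract violations (empty list if none)."""
    problems: list[str] = []

    if doc.get("skill_name") != skill_dir_name:
        problems.append(
            f"skill_name {doc.get('skill_name')!r} does not match "
            f"directory {skill_dir_name!r}"
        )

    seen_ids: set[str] = set()
    # "evals" as the list key is an assumption about the file layout.
    for case in doc.get("evals", []):
        case_id = case.get("id")
        if not case_id:
            problems.append("case without an id")
            continue
        if case_id in seen_ids:
            problems.append(f"duplicate id {case_id!r}")
        seen_ids.add(case_id)
        if case.get("type") not in ALLOWED_TYPES:
            problems.append(f"{case_id}: unexpected type {case.get('type')!r}")
        if not case.get("prompt"):
            problems.append(f"{case_id}: empty prompt")

    return problems


# Hypothetical source cases, for illustration only.
example = {
    "skill_name": "waza-eval",
    "evals": [
        {"id": "trigger-basic", "type": "positive",
         "prompt": "Add a Waza eval for my skill"},
        {"id": "install-cli", "type": "negative",
         "prompt": "Install the Waza CLI"},
    ],
}
print(check_contract(example, "waza-eval"))  # prints []
```

Returning a list of messages rather than raising on the first problem keeps the check legible: every drift between the source layer and the directory name shows up in one pass.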

When To Add Waza

Waza is a good fit when:

- a new skill is being added and you want model-backed validation
- a skill description or trigger behavior changed materially
- evals.json was added or substantially revised
- you want to confirm trigger, anti-trigger, or boundary behavior with a real model
- you are considering later CI adoption and want to start with a local pilot

Waza is usually not the first move when:

- the change is a tiny typo or formatting-only edit
- the skill has no meaningful evals.json cases yet
- you are still deciding the basic skill shape and do not have stable cases

Workflow

1. Read config/agents/skills/<skill>/evals/evals.json.
2. Decide which cases should become Waza tasks first. Prefer one positive and one negative. Add boundary once the first pass works.
3. Check whether the prompt alone is enough to produce a meaningful output.
4. If not, add the minimum Waza-only fixture needed to make the task executable. Do not rewrite the source prompt just to make execution easier.
5. Start with text or regex-style graders. Keep them narrow and legible.
6. Run waza models --json and choose an actual available model ID. Do not trust scaffold defaults blindly.
7. Run waza run <skill> -v locally.
8. Inspect failures in this order:
   - wrong model or unavailable model
   - bad fixture assumptions
   - brittle grader
   - genuine trigger mismatch
9. Only after stable local runs should you discuss compare, behavior graders, or CI integration.
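
The case-selection step in the workflow above (one positive and one negative first, boundary later) can be sketched in a few lines of Python. This is an illustration only: it assumes each source case is a dict with a type field, and pick_first_cases is a hypothetical helper name, not a Waza command.

```python
def pick_first_cases(cases: list[dict]) -> list[dict]:
    """Pick the first positive and the first negative case, in that order.

    Boundary cases are deliberately left out of the first pass.
    """
    picked = []
    for wanted in ("positive", "negative"):
        match = next((c for c in cases if c.get("type") == wanted), None)
        if match is not None:
            picked.append(match)
    return picked


# Hypothetical source cases, for illustration only.
cases = [
    {"id": "edge-1", "type": "boundary"},
    {"id": "neg-1", "type": "negative"},
    {"id": "pos-1", "type": "positive"},
    {"id": "pos-2", "type": "positive"},
]
print([c["id"] for c in pick_first_cases(cases)])  # prints ['pos-1', 'neg-1']
```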

Model Selection

Always discover the real model IDs first:

waza models --json

Then set the Waza eval config to an actual available ID such as gpt-5.2-codex.

Do not use broad labels like gpt-5 unless the local environment explicitly exposes that exact ID.
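
The exact-ID rule can be made explicit with a small check. The sketch below is Python and makes one loud assumption: that the JSON printed by waza models --json is an array of objects with an "id" field. The real output shape is not documented here, so treat extract_model_ids as a placeholder to adapt, and the sample string as invented data.

```python
import json


def extract_model_ids(models_json: str) -> list[str]:
    """Collect IDs from a JSON array of objects with an "id" field.

    The array-of-objects shape is an assumption, not confirmed Waza output.
    """
    entries = json.loads(models_json)
    return [e["id"] for e in entries if isinstance(e, dict) and "id" in e]


def is_available(model_id: str, available: list[str]) -> bool:
    """Exact match only: a broad label must not pass on a prefix."""
    return model_id in available


# Hypothetical output shaped per the assumption above; real output may differ.
sample = '[{"id": "gpt-5.2-codex"}, {"id": "example-model-b"}]'
ids = extract_model_ids(sample)
print(is_available("gpt-5.2-codex", ids))  # prints True
print(is_available("gpt-5", ids))          # prints False
```

Membership in the list, rather than a prefix or substring test, is the point: gpt-5 is rejected even though an ID starting with it exists.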

Grader Strategy

First pass:

- text contains / not-contains checks
- regex checks for structured outputs
- small, readable assertions tied to the eval intent

Later passes:

- token budget
- behavior graders
- compare flows

Do not let cost or token-budget checks dominate before the task itself is known-good. If a grader keeps failing while the trigger behavior looks right, simplify the grader first.
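
To make "narrow and legible" concrete, here is what first-pass checks look like when written as plain Python functions. This is a sketch of the idea only; it does not show Waza's grader configuration format, and the function names, the sample output, and the commit-message pattern are all assumptions chosen for illustration.

```python
import re


def contains(output: str, needle: str) -> bool:
    """Pass when the expected text appears in the model output."""
    return needle in output


def not_contains(output: str, needle: str) -> bool:
    """Pass when the unwanted text is absent from the model output."""
    return needle not in output


def matches(output: str, pattern: str) -> bool:
    """Pass when the pattern matches at the start of any line."""
    return re.search(pattern, output, flags=re.MULTILINE) is not None


# Hypothetical model output for a commit-message task.
output = "fix(parser): handle empty input"

# One small named check per intent, so a failure explains itself.
results = {
    "mentions parser": contains(output, "parser"),
    "no WIP marker": not_contains(output, "WIP"),
    "conventional prefix": matches(output, r"^(feat|fix|docs)(\([\w-]+\))?: .+"),
}
print(all(results.values()))  # prints True
```

Keeping each check to a single named boolean means that when a task fails, the failing name already says which assumption broke, which is what makes simplifying a brittle grader straightforward.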

Fixture Strategy

Use Waza-only fixtures sparingly.

Good reasons:

- commit-message prompts that need a change summary or diff
- file-generation prompts that need a tiny sample input
- anti-trigger checks that need contextual files to avoid ambiguity

Bad reasons:

- hiding that the source prompt is unclear
- compensating for a weak skill description
- adding large context blobs "just in case"

Verification Checklist

Before calling the Waza update done, verify:

- evals.json and Waza task files still align
- the chosen model exists in waza models --json
- at least one positive and one negative case run locally
- any Waza-only fixture has a short rationale written down
- the graders are readable enough that another person can understand why a task passed or failed
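
The first checklist item, alignment between evals.json and the Waza task files, reduces to comparing two sets of IDs. The Python sketch below assumes you have already collected the case IDs from each side; how task IDs are read from Waza files is not shown, and find_drift is a hypothetical helper name.

```python
def find_drift(source_ids: set[str], task_ids: set[str]) -> dict[str, list[str]]:
    """Compare source case IDs with Waza task IDs.

    not_yet_in_waza is informational: not every case needs a task on day one.
    orphaned_in_waza needs attention: a task exists with no source case.
    """
    return {
        "not_yet_in_waza": sorted(source_ids - task_ids),
        "orphaned_in_waza": sorted(task_ids - source_ids),
    }


# Hypothetical IDs, for illustration only.
drift = find_drift(
    source_ids={"pos-1", "neg-1", "edge-1"},
    task_ids={"pos-1", "neg-1", "tmp-experiment"},
)
print(drift["not_yet_in_waza"])   # prints ['edge-1']
print(drift["orphaned_in_waza"])  # prints ['tmp-experiment']
```

An orphaned task usually means a Waza file was edited without updating evals.json, which is the first anti-pattern this guide warns about.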

Anti-Patterns

- updating Waza tasks without updating evals.json
- guessing model names
- forcing prompt-only execution when a tiny fixture is clearly needed
- adding token-budget or compare logic too early
- discussing CI before a local run is stable
- using this skill for CLI installation work

Example Commands

Inspect models:

waza models --json

Run one skill locally:

cd waza
waza run git -v

Handoff Notes

If the work finishes with a useful pattern, leave behind:

- updated evals.json
- Waza task and fixture files
- a short note explaining any Waza-only fixture
- the chosen model ID
- whether the next step is to continue, stop, or redesign

Use when creating a new skill or making a substantial change to an existing skill and you also need to design, update, or review Waza-based executable evaluations. This includes deciding whether Waza is warranted, mapping `evals.json` cases into Waza tasks, choosing fixtures and graders, selecting a valid model with `waza models --json`, and running a local-first `waza run` workflow. Do NOT use for installing the Waza CLI itself or for general skill-authoring advice that does not involve Waza; use `skill-creator` for skill design and this skill for the Waza execution layer. Trigger especially when the user mentions Waza, `waza run`, `waza models`, executable evals, compare, graders, fixtures, or wants to validate a skill change with model-backed evaluation.

tools

Updated Jun 3, 2026

npx skillsauth add kumewata/dotfiles waza-eval

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Trivy: container and dependency vulnerability scanner. Clean (95%)
- Semgrep: static code analysis for vulnerabilities. Clean (95%)
- mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
- Snyk (dep): open source security scanning. Skipped (50%)
- Socket.dev: supply chain security analysis. Skipped (50%)
- VirusTotal: multi-engine malware detection. Skipped (50%)
- CrowdStrike: advanced threat intelligence. Skipped (50%)
- OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
- OWASP Dep-Check. Skipped (50%)

Last scanned: Jun 3, 2026, 4:07 AM; 22.1s; 2 files scanned

Install manually

# Clone the repo
git clone https://github.com/kumewata/dotfiles.git

# Copy into Claude Code skills folder (global)
cp -r dotfiles/config/agents/skills/waza-eval ~/.claude/skills/

Repository: kumewata/dotfiles

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT