config/agents/skills/waza-eval/SKILL.md
Use when creating a new skill or making a substantial change to an existing skill and you also need to design, update, or review Waza-based executable evaluations. This includes deciding whether Waza is warranted, mapping `evals.json` cases into Waza tasks, choosing fixtures and graders, selecting a valid model with `waza models --json`, and running a local-first `waza run` workflow. Do NOT use for installing the Waza CLI itself or for general skill-authoring advice that does not involve Waza; use `skill-creator` for skill design and this skill for the Waza execution layer. Trigger especially when the user mentions Waza, `waza run`, `waza models`, executable evals, compare, graders, fixtures, or wants to validate a skill change with model-backed evaluation.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add kumewata/dotfiles waza-eval
Guide for adding or updating Waza-based executable evals for skills.
waza-eval owns the execution layer:
- how `evals.json` maps to Waza tasks

It does not own:
Use skill-creator for the skill itself. Use waza-eval when the question becomes "how do we make this executable in Waza?"
- `waza` CLI is already installed and available in PATH
- `config/agents/skills/<skill>/`
- `config/agents/skills/<skill>/evals/evals.json` exists, or should be created as the source of truth

If `waza` is not in PATH, stop treating this as a waza-eval task and resolve the environment first.
Treat config/agents/skills/<skill>/evals/evals.json as the source layer.
Treat Waza files as the execution layer.
Keep this contract:
- `skill_name` matches the skill directory and Waza eval target
- `id` becomes the stable task identity
- `type` becomes task tags such as `positive`, `boundary`, `negative`
- `prompt` stays aligned exactly unless there is a deliberate reason to diverge
- `expected_output` informs grader design
- `files` carries supporting files when they are part of the source eval case

Waza-only fixtures are allowed when the prompt is intentionally short for trigger testing but not rich enough to produce a concrete output. Record that exception in the Waza README or pilot memo.
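The field contract above can be sketched in Python as a small mapping function. This is only an illustration: the top-level `evals` key, the sample case, and every key of the returned task dict (`skill`, `tags`, `grader_hint`, and so on) are assumptions made for the example, not Waza's actual task schema.

```python
from typing import Any


def eval_case_to_task(skill_name: str, case: dict[str, Any]) -> dict[str, Any]:
    """Map one evals.json case to a HYPOTHETICAL task dict (not Waza's real schema)."""
    return {
        "skill": skill_name,                              # matches the skill directory / eval target
        "id": case["id"],                                 # stable task identity
        "tags": [case["type"]],                           # positive, boundary, or negative
        "prompt": case["prompt"],                         # copied verbatim so it stays aligned
        "grader_hint": case.get("expected_output", ""),   # input to grader design, not a grader
        "files": list(case.get("files", [])),             # supporting files, if any
    }


# Made-up source data; the "evals" key is an assumed layout for evals.json.
source = {
    "skill_name": "git",
    "evals": [
        {
            "id": "commit-message-positive",
            "type": "positive",
            "prompt": "Write a commit message for the staged change.",
            "expected_output": "A conventional commit message.",
        }
    ],
}

tasks = [eval_case_to_task(source["skill_name"], case) for case in source["evals"]]
print(tasks[0]["id"], tasks[0]["tags"])
```

Copying `prompt` verbatim, rather than rewriting it per task, keeps the source layer and the execution layer easy to compare later.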
Waza is a good fit when:
- `evals.json` was added or substantially revised

Waza is usually not the first move when:
- there are no `evals.json` cases yet

- `config/agents/skills/<skill>/evals/evals.json`.
- Start with one `positive` and one `negative`. Add `boundary` once the first pass works.
- Run `waza models --json` and choose an actual available model ID. Do not trust scaffold defaults blindly.
- Run `waza run <skill> -v` locally.

Always discover the real model IDs first:
waza models --json
Then set the Waza eval config to an actual available ID such as gpt-5.2-codex.
Do not use broad labels like gpt-5 unless the local environment explicitly exposes that exact ID.
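A minimal Python sketch of the exact-ID check described above. The shape of the `waza models --json` output is an assumption here (a JSON array of strings or of objects with an `id` field), so adjust the parsing to whatever the command really prints. The helper names and the sample data are hypothetical.

```python
import json
import subprocess


def available_model_ids(models_json: str) -> set[str]:
    """Extract model IDs from JSON text.

    ASSUMPTION: the payload is a JSON array whose entries are plain strings
    or objects with an "id" field. The real output may be shaped differently.
    """
    ids: set[str] = set()
    for entry in json.loads(models_json):
        if isinstance(entry, str):
            ids.add(entry)
        elif isinstance(entry, dict) and "id" in entry:
            ids.add(str(entry["id"]))
    return ids


def discover_model_ids() -> set[str]:
    """Run `waza models --json` and parse its stdout (requires waza in PATH)."""
    result = subprocess.run(
        ["waza", "models", "--json"], capture_output=True, text=True, check=True
    )
    return available_model_ids(result.stdout)


def check_model(configured: str, available: set[str]) -> bool:
    """True only when the configured ID is listed exactly; no prefix matching."""
    return configured in available


# Made-up sample standing in for real command output.
sample = '[{"id": "gpt-5.2-codex"}, {"id": "example-model"}]'
ids = available_model_ids(sample)
print(check_model("gpt-5.2-codex", ids))  # True
print(check_model("gpt-5", ids))          # False
```

Exact membership is deliberate: a broad label such as `gpt-5` fails the check unless the environment exposes that exact ID.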
First pass:
Later passes:
Do not let cost or token-budget checks dominate before the task itself is known-good. If a grader keeps failing while the trigger behavior looks right, simplify the grader first.
Use Waza-only fixtures sparingly.
Good reasons:
Bad reasons:
Before calling the Waza update done, verify:
- `evals.json` and Waza task files still align
- the model ID comes from `waza models --json`
- `evals.json`

Inspect models:
waza models --json
Run one skill locally:
cd waza
waza run git -v
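The alignment item in the verification list above could be checked with a small script like the following sketch. It assumes both layers have already been loaded into lists of dicts carrying `id` and `prompt`. That task shape and the sample data are hypothetical, not Waza's file format.

```python
from typing import Any


def find_drift(cases: list[dict[str, Any]], tasks: list[dict[str, Any]]) -> list[str]:
    """Report where source eval cases and execution-layer tasks disagree."""
    problems: list[str] = []
    tasks_by_id = {task["id"]: task for task in tasks}

    # Every source case needs a task with the same id and an identical prompt.
    for case in cases:
        task = tasks_by_id.get(case["id"])
        if task is None:
            problems.append(f"{case['id']}: no matching task")
        elif task["prompt"] != case["prompt"]:
            problems.append(f"{case['id']}: prompt differs")

    # Tasks without a source case are also reported.
    source_ids = {case["id"] for case in cases}
    for task_id in tasks_by_id:
        if task_id not in source_ids:
            problems.append(f"{task_id}: task has no source case")
    return problems


# Made-up data for illustration only.
cases = [
    {"id": "a-positive", "prompt": "Do the thing."},
    {"id": "b-negative", "prompt": "Unrelated request."},
]
tasks = [
    {"id": "a-positive", "prompt": "Do the thing."},
    {"id": "b-negative", "prompt": "Unrelated request, reworded."},
    {"id": "c-orphan", "prompt": "No source."},
]

drift = find_drift(cases, tasks)
for problem in drift:
    print(problem)
```

A reported difference is not automatically an error: a deliberate prompt divergence or a Waza-only fixture is acceptable as long as it is recorded in the Waza README or pilot memo.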
If the work finishes with a useful pattern, leave behind:
- `evals.json`
- `continue`, `stop`, or `needs redesign`