.agents/skills/repo-task-proof-loop/SKILL.md
Repo-local workflow skill for large coding tasks. Initializes .agent/tasks/TASK_ID artifacts, installs project-scoped Codex, Claude Code, and OpenCode subagents, updates AGENTS.md plus the repo's Claude guide file with the workflow, and runs a spec-freeze → build → evidence → verify → fix loop with fresh-session verification.
Use this skill when the user wants a repeatable, auditable implementation workflow for a non-trivial coding task, especially a feature, refactor, migration, or bug fix that should leave repo-local proof in .agent/tasks/<TASK_ID>/.
All task artifacts created by this workflow must stay inside the repository.
When the examples below mention scripts/task_loop.py, that path is relative to this skill root. Run it while your shell working directory is inside the target repository.
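- Task artifacts: .agent/tasks/<TASK_ID>/
- Project-scoped subagents: .codex/agents/, .claude/agents/, and .opencode/agents/
- Guide files: AGENTS.md as the Codex baseline, plus the repo's Claude guide file (CLAUDE.md or .claude/CLAUDE.md), with a managed block that explains the workflow
- Completion: the loop ends when the verifier's overall verdict is PASS

See: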
- references/REFERENCE.md
- references/COMMANDS.md
- references/SUBAGENTS.md
- references/SCHEMAS.md

Treat the following words as commands when the user invokes this skill:
- init <TASK_ID>: create .agent/tasks/<TASK_ID>/, install or refresh subagent templates, and update AGENTS.md plus the repo's Claude guide file
- freeze <TASK_ID>: create or refine spec.md from the user task, task file, and repo guidance
- build <TASK_ID>: implement the task against the frozen spec
- evidence <TASK_ID>: create or refresh evidence.md, evidence.json, and raw artifacts without changing production code
- verify <TASK_ID>: run a fresh verifier pass and write verdict.json, plus problems.md when needed
- fix <TASK_ID>: apply the smallest safe fix set from problems.md, then refresh the evidence bundle
- run <TASK_ID>: execute the full loop from spec freeze through verification
- status <TASK_ID>: summarize current artifact status

If the user does not supply a command, infer the next step from repo state:
- If .agent/tasks/<TASK_ID>/ does not exist yet, run init first. If the user clearly wants initialization only, stop there. Otherwise, after init succeeds and .agent/tasks/<TASK_ID>/spec.md exists, continue by re-evaluating repo state in the same turn. Do not overlap init with freeze, build, evidence, verify, fix, validate, status, or subagent work.
- If spec.md is missing or placeholder-only, do freeze.
- Otherwise, continue with build, then evidence, then verify.
- If the verdict is not PASS, do fix.

Run the bundled initializer from the repository root, or from any working directory inside the repo:
scripts/task_loop.py init --task-id <TASK_ID>
Optional task seeding:
scripts/task_loop.py init --task-id <TASK_ID> --task-file path/to/task.md
scripts/task_loop.py init --task-id <TASK_ID> --task-text "User task text"
The initializer will:
- create .agent/tasks/<TASK_ID>/ and its raw/ subdirectory
- update AGENTS.md and the repo's Claude guide file

For Codex, the initializer keeps its managed workflow block in the repo-root AGENTS.md. Codex also supports AGENTS.override.md and configured fallback guide filenames; nested files closer to the code still take precedence, and this skill intentionally does not overwrite them.
If init creates or rewrites AGENTS.md during a running Codex session, start a new Codex session before relying on the updated instructions. Codex snapshots project-doc guidance at session start.
For Claude Code, the initializer keeps its managed workflow block in the repo-root CLAUDE.md. Claude Code also supports .claude/CLAUDE.md, .claude/rules/*.md, and CLAUDE.local.md, but this skill treats root CLAUDE.md as the primary project guide because Claude surfaces it directly.
In Claude Code, if init just wrote or refreshed .claude/agents/* during the current session, do not assume those updated agents are already available mid-session.
For OpenCode, the initializer installs project-scoped workflow agents into .opencode/agents/. When you want a product-specific default, use --install-subagents opencode --guides agents.
Treat init as a serial prerequisite. Never overlap it with freeze, build, evidence, verify, fix, validate, status, or child-agent spawning.
For large tasks, keep the user-facing request simple. In Codex, continue serially unless the user explicitly asks for delegation or parallel agent work; after that authorization, the skill can choose the internal child setup automatically when the current product surface supports delegation and the task shape warrants it.
- Run init <TASK_ID> if needed. Wait for it to finish, then confirm that .agent/tasks/<TASK_ID>/spec.md and the repo-local task structure exist before continuing.
- After init completes, spawn exactly one spec-freezer subagent and wait for it.
- Then build, pack evidence, and run a fresh verifier pass, one phase at a time.
- If the verdict is not PASS, spawn exactly one fixer subagent.
- Repeat until the verdict is PASS or the user stops the loop.

Use the parallel pattern below only after the user has explicitly authorized Codex delegation and the task is broad enough to benefit from bounded parallel work. For narrow tasks, use the simpler serial sequence above.
Good fits:
Codex pattern:
- init stays serial.
- Spawn explorer children in parallel. Give each one a single question, subsystem, or path scope. Wait for them, then freeze the spec.
- Spawn one task-builder child as the integration owner.
- Spawn worker children in parallel. Each worker must have explicit file or module ownership and must not write evidence.md, evidence.json, verdict.json, or problems.md.
- Use send_input or the equivalent follow-up surface to keep the integration builder alive for evidence packing. The builder remains the single owner of the evidence bundle.
- During verification, explorer children may rerun disjoint checks or inspect separate proof gaps in parallel. Those children may report commands, outputs, and findings, but they do not write verdict.json.
- Keep the user-facing request simple after init. Avoid surfacing delegation internals unless they materially affect the work.
- Keep init, evidence ownership, and every verifier pass serialized either way.
- Built-in explorer is the first choice for read-only repo discovery and proof probes. Built-in worker is appropriate for bounded, disjoint implementation or check reruns when you can assign explicit ownership.
- To inspect child threads, use /agent in Codex CLI or any equivalent child-thread inventory surface in the current Codex product.
- update_plan is optional session guidance only. It is useful for live progress display, but it is not the source of truth for this workflow.

Claude Code pattern:

- After init, the user should not need to request a specific Claude subagent or delegation mode separately.
- The workflow subagents live in .claude/agents/, with descriptions written as proactive trigger conditions for the matching proof-loop phase. Claude's main session routes by the task request, subagent descriptions, and current context, so keep each phase prompt clear in natural language.
- Reuse the same builder child for the evidence step by default. Only run a fresh builder in evidence-only mode if the original builder session is unavailable or you intentionally discarded it.
- If init just refreshed .claude/agents/* during the current Claude session, fall back to the main thread or already-visible agents instead of assuming the refreshed ones are available immediately.
- Task state lives in .agent/tasks/<TASK_ID>/, especially spec.md, evidence.md, evidence.json, verdict.json, and problems.md.

OpenCode pattern:

- The workflow agents live in .opencode/agents/. If the platform cannot continue the same builder child session for the evidence step, run a new builder subagent in evidence-only mode.

Use the exact role prompts from references/COMMANDS.md.
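Two Codex worker rules can be checked mechanically before any workers are spawned. Each worker needs explicit, non-overlapping ownership, and no worker may write the evidence or verdict files. The sketch below is a minimal example under stated assumptions. It assumes each worker's scope is a list of repo-relative paths, where a directory path covers everything beneath it. The `ownership_conflicts` helper and the worker names are hypothetical and are not part of the skill:

```python
from pathlib import PurePosixPath

# Files that only the builder (evidence bundle) or verifier (verdict) may write.
RESERVED = {"evidence.md", "evidence.json", "verdict.json", "problems.md"}


def ownership_conflicts(assignments: dict[str, list[str]]) -> list[str]:
    """Report reserved artifacts in a worker's scope and overlaps between workers.

    Hypothetical helper: assignments maps a worker name to repo-relative paths.
    """
    conflicts: list[str] = []
    owner_of: dict[PurePosixPath, str] = {}
    for worker, paths in assignments.items():
        for raw in paths:
            path = PurePosixPath(raw)
            if path.name in RESERVED:
                conflicts.append(f"{worker}: may not write {raw}")
            for owned, other in owner_of.items():
                # Scopes overlap when one path equals or contains the other.
                overlaps = path == owned or owned in path.parents or path in owned.parents
                if other != worker and overlaps:
                    conflicts.append(f"{worker} overlaps {other} on {raw}")
            owner_of[path] = worker
    return conflicts


print(ownership_conflicts({"w1": ["src/api"], "w2": ["src/api/routes.py"]}))
```

Here, `w2` is reported because `src/api/routes.py` falls inside `w1`'s directory scope. An empty list means the split is safe to hand out.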
spec.md must contain at least:
- acceptance criteria labeled AC1, AC2, ...

It may also include:
Do not edit production code during spec freeze.
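Two checks hang off spec.md. The spec must label its acceptance criteria AC1, AC2, and so on, and a missing or placeholder-only spec sends the loop back to freeze. The sketch below is a minimal version of both checks under stated assumptions. Criteria are assumed to appear as tokens such as `AC1`, and "placeholder-only" is assumed to mean that no AC labels appear at all. The helper names are hypothetical, and scripts/task_loop.py may apply a different rule:

```python
import re
from pathlib import Path

# Hypothetical helpers; not taken from scripts/task_loop.py.
AC_PATTERN = re.compile(r"\bAC(\d+)\b")


def acceptance_criteria_ids(spec_text: str) -> list[str]:
    """Return unique criterion IDs (AC1, AC2, ...) in order of first appearance."""
    seen: list[str] = []
    for match in AC_PATTERN.finditer(spec_text):
        ac_id = f"AC{match.group(1)}"
        if ac_id not in seen:
            seen.append(ac_id)
    return seen


def spec_needs_freeze(spec_path: Path) -> bool:
    """Assumed rule: freeze is needed if spec.md is missing or lists no criteria."""
    if not spec_path.is_file():
        return True
    return not acceptance_criteria_ids(spec_path.read_text(encoding="utf-8"))
```

Under this rule, a spec containing only a TODO line still counts as placeholder-only.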
evidence.md and evidence.json must judge each acceptance criterion independently with one of:
- PASS
- FAIL
- UNKNOWN

Evidence packing may run missing checks, but it must not keep changing production code.
Every PASS must cite concrete proof such as:
- raw artifacts stored under raw/

Do not claim overall PASS in the evidence bundle unless every acceptance criterion is PASS.
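The evidence rules reduce to a small aggregation:

- Each criterion is PASS, FAIL, or UNKNOWN.
- Every PASS must cite at least one proof.
- The bundle is PASS only when every criterion is PASS.

The sketch below assumes a hypothetical in-memory shape, `{criterion_id: {"status": ..., "proof": [...]}}`. The real evidence.json layout is defined in references/SCHEMAS.md and may differ. Letting a FAIL outrank UNKNOWN in the overall status is also an assumption:

```python
from typing import Any

ALLOWED = {"PASS", "FAIL", "UNKNOWN"}


def check_criteria(criteria: dict[str, dict[str, Any]]) -> list[str]:
    """Return rule violations for a hypothetical {AC id: {status, proof}} mapping."""
    errors: list[str] = []
    for ac_id, entry in criteria.items():
        status = entry.get("status")
        if status not in ALLOWED:
            errors.append(f"{ac_id}: invalid status {status!r}")
        elif status == "PASS" and not entry.get("proof"):
            errors.append(f"{ac_id}: PASS without concrete proof")
    return errors


def overall_status(criteria: dict[str, dict[str, Any]]) -> str:
    """Overall PASS only if there is at least one criterion and all are PASS."""
    statuses = [entry.get("status") for entry in criteria.values()]
    if statuses and all(s == "PASS" for s in statuses):
        return "PASS"
    # Assumption: any FAIL dominates; otherwise the bundle stays UNKNOWN.
    return "FAIL" if "FAIL" in statuses else "UNKNOWN"
```

For example, one proven PASS next to one UNKNOWN criterion yields an overall UNKNOWN, never PASS.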
The verifier must be a fresh session or fresh subagent. In Codex, do not satisfy this requirement by resuming a prior verifier child.
The verifier must judge the current repository state and current rerun results, not the builder narrative.
The verifier writes:
- .agent/tasks/<TASK_ID>/verdict.json
- .agent/tasks/<TASK_ID>/problems.md, only when the overall verdict is not PASS

problems.md must include, for each non-PASS criterion:
The verifier must not modify production code or backfill the evidence bundle.
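The verifier's write step only touches verifier-owned files, and it writes problems.md only for a non-PASS verdict. The sketch below is built on assumptions. It assumes a hypothetical verdict.json shape with `overall` and `criteria` keys; the real schema lives in references/SCHEMAS.md. It also assumes that any non-PASS criterion makes the overall verdict FAIL. The `write_verdict` name and its arguments are illustrative only:

```python
import json
from pathlib import Path


def write_verdict(task_dir: Path, criteria: dict[str, str], problems: dict[str, str]) -> str:
    """Write verdict.json, plus problems.md when any criterion is not PASS.

    criteria maps AC ids to PASS/FAIL/UNKNOWN; problems maps non-PASS ids to notes.
    Both shapes are illustrative assumptions, not the skill's schema.
    """
    overall = "PASS" if criteria and all(s == "PASS" for s in criteria.values()) else "FAIL"
    task_dir.mkdir(parents=True, exist_ok=True)
    verdict = {"overall": overall, "criteria": criteria}
    (task_dir / "verdict.json").write_text(json.dumps(verdict, indent=2) + "\n", encoding="utf-8")
    if overall != "PASS":
        # One entry per non-PASS criterion, so the fixer knows exactly what to address.
        lines = ["# Problems", ""]
        for ac_id, status in criteria.items():
            if status != "PASS":
                note = problems.get(ac_id, "No details recorded.")
                lines.append(f"- {ac_id} ({status}): {note}")
        (task_dir / "problems.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return overall
```

Nothing here touches production code or evidence.md and evidence.json, which keeps the sketch within the verifier's write boundary.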
The fixer reads only:
- spec.md
- verdict.json
- problems.md

The fixer must:
- refresh evidence.md, evidence.json, and raw artifacts

Before claiming the workflow is correctly initialized or the artifact set is complete, run:
scripts/task_loop.py validate --task-id <TASK_ID>
Run validate only after init has fully finished. If it reports initialization in progress, wait and rerun it instead of treating that result as a stable task failure.
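A caller scripting validate or status should treat "init still in progress" as a signal to retry later, not as a failure. The sketch below shows that retry policy under stated assumptions. `run_check` and `in_progress` are hypothetical callables supplied by the caller. For example, `run_check` could shell out to scripts/task_loop.py, and `in_progress` could look for `init_in_progress: true` in its output. The exact output format is not specified here:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_when_init_settled(
    run_check: Callable[[], T],
    in_progress: Callable[[T], bool],
    attempts: int = 5,
    delay_s: float = 2.0,
) -> T:
    """Rerun run_check while in_progress(result) is true; never report that as failure."""
    for attempt in range(attempts):
        result = run_check()
        if not in_progress(result):
            return result
        if attempt < attempts - 1:
            time.sleep(delay_s)
    raise TimeoutError("init still in progress; rerun validate later")
```

Raising `TimeoutError` keeps "not ready yet" distinct from a real validation failure.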
For a quick summary:
scripts/task_loop.py status --task-id <TASK_ID>
Run status only after init has fully finished when you need stable task state. If it reports init_in_progress: true, treat that as a retry-later condition.
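When the user gives no command, the next step is inferred from repo state. The sketch below is a minimal version of that decision over the task artifacts, and several of its rules are assumptions rather than the skill's actual logic:

- An empty spec.md counts as placeholder-only.
- The caller supplies `has_build_changes`.
- A verdict.json older than evidence.json is stale and needs a fresh verify.
- The returned `"done"` is a hypothetical marker, not a skill command.

```python
import json
from pathlib import Path


def next_step(task_dir: Path, has_build_changes: bool) -> str:
    """Suggest the next proof-loop command from artifact state (illustrative only)."""
    if not task_dir.is_dir():
        return "init"
    spec = task_dir / "spec.md"
    if not spec.is_file() or not spec.read_text(encoding="utf-8").strip():
        return "freeze"
    if not has_build_changes:
        return "build"
    evidence = task_dir / "evidence.json"
    if not evidence.is_file():
        return "evidence"
    verdict = task_dir / "verdict.json"
    # Assumption: evidence refreshed after the last verdict means verify again.
    if not verdict.is_file() or verdict.stat().st_mtime < evidence.stat().st_mtime:
        return "verify"
    overall = json.loads(verdict.read_text(encoding="utf-8")).get("overall")
    return "done" if overall == "PASS" else "fix"
```

Because the fixer refreshes the evidence bundle, the staleness check routes a fixed task back to verify rather than looping on fix.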
- task artifacts live in .agent/tasks/<TASK_ID>/ inside the repo
- the latest verdict in .agent/tasks/<TASK_ID>/ is PASS
- AGENTS.md and the repo's chosen Claude guide file carry the managed workflow block