toolkit/packages/skills/implement/SKILL.md
Execute implementation plan step-by-step
npx skillsauth add stevengonsalvez/agents-in-a-box implementInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are tasked with implementing an approved technical plan from plans/. These plans contain phases with specific changes and success criteria.
Before implementing, recall prior learnings from the global knowledge base so we don't re-learn or re-decide something already captured:
uv run "{{HOME_TOOL_DIR}}/skills/recall/scripts/recall.py" \
"<QUERY>" \
--limit 5 --format markdown
Query construction for /implement: the plan's current phase name + touched module names (e.g. "auth middleware session token storage"). Skip if the calling /plan already ran recall and surfaced findings..
What to do with results:
When given a plan path:
After reading the plan, check for wave metadata:
## Phase sections for <!-- wave: commentsExecute phases grouped by wave number:
Single-phase wave: Execute inline (same as sequential -- avoids spawn overhead).
Multi-phase wave: Spawn parallel Task agents:
For each phase in this wave, spawn:
Task(
prompt: "Implement Phase {N}: {title}
Plan: {plan_path}
Phase: {N} only
Read the plan and implement ONLY Phase {N}.
Follow Changes Required exactly.
Run Success Criteria checks.
Update plan checkboxes when done.
Files you own (ONLY modify these): {files from wave comment}
Report: what was implemented, tests passing, any deviations.",
subagent_type: "general-purpose"
)
Wait for ALL agents in the wave before starting next wave.
On failure: Stop and report:
Wave {N} execution:
Succeeded: Phase {X}, Phase {Y}
Failed: Phase {Z} -- {reason}
Options:
1. Retry failed phase only
2. Continue to next wave (skip failed)
3. Fall back to sequential for remaining phases
After each wave: Verify no file ownership conflicts via git diff --name-only.
If no plan path provided, ask for one.
Plans are carefully designed, but reality can be messy. Your job is to:
When things don't match the plan exactly, think about why and communicate clearly. The plan is your guide, but your judgment matters too.
If you encounter a mismatch:
Issue in Phase [N]:
Expected: [what the plan says]
Found: [actual situation]
Why this matters: [explanation]
How should I proceed?
For each phase in the plan:
Understand the Phase:
Implement Changes:
Verify as You Go:
Update Progress:
Update project state (if .planning/ exists):
.planning/STATE.md with progress.planning/phases/{phase}/SUMMARY.md if it exists.planning/REQUIREMENTS.mdAfter implementing a phase:
Don't let verification interrupt your flow - batch it at natural stopping points.
When you encounter a checkpoint marker in the plan:
[CHECKPOINT:human-verify]:
[CHECKPOINT:decision]:
[CHECKPOINT:human-action]:
Automation-first rule: Before creating any checkpoint, verify Claude cannot perform the action itself via CLI, API, or file operations. Checkpoints verify after automation, never replace it.
Follow the plan's testing strategy:
First, verify your understanding:
Debug systematically:
If stuck, document and ask:
If the codebase has evolved since the plan was written:
If the plan has existing checkmarks:
Maintain Forward Momentum:
Communicate Progress:
Quality Over Speed:
Document Deviations:
Before considering implementation complete:
Remember: You're implementing a solution, not just checking boxes. Keep the end goal in mind and maintain forward momentum.
documentation
Report reflect drain spend over a time window — tokens split by cached (cache_read), uncached writes (cache_creation), and io (input+output), with a $ estimate, grouped by day / outcome / model / transcript. Reads the drainer's cost log and surfaces outlier runs and cache-reuse health (the 41.5M-token failure mode = low cache reuse + high cache writes). Use to answer "what is reflection costing me" for the last day / week.
development
Show fleet status — every claude session running on the host, merged across ainb + claude-peers broker + background jobs. Use when you need to enumerate sessions before composing an action, see which sessions have a peer registered (broker-routable) vs tmux-only, check the `summary` of each session, or pipe the list into jq for filtering. Default output: text table. Pass --format json for LLM consumption.
testing
Ordered multi-step prompts to fleet targets, ack-gated between steps via JSONL assistant-turn-end detection. Use for cycles like disconnect→reconnect→verify, or any flow where step N+1 requires step N to have completed first. The skill BLOCKS until each target's transcript shows the next assistant turn finishing OR per-step timeout fires (default 300s).
development
Center control panel — enumerate every claude session that is blocked waiting on something: a user answer (AskUserQuestion fired), an API error retry, an idle assistant turn-end with no follow-up, or an explicit WAITING: marker. Returns rich JSON with signal kind + context per session. Use this when you've stepped away from the fleet and want one place to see everything that wants your attention and answer it.