plugins/reflect/skills/cost/SKILL.md
Report reflect drain spend over a time window — tokens split by cached (cache_read), uncached writes (cache_creation), and io (input+output), with a $ estimate, grouped by day / outcome / model / transcript. Reads the drainer's cost log and surfaces outlier runs and cache-reuse health (the 41.5M-token failure mode = low cache reuse + high cache writes). Use to answer "what is reflection costing me" for the last day / week.
npx skillsauth add stevengonsalvez/agents-in-a-box reflect:costInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Shows what the reflect background drainer is spending: token volume split into
cached (cheap reuse), uncached writes (expensive cache_creation), and
io (input+output), plus a ballpark $est, grouped by day, outcome, model,
or transcript. This is the observability the 2026-05-31 incident lacked — one
drain run burned 41.5M tokens / ~$713 for zero net-new learnings and nothing
surfaced it until a manual backfill.
sonnet, not
opus) after a deploy.Parse the window from the user's request and pass it as --since:
| User says | --since |
|-----------|-----------|
| "1 day", "today", "last 24h", (default) | 1d |
| "this week", "last 7 days" | 7d |
| "last hour" | 1h |
| "last month", "30 days" | 30d |
Default to 1d when unspecified.
Locate the cost reporter (prefer the running plugin, else the newest cached
version), then render. The script is reflect_cost.py, shipped in the reflect
plugin's scripts/.
WINDOW="1d" # ← substitute from the table above
# Resolve reflect_cost.py robustly across deploy layouts.
COST_PY=""
for cand in \
"${CLAUDE_PLUGIN_ROOT:-}/scripts/reflect_cost.py" \
$(ls -t "$HOME"/.claude/plugins/cache/agents-in-a-box/reflect/*/scripts/reflect_cost.py 2>/dev/null); do
if [ -n "$cand" ] && [ -f "$cand" ]; then COST_PY="$cand"; break; fi
done
if [ -z "$COST_PY" ]; then
echo "reflect_cost.py not found — install/update the reflect plugin (v4+):"
echo " claude plugin update reflect@agents-in-a-box"
exit 1
fi
# Headline: by outcome (where the spend goes), then model (cheap vs expensive),
# then the top transcripts (find the outlier).
python3 "$COST_PY" --since "$WINDOW" --by outcome
echo
python3 "$COST_PY" --since "$WINDOW" --by model
echo
python3 "$COST_PY" --since "$WINDOW" --by transcript --top 10
For a machine-readable view, add --json.
The drainer only began recording the full token envelope in v4. Older cost
events carry only outcome (no tokens/model), so the split shows 0.
Reconstruct the real numbers from the raw session logs with the backfill — it
scans ~/.claude/projects, finds the reflect runs, and sums their usage into a
separate drain-cost-backfill.jsonl (which reflect_cost.py reads alongside
the live log):
BACKFILL_PY="$(dirname "$COST_PY")/backfill_costs.py"
python3 "$BACKFILL_PY" --since "$WINDOW" \
--projects-dir "$HOME/.claude/projects" \
--state-dir "${REFLECT_STATE_DIR:-$HOME/.reflect}"
# then re-run the reflect_cost.py commands above
outcome runs tokens cache_rd cache_wr io $est
ok 48 120M 95M 18M 7.0M 220.40
partial_max_turns 3 40M 30M 8M 2.0M 70.10
cost_usd from claude -p where recorded, else an
order-of-magnitude estimate from token buckets × ballpark list prices. Not a bill.--outlier-tokens (default 5M). One
flagged transcript is usually where a spend spike lives.Then summarize for the user in one or two lines: total tokens, the cached vs uncached split, the model, the $est, and call out any outlier transcript.
claude plugin update reflect@agents-in-a-box and restart.? and 0 tokens — pre-v4 events only; run the
backfill above.REFLECT_STATE_DIR points elsewhere (ls "${REFLECT_STATE_DIR:-$HOME/.reflect}").development
Show fleet status — every claude session running on the host, merged across ainb + claude-peers broker + background jobs. Use when you need to enumerate sessions before composing an action, see which sessions have a peer registered (broker-routable) vs tmux-only, check the `summary` of each session, or pipe the list into jq for filtering. Default output: text table. Pass --format json for LLM consumption.
testing
Ordered multi-step prompts to fleet targets, ack-gated between steps via JSONL assistant-turn-end detection. Use for cycles like disconnect→reconnect→verify, or any flow where step N+1 requires step N to have completed first. The skill BLOCKS until each target's transcript shows the next assistant turn finishing OR per-step timeout fires (default 300s).
development
Center control panel — enumerate every claude session that is blocked waiting on something: a user answer (AskUserQuestion fired), an API error retry, an idle assistant turn-end with no follow-up, or an explicit WAITING: marker. Returns rich JSON with signal kind + context per session. Use this when you've stepped away from the fleet and want one place to see everything that wants your attention and answer it.
development
Workflow-backed Jarvis control panel. Runs the deterministic `hangar` workflow with verb=needs (discover → enrich → prioritize), renders the Jarvis HUD from its render-ready cards, fires AskUserQuestion per blocked session, and routes each answer back via tmux send-keys (broker fallback only). Requires the workflow gate (CLAUDE_CODE_WORKFLOWS=1). If the gate is off, fall back to the prompt-driven `/ainb-fleet:needs` skill.