stevengonsalvez/reflect:cost

Name: reflect:cost
Author: stevengonsalvez

plugins/reflect/skills/cost/SKILL.md

npx skillsauth add stevengonsalvez/agents-in-a-box reflect:cost

/reflect:cost — Drain spend report

Shows what the reflect background drainer is spending: token volume split into cached (cheap reuse), uncached writes (expensive cache_creation), and io (input+output), plus a ballpark $est, grouped by day, outcome, model, or transcript. This is the observability the 2026-05-31 incident lacked — one drain run burned 41.5M tokens / ~$713 for zero net-new learnings and nothing surfaced it until a manual backfill.

When to Use

"What did reflection cost me today / this week?"
Spot an outlier run (a single transcript that blew up the spend).
Check cache reuse: a healthy drain is cache_read-heavy; a low cache-reuse % with high cache_creation is the costly failure mode.
Confirm the drain is running on the cheap model (should be sonnet, not opus) after a deploy.

Window argument

Parse the window from the user's request and pass it as --since:

| User says | --since |
|-----------|---------|
| "1 day", "today", "last 24h", (default) | 1d |
| "this week", "last 7 days" | 7d |
| "last hour" | 1h |
| "last month", "30 days" | 30d |

Default to 1d when unspecified.
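The mapping above can be sketched as a small lookup. This is a minimal illustration, not code from the plugin: the name `since_from_phrase` and the substring matching are assumptions, and only the `--since` values (`1h`, `1d`, `7d`, `30d`) and the `1d` default come from the table.

```python
# Hypothetical helper: map a user's phrasing to a --since value.
# Only the values 1h/1d/7d/30d and the 1d default come from the table.
PHRASES = {
    "1h": ("last hour",),
    "7d": ("this week", "last 7 days"),
    "30d": ("last month", "30 days"),
    "1d": ("1 day", "today", "last 24h"),
}


def since_from_phrase(request: str) -> str:
    """Return the --since window for a free-text request, defaulting to 1d."""
    text = request.lower()
    for window, phrases in PHRASES.items():
        if any(phrase in text for phrase in phrases):
            return window
    return "1d"  # default when no window is mentioned
```

Substring matching is deliberately crude; a request that names no window at all falls through to the default.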

Run it

Locate the cost reporter (prefer the running plugin, else the newest cached version), then render. The script is reflect_cost.py, shipped in the reflect plugin's scripts/.

WINDOW="1d"   # ← substitute from the table above

# Resolve reflect_cost.py robustly across deploy layouts.
COST_PY=""
for cand in \
  "${CLAUDE_PLUGIN_ROOT:-}/scripts/reflect_cost.py" \
  $(ls -t "$HOME"/.claude/plugins/cache/agents-in-a-box/reflect/*/scripts/reflect_cost.py 2>/dev/null); do
  if [ -n "$cand" ] && [ -f "$cand" ]; then COST_PY="$cand"; break; fi
done
if [ -z "$COST_PY" ]; then
  echo "reflect_cost.py not found — install/update the reflect plugin (v4+):"
  echo "  claude plugin update reflect@agents-in-a-box"
  exit 1
fi

# Headline: by outcome (where the spend goes), then model (cheap vs expensive),
# then the top transcripts (find the outlier).
python3 "$COST_PY" --since "$WINDOW" --by outcome
echo
python3 "$COST_PY" --since "$WINDOW" --by model
echo
python3 "$COST_PY" --since "$WINDOW" --by transcript --top 10

For a machine-readable view, add --json.

If tokens show 0 (historical / pre-v4 events)

The drainer only began recording the full token envelope in v4. Older cost events carry only outcome (no tokens/model), so the split shows 0. Reconstruct the real numbers from the raw session logs with the backfill — it scans ~/.claude/projects, finds the reflect runs, and sums their usage into a separate drain-cost-backfill.jsonl (which reflect_cost.py reads alongside the live log):

BACKFILL_PY="$(dirname "$COST_PY")/backfill_costs.py"
python3 "$BACKFILL_PY" --since "$WINDOW" \
  --projects-dir "$HOME/.claude/projects" \
  --state-dir "${REFLECT_STATE_DIR:-$HOME/.reflect}"
# then re-run the reflect_cost.py commands above
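As a rough illustration of what summing usage from raw session logs involves, the sketch below totals token counts across JSONL lines. It is not the plugin's `backfill_costs.py`: the record layout (a `message.usage` object with `input_tokens`, `output_tokens`, `cache_read_input_tokens`, and `cache_creation_input_tokens`) is an assumption about the log format, and `sum_usage` is a hypothetical name.

```python
import json

# Assumed usage fields; the real session-log layout may differ.
FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
)


def sum_usage(lines):
    """Sum token counts across JSONL lines, skipping lines without usage."""
    totals = dict.fromkeys(FIELDS, 0)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or partial lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        for field in FIELDS:
            totals[field] += int(usage.get(field) or 0)
    return totals
```

Skipping unparseable lines rather than failing keeps one damaged log from blocking the whole reconstruction, at the cost of slightly undercounting.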

Reading the output

outcome                 runs    tokens  cache_rd  cache_wr      io     $est
ok                        48      120M       95M       18M     7.0M   220.40
partial_max_turns          3       40M       30M        8M     2.0M    70.10

cache_rd (cache_read) — cheap reuse, should dominate.
cache_wr (cache_creation) — expensive writes; high here means caching is not amortizing (re-paying to rebuild context). The incident was cache_wr-heavy.
io — input + output tokens.
$est — authoritative cost_usd from claude -p where recorded, else an order-of-magnitude estimate from token buckets × ballpark list prices. Not a bill.
⚠ outlier — any single run above --outlier-tokens (default 5M). One flagged transcript is usually where a spend spike lives.
The cache-reuse % line at the bottom is the fastest health read: low % + high cache_wr = the 41.5M failure mode.
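To make the column arithmetic concrete, the sketch below re-checks the two sample rows: in each, `cache_rd + cache_wr + io` adds up to `tokens`. It also computes the share of tokens served from cache. The exact formula behind the reporter's cache-reuse % line is not stated here, so `cache_read_share` (cache_rd divided by total tokens) is an assumed definition used only for illustration.

```python
# Rows from the sample output above, in millions of tokens.
ROWS = {
    "ok": {"tokens": 120.0, "cache_rd": 95.0, "cache_wr": 18.0, "io": 7.0},
    "partial_max_turns": {"tokens": 40.0, "cache_rd": 30.0, "cache_wr": 8.0, "io": 2.0},
}


def cache_read_share(row):
    """Assumed definition: cache_rd as a percentage of all tokens in the row."""
    return 100.0 * row["cache_rd"] / row["tokens"]


for name, row in ROWS.items():
    # In the sample, the three buckets add up to the tokens column.
    assert row["cache_rd"] + row["cache_wr"] + row["io"] == row["tokens"]
    print(f"{name}: {cache_read_share(row):.1f}% cache_rd")
# prints:
# ok: 79.2% cache_rd
# partial_max_turns: 75.0% cache_rd
```

Both sample rows are read-heavy, which is the healthy shape described above; a cache_wr-heavy row would push this share down.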

Then summarize for the user in one or two lines: total tokens, the cached vs uncached split, the model, the $est, and call out any outlier transcript.
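One way to assemble that summary line is sketched below. The per-run dictionary shape (`transcript`, `cache_rd`, `cache_wr`, `io`, `est`) and the function name `summarize` are hypothetical and do not reflect the reporter's `--json` schema; the 5M threshold mirrors the stated `--outlier-tokens` default.

```python
OUTLIER_TOKENS = 5_000_000  # stated default for --outlier-tokens


def summarize(runs, model):
    """Build a one-line summary from per-run token buckets (hypothetical shape)."""
    cached = sum(r["cache_rd"] for r in runs)
    uncached = sum(r["cache_wr"] for r in runs)
    io = sum(r["io"] for r in runs)
    total = cached + uncached + io
    est = sum(r["est"] for r in runs)
    # A run is flagged when its own token total exceeds the threshold.
    outliers = [
        r["transcript"]
        for r in runs
        if r["cache_rd"] + r["cache_wr"] + r["io"] > OUTLIER_TOKENS
    ]
    line = (
        f"{total / 1e6:.1f}M tokens ({cached / 1e6:.1f}M cached, "
        f"{uncached / 1e6:.1f}M uncached writes, {io / 1e6:.1f}M io) "
        f"on {model}, ~${est:.2f} est"
    )
    if outliers:
        line += "; outliers: " + ", ".join(outliers)
    return line
```

Called with a list of runs and a model name such as `"sonnet"`, it returns a single string suitable for relaying to the user, with any flagged transcripts appended at the end.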

Troubleshooting

"reflect_cost.py not found" — the installed plugin predates v4. Run claude plugin update reflect@agents-in-a-box and restart.
All rows show model ? and 0 tokens — pre-v4 events only; run the backfill above.
No events at all — the drain hasn't run in the window, or REFLECT_STATE_DIR points elsewhere (ls "${REFLECT_STATE_DIR:-$HOME/.reflect}").

Report reflect drain spend over a time window — tokens split by cached (cache_read), uncached writes (cache_creation), and io (input+output), with a $ estimate, grouped by day / outcome / model / transcript. Reads the drainer's cost log and surfaces outlier runs and cache-reuse health (the 41.5M-token failure mode = low cache reuse + high cache writes). Use to answer "what is reflection costing me" for the last day / week.

12 stars

documentation

Updated Jun 2, 2026

Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add stevengonsalvez/agents-in-a-box reflect:cost

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | % |
|---------|-------------|--------|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Jun 2, 2026, 3:35 AM, 133.2s, 1 file scanned

SKILL.md

name: reflect:cost
description: |
version: 4.0.0
user-invocable: true
- reflect:: cost

Install manually

# Clone the repo
git clone https://github.com/stevengonsalvez/agents-in-a-box.git

# Make sure the global Claude Code skills folder exists
mkdir -p ~/.claude/skills

# Copy the skill into it (lands at ~/.claude/skills/cost)
cp -r agents-in-a-box/plugins/reflect/skills/cost ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

stevengonsalvez/agents-in-a-box

12 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT