.claude/skills/agent-health/SKILL.md
Reads production/traces/agent-metrics.jsonl and displays a per-agent performance summary table for the current or a specified session. Highlights agents with high error rates or OPEN circuit breaker state.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add tranhieutt/software_development_department agent-health
Display a performance summary table from production/traces/agent-metrics.jsonl,
cross-referenced with production/session-state/circuit-state.json for live
circuit breaker states.
| Flag | Default | Description |
| :--- | :--- | :--- |
| --session <branch> | current branch | Filter entries by session field |
| --agent <name> | all | Show only this agent |
| --since <date> | no limit | Only entries with date >= YYYY-MM-DD |
| --log | false | If set, append a fresh metrics snapshot to agent-metrics.jsonl |
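The filter flags above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the helper names `load_entries` and `filter_entries` are hypothetical, and the sketch assumes the schema header line is the only line without an `agent` field.

```python
import json
from datetime import date


def load_entries(lines):
    """Parse JSONL lines, skipping blank lines and the schema header.

    Assumption: the schema header line carries no "agent" key.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "agent" in record:
            entries.append(record)
    return entries


def filter_entries(entries, session=None, agent=None, since=None):
    """Apply --session, --agent and --since; None means "do not filter"."""
    since_date = date.fromisoformat(since) if since else None
    kept = []
    for entry in entries:
        if session is not None and entry.get("session") != session:
            continue
        if agent is not None and entry.get("agent") != agent:
            continue
        # --since keeps entries whose date is on or after the given day.
        if since_date is not None and date.fromisoformat(entry["date"]) < since_date:
            continue
        kept.append(entry)
    return kept
```

The caller would pass the current branch as `session` when `--session` is omitted.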
Get current branch: git branch --show-current.
Read both files in parallel:
- production/traces/agent-metrics.jsonl: historical metrics per agent per session
- production/session-state/circuit-state.json: live circuit breaker states

If agent-metrics.jsonl contains only the schema header line (no actual entries):
📭 No agent metrics recorded yet for this session.
Metrics are written when agents use /agent-health --log
or at the end of a session via /save-state.
Circuit breaker states (live):
[show table from circuit-state.json only]
For each agent, compute across the filtered entries:
- total_tasks = tasks_completed + tasks_failed + tasks_blocked
- success_rate = tasks_completed / total_tasks * 100 (0 if no tasks)
- error_rate = latest error_rate field value
- circuit_state = from circuit-state.json (live, not from log)

🏥 Agent Health Report — session: <branch> · <date range>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Agent Tasks ✅ Done ❌ Failed ⛔ Blocked Success% Circuit
──────────────────────────────────────────────────────────────────────────────
backend-developer 8 7 1 0 87.5% 🟢 CLOSED
frontend-developer 5 5 0 0 100.0% 🟢 CLOSED
qa-tester 6 4 2 0 66.7% 🟡 HALF-OPEN
data-engineer 2 2 0 0 100.0% 🟢 CLOSED
investigator 1 0 1 0 0.0% 🔴 OPEN
──────────────────────────────────────────────────────────────────────────────
TOTAL 22 18 4 0 81.8%
⚠️ Agents needing attention:
🔴 investigator — Circuit OPEN · fallback: solver
🟡 qa-tester — Circuit HALF-OPEN · 2 failures this session
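The per-agent figures in the report could be computed along these lines. This is a sketch under stated assumptions: `summarize` is a hypothetical helper, `circuit_states` is assumed to be a plain mapping of agent name to state string (the real layout of circuit-state.json is not specified here), and `"UNKNOWN"` is an illustrative placeholder for agents missing from that mapping.

```python
from collections import defaultdict


def summarize(entries, circuit_states):
    """Aggregate filtered metric entries into one summary row per agent."""
    totals = defaultdict(
        lambda: {"done": 0, "failed": 0, "blocked": 0, "error_rate": None}
    )
    # ISO dates sort correctly as strings; sorted() is stable, so file order breaks ties.
    for entry in sorted(entries, key=lambda e: e["date"]):
        t = totals[entry["agent"]]
        t["done"] += entry["tasks_completed"]
        t["failed"] += entry["tasks_failed"]
        t["blocked"] += entry["tasks_blocked"]
        t["error_rate"] = entry["error_rate"]  # the latest entry wins

    summary = {}
    for agent, t in totals.items():
        total_tasks = t["done"] + t["failed"] + t["blocked"]
        success_rate = t["done"] / total_tasks * 100 if total_tasks else 0.0
        summary[agent] = {
            "total_tasks": total_tasks,
            "success_rate": round(success_rate, 1),
            "error_rate": t["error_rate"],
            # Circuit state comes from the live file, never from the log entries.
            "circuit_state": circuit_states.get(agent, "UNKNOWN"),
        }
    return summary
```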
Circuit state icons:
- 🟢 CLOSED: healthy
- 🟡 HALF-OPEN: recovering, monitor closely
- 🔴 OPEN: bypassed, routed to fallback

Flag agents as needing attention if any of these hold:
- circuit_state is OPEN or HALF-OPEN
- success_rate < 70%
- tasks_failed >= 2

If the --log flag was passed, append one entry per active agent to
production/traces/agent-metrics.jsonl:
{"date":"<YYYY-MM-DD>","session":"<branch>","agent":"<agent>","tasks_completed":<N>,"tasks_failed":<N>,"tasks_blocked":<N>,"avg_tokens_est":<N>,"error_rate":<0.0-1.0>,"circuit_state":"CLOSED|OPEN|HALF-OPEN","notes":"<optional>"}
Get circuit_state from circuit-state.json. If no exact token data is
available, estimate avg_tokens_est as the decision ledger entry count × 800
tokens (a rough per-entry figure). The _est suffix marks the value as an estimate.
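As an illustration, one snapshot line might be assembled like this. The function `build_snapshot_line` and its parameters are hypothetical; `error_rate` is taken as an input because its derivation is not specified here, and only the 800-tokens-per-ledger-entry fallback comes from the text above.

```python
import json

TOKENS_PER_LEDGER_ENTRY = 800  # rough per-entry estimate from the text above


def build_snapshot_line(day, session, agent, completed, failed, blocked,
                        error_rate, circuit_state, ledger_entries,
                        exact_avg_tokens=None, notes=""):
    """Return one JSON line shaped like the agent-metrics.jsonl entry format."""
    if exact_avg_tokens is not None:
        avg_tokens = exact_avg_tokens
    else:
        # Fallback estimate when no exact token data is available.
        avg_tokens = ledger_entries * TOKENS_PER_LEDGER_ENTRY
    record = {
        "date": day,
        "session": session,
        "agent": agent,
        "tasks_completed": completed,
        "tasks_failed": failed,
        "tasks_blocked": blocked,
        "avg_tokens_est": avg_tokens,
        "error_rate": error_rate,
        "circuit_state": circuit_state,
        "notes": notes,
    }
    # Compact separators keep each record on a single JSONL line.
    return json.dumps(record, separators=(",", ":"))
```

Appending would then be a matter of opening the metrics file in append mode and writing the returned line followed by a newline.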
Print after logging:
✅ Metrics snapshot logged → production/traces/agent-metrics.jsonl
[N] agents recorded · <date>
After the table, if any agents need attention:
💡 Suggested actions:
• /resume-from <task_id> — recover failed task checkpoint
• /trace-history --risk High — audit high-risk decisions
• Check circuit-state.json — update OPEN agents once issue resolved
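The attention criteria listed earlier (circuit not CLOSED, success rate under 70%, two or more failures) can be expressed as a small predicate. This is a sketch only: `attention_reasons` is a hypothetical name, and treating the criteria as alternatives, any one of which is enough, is an assumption consistent with the sample report.

```python
def attention_reasons(circuit_state, success_rate, tasks_failed):
    """Return the reasons an agent needs attention; an empty list means healthy."""
    reasons = []
    if circuit_state in ("OPEN", "HALF-OPEN"):
        reasons.append(f"Circuit {circuit_state}")
    if success_rate < 70:
        reasons.append(f"success rate {success_rate:.1f}%")
    if tasks_failed >= 2:
        reasons.append(f"{tasks_failed} failures")
    return reasons
```

Note that under these rules an agent with no recorded tasks has a success rate of 0 and would therefore be flagged.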
Agents append entries in two ways:
- /agent-health --log at end of session
- /save-state: when saving state with a task_id, metrics for the active agent are appended automatically

The file grows one JSON line per agent per session. Use --since to filter
to recent sessions and avoid reading stale data from weeks ago.
# Summary for current session
/agent-health
# Check one agent across all time
/agent-health --agent qa-tester
# Log a fresh snapshot and view it
/agent-health --log
# Review last 7 days
/agent-health --since 2026-04-09