Interactive debugging companion for Hive agents - identifies runtime issues and proposes solutions
An interactive debugging companion that helps developers identify and fix runtime issues in Hive agents. The debugger analyzes runtime logs at three levels (L1/L2/L3), categorizes issues, and provides actionable fix recommendations.
Use /hive-debugger when:
This skill works alongside agents running in TUI mode and provides supervisor-level insights into execution behavior.
Some agents use terminal_nodes=[] (the "forever-alive" pattern), meaning they loop indefinitely and never enter a "completed" execution state. For these agents:
How to identify forever-alive agents: Check agent.py or agent.json for terminal_nodes=[] (empty list). If empty, the agent is forever-alive.
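The check described above can be sketched in Python. This is a minimal sketch, not the framework's own logic: it assumes `agent.json` parses to a dict and that `terminal_nodes` sits either at the top level or under a `graph` key (both locations are assumptions). The helper names `is_forever_alive` and `load_agent_config` are hypothetical.

```python
import json
from pathlib import Path


def is_forever_alive(config: dict) -> bool:
    """Return True when the config declares terminal_nodes as an empty list."""
    # Assumption: terminal_nodes is stored at the top level or under "graph".
    for scope in (config, config.get("graph", {})):
        if isinstance(scope, dict) and "terminal_nodes" in scope:
            return scope["terminal_nodes"] == []
    # Key not found: the pattern cannot be confirmed from this file alone.
    return False


def load_agent_config(agent_name: str, exports_dir: str = "exports") -> dict:
    """Read exports/{agent_name}/agent.json into a dict."""
    path = Path(exports_dir) / agent_name / "agent.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

A missing key returns `False` on purpose, so an agent is only treated as forever-alive when the empty list is explicitly present.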
Before using this skill, ensure:
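- The agent is exported under `exports/{agent_name}/`
- Runtime data for the agent exists under `~/.hive/agents/{agent_name}/`

### Stage 1: Context Gathering
**Objective:** Understand the agent being debugged
**What to do:**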
1. Ask the developer which agent needs debugging. Agents are exported under `exports/{agent_name}/`.
2. Determine the agent working directory: `~/.hive/agents/{agent_name}/`
3. Read the agent configuration from `exports/{agent_name}/agent.json` and note:
   - `goal.id` - The goal identifier
   - `goal.success_criteria` - What success looks like
   - `goal.constraints` - Rules the agent must follow
   - `graph.nodes`
   - `graph.edges`
4. Store context for the debugging session, including the working directory (e.g. `/home/user/.hive/agents/deep_research_agent`)

Example:
Developer: "My deep_research_agent agent keeps failing"
You: "I'll help debug the deep_research_agent agent. Let me gather context..."
[Read exports/deep_research_agent/agent.json]
Context gathered:
- Agent: deep_research_agent
- Goal: deep-research
- Working Directory: /home/user/.hive/agents/deep_research_agent
- Success Criteria: ["Produce a comprehensive research report with cited sources"]
- Constraints: ["Must cite all sources", "Must cover multiple perspectives"]
- Nodes: ["intake", "research", "analysis", "report-writer"]
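The context-gathering step can be sketched as a small helper that pulls the listed fields out of a parsed `agent.json`. This is an illustrative sketch only: `DebugContext` and `gather_context` are hypothetical names, and the shape of `graph.nodes` (plain ID strings or dicts with an `id` key) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DebugContext:
    """Session context kept for the rest of the debugging workflow."""
    agent_name: str
    goal_id: str
    work_dir: str
    success_criteria: list = field(default_factory=list)
    constraints: list = field(default_factory=list)
    nodes: list = field(default_factory=list)


def gather_context(agent_name: str, config: dict) -> DebugContext:
    """Extract goal and graph fields from a parsed agent.json dict."""
    goal = config.get("goal", {})
    graph = config.get("graph", {})
    # Assumption: each node is either a plain ID string or a dict with an "id" key.
    node_ids = [n["id"] if isinstance(n, dict) else n for n in graph.get("nodes", [])]
    return DebugContext(
        agent_name=agent_name,
        goal_id=goal.get("id", ""),
        work_dir=f"~/.hive/agents/{agent_name}",
        success_criteria=list(goal.get("success_criteria", [])),
        constraints=list(goal.get("constraints", [])),
        nodes=node_ids,
    )
```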
### Stage 2: Mode Selection
**Objective:** Choose the debugging approach that best fits the situation
**What to do:**
Ask the developer which debugging mode they want to use. Use AskUserQuestion with these options:
- Real-time Monitoring Mode
- Post-Mortem Analysis Mode
- Historical Trends Mode
Implementation:
Use AskUserQuestion to present these options and let the developer choose.
Store the selected mode for the session.
### Stage 3: L1 Analysis (Session Summaries)
**Objective:** Identify which sessions need attention
**What to do:**
Query high-level run summaries using the MCP tool:
query_runtime_logs(
    agent_work_dir="{agent_work_dir}",
    status="needs_attention",
    limit=20
)
Analyze the results:
- Look for runs with `needs_attention: true`
- Check `attention_summary.categories` for issue types
- Note the `run_id` of problematic sessions
- Check the `status` field: "degraded", "failure", "in_progress"
- For forever-alive agents, judge a run by `needs_attention: true` or actual error indicators (tool failures, retry loops, missing outputs). High step counts alone do not indicate a problem.

Attention flag triggers to understand: From runtime_logger.py, runs are flagged when:
Present findings to developer:
Example Output:
Found 2 runs needing attention:
1. session_20260206_115718_e22339c5 (30 minutes ago)
Status: degraded
Categories: missing_outputs, retry_loops
2. session_20260206_103422_9f8d1b2a (2 hours ago)
Status: failure
Categories: tool_failures, high_latency
Which run would you like to investigate?
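The triage and presentation step can be sketched as follows. This is a sketch under assumptions: it treats the result of `query_runtime_logs` as a list of dicts carrying the fields named above (`run_id`, `status`, `needs_attention`, `attention_summary.categories`), which is not confirmed by this document, and `format_findings` is a hypothetical helper.

```python
def runs_needing_attention(runs: list) -> list:
    """Keep only runs explicitly flagged; step counts are deliberately ignored."""
    return [run for run in runs if run.get("needs_attention") is True]


def format_findings(runs: list) -> str:
    """Render flagged runs in the same layout as the example output."""
    flagged = runs_needing_attention(runs)
    noun = "run" if len(flagged) == 1 else "runs"
    lines = [f"Found {len(flagged)} {noun} needing attention:"]
    for index, run in enumerate(flagged, start=1):
        # Assumption: categories live under attention_summary.categories.
        categories = run.get("attention_summary", {}).get("categories", [])
        lines.append(f"{index}. {run['run_id']}")
        lines.append(f"   Status: {run.get('status', 'unknown')}")
        lines.append(f"   Categories: {', '.join(categories)}")
    return "\n".join(lines)
```

Filtering on the flag alone means a forever-alive session that is still "in_progress" with many steps is not reported unless it was actually flagged.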
### Stage 4: L2 Analysis (Node Details)
**Objective:** Identify which nodes failed and what patterns exist
**What to do:**
Query per-node details using the MCP tool:
query_runtime_log_details(
    agent_work_dir="{agent_work_dir}",
    run_id="{selected_run_id}",
    needs_attention_only=True
)
Categorize issues using the Issue Taxonomy:
10 Issue Categories:
| Category | Detection Pattern | Meaning |
|----------|------------------|---------|
| Missing Outputs | exit_status != "success", attention_reasons contains "missing_outputs" | Node didn't call set_output with required keys |
| Tool Errors | tool_error_count > 0, attention_reasons contains "tool_failures" | Tool calls failed (API errors, timeouts, auth issues) |
| Retry Loops | retry_count > 3, verdict_counts.RETRY > 5 | Judge repeatedly rejecting outputs |
| Guard Failures | guard_reject_count > 0 | Output validation failed (wrong types, missing keys) |
| Stalled Execution | total_steps > 20, verdict_counts.CONTINUE > 10 | EventLoopNode not making progress. Caveat: Forever-alive agents may legitimately have high step counts — check if agent is blocked at a client-facing node (normal) vs genuinely stuck in a loop |
| High Latency | latency_ms > 60000, avg_step_latency > 5000 | Slow tool calls or LLM responses |
| Client-Facing Issues | client_input_requested but no user_input_received | Premature set_output before user input |
| Edge Routing Errors | exit_status == "no_valid_edge", attention_reasons contains "routing_issue" | No edges match current state |
| Memory/Context Issues | tokens_used > 100000, context_overflow_count > 0 | Conversation history too long |
| Constraint Violations | Compare output against goal constraints | Agent violated goal-level rules |
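The detection patterns in the table can be sketched as a rule-based classifier. The table does not say whether comma-separated patterns combine with AND or OR, so the following is one possible reading and an assumption: numeric thresholds are treated as alternatives, while Missing Outputs requires both signals because a non-success exit status alone would match almost any failing node. `classify_node` is a hypothetical helper, and the two categories that need extra data (Client-Facing Issues, Constraint Violations) are left out.

```python
def classify_node(node: dict) -> list:
    """Map one node-level log record to zero or more issue categories."""
    reasons = node.get("attention_reasons", [])
    verdicts = node.get("verdict_counts", {})
    status = node.get("exit_status")
    # Thresholds are copied from the taxonomy table; missing fields count as 0.
    rules = {
        "Missing Outputs": status != "success" and "missing_outputs" in reasons,
        "Tool Errors": node.get("tool_error_count", 0) > 0 or "tool_failures" in reasons,
        "Retry Loops": node.get("retry_count", 0) > 3 or verdicts.get("RETRY", 0) > 5,
        "Guard Failures": node.get("guard_reject_count", 0) > 0,
        "Stalled Execution": node.get("total_steps", 0) > 20 or verdicts.get("CONTINUE", 0) > 10,
        "High Latency": node.get("latency_ms", 0) > 60000 or node.get("avg_step_latency", 0) > 5000,
        "Edge Routing Errors": status == "no_valid_edge" or "routing_issue" in reasons,
        "Memory/Context Issues": node.get("tokens_used", 0) > 100000
        or node.get("context_overflow_count", 0) > 0,
    }
    return [name for name, matched in rules.items() if matched]
```

A Stalled Execution match still needs the manual forever-alive check from the table before it is reported as a problem.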
Forever-Alive Agent Caveat: If the agent uses terminal_nodes=[], sessions will never reach "completed" status. This is by design. When debugging these agents, focus on:
Analyze each flagged node:
Present diagnosis to developer:
Example Output:
Diagnosis for session_20260206_115718_e22339c5:
Problem Node: research
├─ Exit Status: escalate
├─ Retry Count: 5 (HIGH)
├─ Verdict Counts: {RETRY: 5, ESCALATE: 1}
├─ Attention Reasons: ["high_retry_count", "missing_outputs"]
├─ Total Steps: 8
└─ Categories: Missing Outputs + Retry Loops
Root Issue: The research node is stuck in a retry loop because it's not setting required outputs.
### Stage 5: L3 Analysis (Step Details)
**Objective:** Understand exactly what went wrong by examining detailed logs
**What to do:**
Query detailed tool/LLM logs using the MCP tool:
query_runtime_log_raw(
    agent_work_dir="{agent_work_dir}",
    run_id="{run_id}",
    node_id="{problem_node_id}"
)
Analyze based on issue category:
For Missing Outputs:
- Check `step.tool_calls` for set_output usage

For Tool Errors:
- Check `step.tool_results` for error messages

For Retry Loops:
- Review `step.verdict_feedback` from judge

For Guard Failures:
- Check `step.guard_results` for validation errors

For Stalled Execution:
- Check `step.llm_response_text` for repetition

Extract evidence:
Formulate root cause explanation:
Example Output:
Root Cause Analysis for research:
Step-by-step breakdown:
Step 3:
- Tool Call: web_search(query="latest AI regulations 2026")
- Result: Found relevant articles and sources
- Verdict: RETRY
- Feedback: "Missing required output 'research_findings'. You found sources but didn't call set_output."
Step 4:
- Tool Call: web_search(query="AI regulation policy 2026")
- Result: Found additional policy information
- Verdict: RETRY
- Feedback: "Still missing 'research_findings'. Use set_output to save your findings."
Steps 5-7: Similar pattern continues...
ROOT CAUSE: The node is successfully finding research sources via web_search, but the LLM is not calling set_output to save the results. It keeps searching for more information instead of completing the task.
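The Missing Outputs check from this stage can be sketched as a scan over step records. The record shape is an assumption: each step is taken to be a dict with `tool_calls` (a list of dicts with a `name` key), `verdict`, and `verdict_feedback`. `summarize_missing_output` is a hypothetical helper, not part of the MCP tools.

```python
def summarize_missing_output(steps: list) -> dict:
    """Report whether set_output was ever called and collect RETRY feedback."""
    set_output_called = False
    tools_used = []
    retry_feedback = []
    for step in steps:
        # Assumption: each tool call is a dict with a "name" key.
        names = [call.get("name") for call in step.get("tool_calls", [])]
        tools_used.extend(names)
        if "set_output" in names:
            set_output_called = True
        if step.get("verdict") == "RETRY" and step.get("verdict_feedback"):
            retry_feedback.append(step["verdict_feedback"])
    return {
        "set_output_called": set_output_called,
        "tools_used": tools_used,
        "retry_feedback": retry_feedback,
    }
```

When `set_output_called` is false and the feedback keeps naming the same missing key, the evidence matches the pattern in the example above.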
### Stage 6: Fix Recommendations
**Objective:** Provide actionable solutions the developer can implement
**What to do:**
Based on the issue category identified, provide specific fix recommendations using these templates:
#### Template 1: Client-Facing Issues

````markdown
## Issue: Premature set_output in Client-Facing Node

**Root Cause:** Node called set_output before receiving user input

**Fix:** Use STEP 1/STEP 2 prompt pattern

**File to edit:** `exports/{agent_name}/nodes/{node_name}.py`

**Changes:**

1. Update the system_prompt to include explicit step guidance:

```python
system_prompt = """
STEP 1: Analyze the user input and decide what action to take.
DO NOT call set_output in this step.

STEP 2: After receiving feedback or completing analysis,
ONLY THEN call set_output with your results.
"""
```

2. Mark optional inputs as nullable:

```python
nullable_output_keys=["feedback"]
```

**Verification:**
````
#### Template 2: Retry Loops

````markdown
## Issue: Judge Repeatedly Rejecting Outputs

**Root Cause:** {Insert specific reason from verdict_feedback}

**Fix Options:**

**Option A - If outputs are actually correct:** Adjust judge evaluation rules
- File: `exports/{agent_name}/agent.json`
- Update `evaluation_rules` section to accept the current output format
- Example: If judge expects list but gets string, update rule to accept both

**Option B - If prompt is ambiguous:** Clarify node instructions
- File: `exports/{agent_name}/nodes/{node_name}.py`
- Make system_prompt more explicit about output format and requirements
- Add examples of correct outputs

**Option C - If tool is unreliable:** Add retry logic with fallback
- Consider using alternative tools
- Add manual fallback option
- Update prompt to handle tool failures gracefully

**Verification:**
- Run the node with test input
- Confirm judge accepts output on first try
- Check that retry_count stays at 0
````
#### Template 3: Tool Errors

````markdown
## Issue: {tool_name} Failing with {error_type}

**Root Cause:** {Insert specific error message from logs}

**Fix Strategy:**

**If API rate limit:**
1. Add exponential backoff in tool retry logic
2. Reduce API call frequency
3. Consider caching results

**If auth failure:**
1. Check credentials using:
```bash
/hive-credentials --agent {agent_name}
```
2. Update `mcp_servers.json` if needed

**If timeout:**
1. Increase the timeout in `mcp_servers.json`:
```json
{
  "timeout_ms": 60000
}
```

**Verification:**
````
#### Template 4: Edge Routing Errors

````markdown
## Issue: No Valid Edge from Node {node_id}

**Root Cause:** No edge condition matched the current state

**File to edit:** `exports/{agent_name}/agent.json`

**Analysis:**
- Current node output: {show actual output keys}
- Existing edge conditions: {list edge conditions}
- Why no match: {explain the mismatch}

**Fix:**
Add the missing edge to the graph:

```json
{
  "edge_id": "{node_id}_to_{target_node}",
  "source": "{node_id}",
  "target": "{target_node}",
  "condition": "on_success"
}
```

**Alternative:** Update existing edge condition to cover this case

**Verification:**
````
#### Template 5: Stalled Execution

````markdown
## Issue: EventLoopNode Not Making Progress

**Root Cause:** {Insert analysis - e.g., "LLM repeating same failed action"}

**File to edit:** `exports/{agent_name}/nodes/{node_name}.py`

**Fix:** Update system_prompt to guide LLM out of loops

**Add this guidance:**

```python
system_prompt = """
{existing prompt}

IMPORTANT: If a tool call fails multiple times:
1. Try an alternative approach or different tool
2. If no alternatives work, call set_output with partial results
3. DO NOT retry the same failed action more than 3 times

Progress is more important than perfection. Move forward even with incomplete data.
"""
```

**Additional fix:** Cap node visits with `max_node_visits` to prevent infinite loops

```python
# In node configuration
max_node_visits=3  # Prevent getting stuck
```

**Verification:**
````
#### Template 6: Checkpoint Recovery (Post-Fix Resume)

````markdown
## Recovery Strategy: Resume from Last Clean Checkpoint

**Situation:** You've fixed the issue, but the failed session is stuck mid-execution

**Solution:** Resume execution from a checkpoint before the failure

### Option A: Auto-Resume from Latest Checkpoint (Recommended)

Use CLI arguments to auto-resume when launching TUI:

```bash
PYTHONPATH=core:exports python -m {agent_name} --tui \
    --resume-session {session_id}
```

This will:
- Load the session state from `state.json`

If you need to go back to an earlier point:

```bash
PYTHONPATH=core:exports python -m {agent_name} --tui \
    --resume-session {session_id} \
    --checkpoint {checkpoint_id}
```

Example:

```bash
PYTHONPATH=core:exports python -m deep_research_agent --tui \
    --resume-session session_20260208_143022_abc12345 \
    --checkpoint cp_node_complete_intake_143030
```

Alternatively, launch TUI normally and use commands:

```bash
# Launch TUI
PYTHONPATH=core:exports python -m {agent_name} --tui

# In TUI, use commands:
/resume {session_id}                    # Resume from session state
/recover {session_id} {checkpoint_id}   # Recover from specific checkpoint
```

Use /resume (or --resume-session) when:

Use /recover (or --resume-session + --checkpoint) when:

Use MCP tools to programmatically find and inspect checkpoints:

```python
# List all sessions to find the failed one
list_agent_sessions(agent_work_dir="~/.hive/agents/{agent_name}", status="failed")

# Inspect session state
get_agent_session_state(agent_work_dir="~/.hive/agents/{agent_name}", session_id="{session_id}")

# Find clean checkpoints to resume from
list_agent_checkpoints(agent_work_dir="~/.hive/agents/{agent_name}", session_id="{session_id}", is_clean="true")

# Compare checkpoints to understand what changed
compare_agent_checkpoints(
    agent_work_dir="~/.hive/agents/{agent_name}",
    session_id="{session_id}",
    checkpoint_id_before="cp_node_complete_intake_143030",
    checkpoint_id_after="cp_node_complete_research_143115"
)

# Inspect memory at a specific checkpoint
get_agent_checkpoint(agent_work_dir="~/.hive/agents/{agent_name}", session_id="{session_id}", checkpoint_id="cp_node_complete_intake_143030")
```

Or in TUI:

```bash
/sessions {session_id}
```

**Verification:**
- Use `--resume-session` to test your fix immediately
````
**Selecting the right template:**
- Match the issue category from Stage 4
- Customize with specific details from Stage 5
- Include actual error messages and code snippets
- Provide file paths and line numbers when possible
- **Always include recovery commands** (Template 6) after providing fix recommendations
---
### Stage 7: Verification Support
**Objective:** Help the developer confirm their fixes work
**What to do:**
1. **Suggest appropriate tests based on fix type:**

**For node-level fixes:**
```bash
# Use hive-test to run goal-based tests
/hive-test --agent {agent_name} --goal {goal_id}

# Or run specific test scenarios
/hive-test --agent {agent_name} --scenario {specific_input}
```

**For quick manual tests:**
```bash
# Launch the interactive TUI dashboard
hive tui
```
Then use arrow keys to select the agent from the list and press Enter to run it.
2. **Provide MCP tool queries to validate the fix:**

**Check if issue is resolved:**
query_runtime_logs(
    agent_work_dir="~/.hive/agents/{agent_name}",
    status="needs_attention",
    limit=5
)
# Should show 0 results if fully fixed

**Verify specific node behavior:**
query_runtime_log_details(
    agent_work_dir="~/.hive/agents/{agent_name}",
    run_id="{new_run_id}",
    node_id="{fixed_node_id}"
)
# Should show exit_status="success", retry_count=0

**Monitor for regression:**

3. **Provide verification checklist:**
Verification Checklist:
□ Applied recommended fix to code
□ Ran agent with test input
□ Checked runtime logs show no attention flags
□ Verified specific node completes successfully
□ Tested with multiple inputs
□ No regression of original issue
□ Agent meets success criteria
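The log-based items on the checklist can be sketched as a single check over the two query results. This is a sketch under assumptions: the flagged-run result is taken to be a list and the node detail a dict with `exit_status` and `retry_count`, and `verification_failures` is a hypothetical helper.

```python
def verification_failures(flagged_runs: list, node_detail: dict) -> list:
    """Return a list of unmet expectations; an empty list means the fix held."""
    failures = []
    if flagged_runs:
        failures.append(f"{len(flagged_runs)} run(s) still need attention")
    if node_detail.get("exit_status") != "success":
        failures.append(f"exit_status is {node_detail.get('exit_status')!r}, expected 'success'")
    if node_detail.get("retry_count", 0) != 0:
        failures.append(f"retry_count is {node_detail.get('retry_count')}, expected 0")
    return failures
```

Returning the reasons instead of a bare boolean makes it easier to tell the developer which expectation is still unmet.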
Example interaction:
Developer: "I applied the fix to research. How do I verify it works?"
You: "Great! Let's verify the fix with these steps:
1. Launch the TUI dashboard:
hive tui
Then select your agent from the list and press Enter to run it.
2. After it completes, check the logs:
[Use query_runtime_logs to check for attention flags]
3. Verify the specific node:
[Use query_runtime_log_details for research]
Expected results:
- No 'needs_attention' flags
- research shows exit_status='success'
- retry_count should be 0
Let me know when you've run it and I'll help check the logs!"
L1: query_runtime_logs - Session-level summaries
query_runtime_logs(
    agent_work_dir="/home/user/.hive/agents/deep_research_agent",
    status="needs_attention",
    limit=20
)
L2: query_runtime_log_details - Node-level details
query_runtime_log_details(
    agent_work_dir="/home/user/.hive/agents/deep_research_agent",
    run_id="session_20260206_115718_e22339c5",
    needs_attention_only=True
)
L3: query_runtime_log_raw - Step-level details
query_runtime_log_raw(
    agent_work_dir="/home/user/.hive/agents/deep_research_agent",
    run_id="session_20260206_115718_e22339c5",
    node_id="research"
)
list_agent_sessions - Browse sessions with filtering
list_agent_sessions(
    agent_work_dir="/home/user/.hive/agents/twitter_outreach",
    status="failed",
    limit=10
)
get_agent_session_state - Load full session state (excludes memory values)
get_agent_session_state(
    agent_work_dir="/home/user/.hive/agents/twitter_outreach",
    session_id="session_20260208_143022_abc12345"
)
get_agent_session_memory - Get memory contents from a session
get_agent_session_memory(
    agent_work_dir="/home/user/.hive/agents/twitter_outreach",
    session_id="session_20260208_143022_abc12345",
    key="twitter_handles"
)
list_agent_checkpoints - List checkpoints for a session
list_agent_checkpoints(
    agent_work_dir="/home/user/.hive/agents/twitter_outreach",
    session_id="session_20260208_143022_abc12345",
    is_clean="true"
)
get_agent_checkpoint - Load a specific checkpoint with full state
get_agent_checkpoint(
    agent_work_dir="/home/user/.hive/agents/twitter_outreach",
    session_id="session_20260208_143022_abc12345",
    checkpoint_id="cp_node_complete_intake_143030"
)
compare_agent_checkpoints - Diff memory between two checkpoints
compare_agent_checkpoints(
    agent_work_dir="/home/user/.hive/agents/twitter_outreach",
    session_id="session_20260208_143022_abc12345",
    checkpoint_id_before="cp_node_complete_intake_143030",
    checkpoint_id_after="cp_node_complete_research_143115"
)
Pattern 1: Top-Down Investigation (Most common)
1. L1: Find problematic runs
2. L2: Identify failing nodes
3. L3: Analyze specific failures
Pattern 2: Node-Specific Debugging
1. L2: Get details for specific node across all runs
2. L3: Deep dive into worst failures
Pattern 3: Real-time Monitoring
Loop every 10 seconds:
1. L1: Check for new needs_attention runs
2. If found: Alert and drill into L2
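Pattern 3 can be sketched as a polling loop. This is a minimal sketch, assuming the L1 query is wrapped in a caller-supplied function that returns flagged runs as dicts with a `run_id` key. `monitor`, `fetch_flagged`, and `on_alert` are hypothetical names, and the `max_polls` and `sleep` parameters exist only so the loop can be bounded and tested.

```python
import time
from typing import Callable, Iterable, Optional


def monitor(
    fetch_flagged: Callable[[], Iterable[dict]],
    on_alert: Callable[[dict], None],
    interval_s: float = 10.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll for flagged runs and alert once per previously unseen run_id."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for run in fetch_flagged():
            run_id = run["run_id"]
            if run_id not in seen:
                seen.add(run_id)
                on_alert(run)  # The caller drills into L2 from here.
        polls += 1
        # Skip the final sleep when a poll limit has been reached.
        if max_polls is None or polls < max_polls:
            sleep(interval_s)
```

Tracking seen run IDs keeps a run that stays flagged across polls from triggering repeated alerts.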
Pattern 4: Session State + Checkpoint Recovery
1. list_agent_sessions: Find failed/paused sessions
2. get_agent_session_state: Check is_resumable, see execution path
3. get_agent_session_memory: Inspect what data was produced
4. list_agent_checkpoints: Find clean checkpoints before failure
5. compare_agent_checkpoints: Understand what changed between checkpoints
6. Recommend resume command with specific checkpoint
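The last two steps of Pattern 4 can be sketched as below. The command layout follows the `--resume-session` and `--checkpoint` flags shown in Template 6. The checkpoint record shape (chronologically ordered dicts with `checkpoint_id` and a boolean `is_clean`) is an assumption, and both helper names are hypothetical.

```python
import shlex
from typing import Optional


def last_clean_checkpoint(checkpoints: list) -> Optional[str]:
    """Pick the most recent clean checkpoint ID, or None if there is none."""
    # Assumption: the list is in chronological order, oldest first.
    clean = [cp["checkpoint_id"] for cp in checkpoints if cp.get("is_clean")]
    return clean[-1] if clean else None


def build_resume_command(
    agent_name: str, session_id: str, checkpoint_id: Optional[str] = None
) -> str:
    """Build the TUI resume command, optionally pinned to a checkpoint."""
    args = ["python", "-m", agent_name, "--tui", "--resume-session", session_id]
    if checkpoint_id:
        args += ["--checkpoint", checkpoint_id]
    # shlex.join quotes any argument that contains shell-special characters.
    return "PYTHONPATH=core:exports " + shlex.join(args)
```

For example, `build_resume_command("deep_research_agent", "session_20260208_143022_abc12345")` yields the same single-line command as Option A in Template 6.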
Developer says: "My deep_research_agent keeps failing. It's not saving research findings correctly."
You: "I'll help debug the deep_research_agent agent. Let me gather context..."
[Read exports/deep_research_agent/agent.json]
Context:
- Agent: deep_research_agent
- Goal: deep-research
- Working Dir: ~/.hive/agents/deep_research_agent
- Success Criteria: ["Produce a comprehensive research report with cited sources"]
- Constraints: ["Must cite all sources", "Must cover multiple perspectives"]
- Nodes: intake, research, analysis, report-writer
You: "Which debugging approach would you prefer?"
[Present 3 mode options using AskUserQuestion]
Developer selects: "Post-Mortem Analysis Mode"
You: "Querying runtime logs for problematic runs..."
[Call query_runtime_logs with status="needs_attention"]
Results:
- Found 1 run needing attention
- session_20260206_115718_e22339c5 (30 min ago)
- Status: degraded
- Categories: missing_outputs, retry_loops
You: "Analyzing node-level details..."
[Call query_runtime_log_details for the session]
Results:
- Problem Node: research
- Exit Status: escalate
- Retry Count: 5
- Verdict Counts: {RETRY: 5, ESCALATE: 1}
- Attention Reasons: ["high_retry_count", "missing_outputs"]
- Total Steps: 8
Diagnosis: research has retry loop + missing outputs issue
Category: Missing Outputs + Retry Loops
You: "Drilling into detailed logs for research..."
[Call query_runtime_log_raw for node research]
Evidence:
Step 3: web_search("latest AI regulations 2026") → Found articles
Verdict: RETRY
Feedback: "Missing required output 'research_findings'. You found sources but didn't call set_output."
Step 4: web_search("AI regulation policy 2026") → Found more info
Verdict: RETRY
Feedback: "Still missing 'research_findings'. Use set_output to save your findings."
Steps 5-7: Same pattern repeats...
ROOT CAUSE: The node successfully finds research sources, but the LLM doesn't call set_output. It keeps searching instead of completing.
You: "Here's the fix for this issue:
## Issue: research Node Not Setting Outputs
**Root Cause:** Node successfully finds research sources but doesn't call set_output
**Category:** Missing Outputs
**File to edit:** exports/deep_research_agent/nodes/research.py
**Changes:**
Update the system_prompt to emphasize the output requirement:
```python
system_prompt = """
You are the Research node for the deep research agent.

Your task:
1. Search for relevant sources on the research topic
2. Extract key findings from each source
3. **CRITICAL: Call set_output(research_findings=[...]) immediately after gathering sufficient findings**

IMPORTANT: You MUST call set_output even if the first search already yields enough findings.
Do not continue searching indefinitely - set output and finish once you have enough data.

Output format:
{
  "research_findings": [{"source": "...", "summary": "...", "url": "..."}, ...]
}
"""
```
Why this works: The explicit instruction to call set_output immediately prevents the LLM from continuing to search after finding sufficient sources.
#### Stage 7: Verification
Developer: "I updated the prompt. How do I test it?"
You: "Let's verify the fix:
Launch the TUI dashboard with `hive tui`, then select your agent and press Enter to run it.
I'll check the logs after it runs..."
[Developer runs agent]
Developer: "It completed! Can you check if it worked?"
You: "Checking now..."
[Call query_runtime_log_details for the new run]
Results:
"Perfect! The fix worked. The research node now:
Your agent should now work correctly!"
---
## Tips for Effective Debugging
1. **Always start with L1 logs** - Don't jump straight to detailed logs
2. **Focus on attention flags** - They highlight the real issues
3. **Compare verdict_feedback across steps** - Patterns reveal root causes
4. **Check tool error messages carefully** - They often contain the exact problem
5. **Consider the agent's goal** - Fixes should align with success criteria
6. **Test fixes immediately** - Quick verification prevents wasted effort
7. **Look for patterns across multiple runs** - One-time failures might be transient
## Common Pitfalls to Avoid
1. **Don't recommend code you haven't verified exists** - Always read files first
2. **Don't assume tool capabilities** - Check MCP server configs
3. **Don't ignore edge conditions** - Missing edges cause routing failures
4. **Don't overlook judge configuration** - Mismatched expectations cause retry loops
5. **Don't forget nullable_output_keys** - Optional inputs need explicit marking
6. **Don't diagnose "in_progress" as a failure for forever-alive agents** - Agents with `terminal_nodes=[]` are designed to never enter "completed" state. This is intentional. Focus on quality of individual node visits, not session completion status
7. **Don't ignore conversation memory issues in long-running sessions** - In continuous conversation mode, history grows across node transitions and loop iterations. Watch for context overflow (tokens_used > 100K), stale data from previous loops affecting edge conditions, and compaction failures that cause the LLM to lose important context
8. **Don't confuse "waiting for user" with "stalled"** - Client-facing nodes in forever-alive agents block for user input by design. A session paused at a client-facing node is working correctly, not stalled
---
## Storage Locations Reference
**New unified storage (default):**
- Logs: `~/.hive/agents/{agent_name}/sessions/session_YYYYMMDD_HHMMSS_{uuid}/logs/`
- State: `~/.hive/agents/{agent_name}/sessions/{session_id}/state.json`
- Conversations: `~/.hive/agents/{agent_name}/sessions/{session_id}/conversations/`
**Old storage (deprecated, still supported):**
- Logs: `~/.hive/agents/{agent_name}/runtime_logs/runs/{run_id}/`
The MCP tools automatically check both locations.
---
**Remember:** Your role is to be a debugging companion and thought partner. Guide the developer through the investigation, explain what you find, and provide actionable fixes. Don't just report errors - help understand and solve them.