Agent Communication Debugger

Debug and diagnose issues with the A2A (Agent-to-Agent) communication system, including the orchestrator, coder-agent, tester-agent, and message transport layers.

Prerequisites

A2A agent system located in a2a_communicating_agents/
Python 3.10+ environment
Access to agent logs in logs/ directory
Agent configurations in respective agent.json files

Instructions

1. Check Agent Status

First, determine which agents are running:

# Check all agent processes
ps aux | grep -E "(orchestrator|coder|tester)_agent/main\.py|websocket_server\.py" | grep -v grep

Look for:

orchestrator_agent/main.py
coder_agent/main.py
tester_agent/main.py
websocket_server.py

Common issues:

Agent process not found → Agent isn't running, needs to be started
Multiple instances → Duplicate processes causing conflicts
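
The two common issues above can also be checked programmatically. The following is a minimal Python sketch, not part of the agent system itself: classify_processes is a hypothetical helper that takes the text output of ps aux and reports, for each script path in the "Look for" list, whether it is missing, running once, or duplicated.

```python
from collections import Counter

# Script paths taken from the "Look for" list above.
EXPECTED_SCRIPTS = (
    "orchestrator_agent/main.py",
    "coder_agent/main.py",
    "tester_agent/main.py",
    "websocket_server.py",
)


def classify_processes(ps_output: str) -> dict:
    """Map each expected script to 'missing', 'running', or 'duplicate'."""
    counts = Counter()
    for line in ps_output.splitlines():
        for script in EXPECTED_SCRIPTS:
            if script in line:
                counts[script] += 1

    status = {}
    for script in EXPECTED_SCRIPTS:
        if counts[script] == 0:
            status[script] = "missing"      # agent needs to be started
        elif counts[script] == 1:
            status[script] = "running"
        else:
            status[script] = "duplicate"    # more than one instance may conflict
    return status
```

To use it against the live system, pass in the captured output of ps aux, for example subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout.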

2. Inspect Agent Configurations

Read the agent configuration files to verify capabilities and topics:

# View orchestrator config
cat a2a_communicating_agents/orchestrator_agent/agent.json

# View coder agent config
cat a2a_communicating_agents/coder_agent/agent.json

# View tester agent config (if it exists)
cat a2a_communicating_agents/tester_agent/agent.json

Verify:

Agent names match expected values
Topics are correctly defined
Capabilities describe what the agent does
No JSON syntax errors
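
The verification list above can be sketched as a small Python check. The field names name, topics, and capabilities are assumptions inferred from the list; the real agent.json schema may use different keys, so adjust REQUIRED_FIELDS accordingly. check_agent_config is a hypothetical helper, not something the agent system provides.

```python
import json

# Assumed field names; adjust to the real agent.json schema.
REQUIRED_FIELDS = ("name", "topics", "capabilities")


def check_agent_config(text: str, expected_name: str) -> list:
    """Return a list of problems found in one agent.json document."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        # A syntax error makes every other check meaningless, so stop here.
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]

    if not isinstance(config, dict):
        return ["top-level value is not a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in config]
    if "name" in config and config["name"] != expected_name:
        problems.append(f"name is {config['name']!r}, expected {expected_name!r}")
    if "topics" in config and not config["topics"]:
        problems.append("topics is empty")
    return problems
```

An empty list means the file passed every check. Read the file with pathlib.Path(...).read_text() and pass the text in along with the agent name you expect.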

3. Check Agent Logs

Examine logs for errors and message flow:

# View orchestrator logs (last 50 lines)
tail -50 logs/orchestrator.log

# Follow all logs in real time (Ctrl+C to stop)
tail -f logs/*.log

# Search for specific errors
grep -i "error\|exception\|failed" logs/*.log

# Check for routing decisions
grep -i "routing to\|routed to" logs/orchestrator.log

Look for:

Connection errors
Routing decisions showing wrong agent selection
JSON parsing errors
Message processing failures
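
When the logs are long, it can help to sort matching lines into the four categories above. This Python sketch does that with regular expressions. The patterns are illustrative guesses at the log wording, not the agents' actual message formats, so treat them as a starting point.

```python
import re

# Illustrative patterns, one per item in the "Look for" list.
# The real log wording may differ.
CATEGORIES = {
    "connection": re.compile(r"connection (refused|reset|closed|error)", re.I),
    "routing": re.compile(r"rout(ing|ed) to", re.I),
    "json": re.compile(r"json.*(decode|pars)", re.I),
    "failure": re.compile(r"error|exception|failed", re.I),
}


def categorize(lines):
    """Group log lines under the first category whose pattern matches."""
    found = {name: [] for name in CATEGORIES}
    for line in lines:
        for name, pattern in CATEGORIES.items():
            if pattern.search(line):
                found[name].append(line.rstrip("\n"))
                break  # count each line once, under the most specific category
    return found
```

Because the generic failure pattern comes last, a line such as a connection error is filed under connection rather than counted twice.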

4. Verify Message Transport

Check if the message transport (WebSocket or RAG board) is working:

# Check if WebSocket server is running
ps aux | grep websocket_server | grep -v grep
netstat -tlnp 2>/dev/null | grep 8765 || ss -tlnp 2>/dev/null | grep 8765

# Check RAG board storage
ls -lh a2a_communicating_agents/storage/
ls -lh storage/

# Check recent messages in message board
tail -20 storage/message_board.jsonl 2>/dev/null || echo "Message board not found"

Expected:

WebSocket server on port 8765 (if using WebSocket transport)
Recent messages in storage/message_board.jsonl (if using RAG transport)
No permission errors accessing storage
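
The same transport checks can be scripted in Python, which is handy on hosts where neither netstat nor ss is installed. This is a sketch under two assumptions: the WebSocket server accepts plain TCP connections on port 8765, and the message board is a JSON Lines file with one JSON record per line. port_is_open and last_messages are hypothetical helper names.

```python
import json
import socket
from pathlib import Path


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def last_messages(path, count: int = 20) -> list:
    """Return the parseable JSON records among the last `count` lines of a JSONL file."""
    file = Path(path)
    if not file.exists():
        return []
    records = []
    for line in file.read_text(encoding="utf-8").splitlines()[-count:]:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip blank or partially written lines
    return records
```

For example, port_is_open("localhost", 8765) covers the WebSocket check and last_messages("storage/message_board.jsonl") covers the RAG board check. An open port only shows that something is listening; it does not prove the listener is the WebSocket server.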

5. Test Message Sending

Use the provided test script to send a message and verify delivery:

# Send a test message to orchestrator
python .claude/skills/agent-debug/scripts/test_message.py

This script will:

Send a test message to the orchestrator topic
Wait for response
Show message delivery status
Display any responses received

6. Diagnose Routing Issues

If messages reach the orchestrator but are routed to the wrong agent:

Check orchestrator's routing logic:

# View the decide_route method
grep -A 50 "def decide_route" a2a_communicating_agents/orchestrator_agent/main.py

Check priority keyword mappings:

# View fallback routing keywords
grep -A 20 "priority_mappings = {" a2a_communicating_agents/orchestrator_agent/main.py

Verify agent discovery:

# Check discovered agents in logs
grep "Discovered.*agents" logs/orchestrator.log | tail -5

Common routing issues:

Agent not discovered → Check agent.json exists and is valid
Wrong agent selected → Keywords don't match, update priority_mappings
Null target → No suitable agent found, check agent topics/capabilities
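
To make these failure modes concrete, here is a minimal Python sketch of keyword-based fallback routing. It is not the orchestrator's actual code: the shape of priority_mappings (keyword to agent name), the example keywords, and the fallback_route function are all assumptions for illustration. Check decide_route in main.py for the real logic.

```python
from typing import Optional

# Hypothetical shape: keyword -> agent name. The real priority_mappings may differ.
priority_mappings = {
    "code": "coder-agent",
    "implement": "coder-agent",
    "test": "tester-agent",
}


def fallback_route(message: str, discovered: set) -> Optional[str]:
    """Return the first discovered agent whose keyword appears in the message."""
    words = message.lower().split()
    for keyword, agent in priority_mappings.items():
        # A keyword only counts if its agent was actually discovered.
        if keyword in words and agent in discovered:
            return agent
    return None  # "Null target": no keyword matched a discovered agent
```

In this sketch, a request that mentions "code" still returns None when coder-agent is absent from the discovered set, which is why agent discovery is worth checking before editing keywords.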

7. Check Environment Variables

Verify API keys and configuration:

# Check if OPENAI_API_KEY is set (don't display value)
env | grep -E "(OPENAI|API_KEY)" | sed 's/=.*/=***HIDDEN***/'

# Check model configuration (values are masked; this only shows which variables are set)
if [ -f .env ]; then grep -E "(model|MODEL)" .env | sed 's/=.*/=***HIDDEN***/'; else echo "No .env file"; fi

Environment variables (only OPENAI_API_KEY is strictly required; the model variables have defaults):

OPENAI_API_KEY - For LLM-based routing and code generation
ORCHESTRATOR_MODEL or OPENAI_MODEL - Model to use (default: gpt-5-mini)
CODER_MODEL - Model for coder agent (optional, defaults to OPENAI_MODEL)
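
The fallback chain described above can be sketched in Python as follows. Two details are assumptions rather than facts from the agent code: that ORCHESTRATOR_MODEL takes precedence over OPENAI_MODEL when both are set, and that CODER_MODEL falls back to the same default when OPENAI_MODEL is also unset. resolve_models is a hypothetical helper.

```python
from typing import Mapping

DEFAULT_MODEL = "gpt-5-mini"  # default named in the list above


def resolve_models(env: Mapping[str, str]) -> dict:
    """Resolve the model each agent would use from an environment mapping."""
    base = env.get("OPENAI_MODEL") or DEFAULT_MODEL
    return {
        # Assumption: ORCHESTRATOR_MODEL wins over OPENAI_MODEL when both are set.
        "orchestrator": env.get("ORCHESTRATOR_MODEL") or base,
        # CODER_MODEL is optional and falls back to OPENAI_MODEL.
        "coder": env.get("CODER_MODEL") or base,
    }
```

Calling resolve_models(os.environ) shows which model each agent would pick up under these assumptions. Compare the result with the "Initialized with model" lines in the logs.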

8. Restart Agents (if needed)

If agents are stuck or not responding:

# Stop all agents
pkill -f "orchestrator_agent/main.py"
pkill -f "coder_agent/main.py"
pkill -f "tester_agent/main.py"
pkill -f "websocket_server.py"

# Wait a moment
sleep 2

# Start WebSocket server (if using)
cd a2a_communicating_agents
nohup python agent_messaging/websocket_server.py > ../logs/websocket.log 2>&1 &

# Start orchestrator
nohup python orchestrator_agent/main.py > ../logs/orchestrator.log 2>&1 &

# Start coder agent
nohup python coder_agent/main.py > ../logs/coder.log 2>&1 &

# Verify they started (the tester agent was stopped above; start it the same way if you use it)
sleep 3
ps aux | grep -E "(orchestrator|coder|tester|websocket)" | grep -v grep

9. Common Issues and Solutions

See common_issues.md for a detailed troubleshooting guide covering:

Messages not being delivered
Routing to wrong agent
Agent not generating responses
Duplicate message processing
Transport connectivity problems

Quick Diagnostic Checklist

Run through this checklist systematically:

[ ] All required agents are running (orchestrator, coder, tester)
[ ] WebSocket server is running (if using WebSocket transport)
[ ] Agent configuration files are valid JSON
[ ] Orchestrator discovered all agents (check logs)
[ ] OPENAI_API_KEY is set in environment
[ ] Recent log entries show activity
[ ] No Python exceptions in logs
[ ] Test message sends and receives successfully
[ ] Routing decisions select correct agent

Examples

Example 1: Agent Not Responding to Messages

User problem:

I'm sending messages to the orchestrator but getting no response

Debug workflow:

Check if orchestrator is running:
```
ps aux | grep orchestrator_agent | grep -v grep
```
Result: No process found → Orchestrator isn't running
Check logs for crash:
```
tail -50 logs/orchestrator.log
```
Result: ImportError for OpenAI package
Solution: Install missing dependency
```
pip install openai
```

Restart orchestrator:

cd a2a_communicating_agents
nohup python orchestrator_agent/main.py > ../logs/orchestrator.log 2>&1 &

Verify it's running (the shell is still inside a2a_communicating_agents/, so the logs are one level up):

ps aux | grep orchestrator_agent | grep -v grep
tail -10 ../logs/orchestrator.log

Example 2: Messages Routing to Wrong Agent

User problem:

I asked for code but it routed to dashboard-agent instead of coder-agent

Debug workflow:

Check orchestrator discovered coder-agent:
```
grep "Discovered.*agents" logs/orchestrator.log | tail -1
```
Result: Shows coder-agent in list ✓
Check routing decision in logs:
```
grep -A 5 "please write.*code" logs/orchestrator.log
```
Result: Shows routing to dashboard-agent

Check routing logic:

grep -A 30 "priority_mappings = {" a2a_communicating_agents/orchestrator_agent/main.py

Result: Keywords look correct

Check LLM routing decision:
```
grep "Error in decision making" logs/orchestrator.log
```
Result: LLM routing failed, falling back to heuristic

Check API key:

env | grep OPENAI_API_KEY | sed 's/=.*/=***HIDDEN***/'

Result: Variable not set

Solution: Set API key and restart orchestrator:

export OPENAI_API_KEY="your-key-here"
# Or add to .env file
echo "OPENAI_API_KEY=your-key-here" >> .env

Restart orchestrator to pick up new environment

Example 3: Coder Agent Acknowledges But Doesn't Generate Code

User problem:

Coder agent receives the message but only acknowledges, doesn't generate code

Debug workflow:

Check coder agent logs:
```
grep -i "generate\|code" logs/coder.log | tail -20
```
Result: "OpenAI package not available. Code generation will be limited."

Check if OpenAI is installed:

python -c "import openai; print(openai.__version__)" 2>&1

Result: ModuleNotFoundError

Install OpenAI package:
```
pip install openai
```

Restart coder agent:

pkill -f "coder_agent/main.py"
cd a2a_communicating_agents
nohup python coder_agent/main.py > ../logs/coder.log 2>&1 &

Verify initialization (the shell is still inside a2a_communicating_agents/, so the logs are one level up):
```
grep "Initialized with model" ../logs/coder.log | tail -1
```
Result: Should show model name (e.g., gpt-5-mini)
Send test message and verify code generation

Example 4: Complete System Health Check

User request:

Run a complete diagnostic on the agent system

Complete diagnostic workflow:

Check all agents running:

echo "=== Agent Processes ==="
ps aux | grep -E "(orchestrator|coder|tester|websocket)" | grep -v grep

Check agent configs:

echo "=== Agent Configurations ==="
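for agent in orchestrator_agent coder_agent tester_agent; do
  if [ -f "a2a_communicating_agents/$agent/agent.json" ]; then
    echo "--- $agent ---"
    # json.tool pretty-prints valid JSON and reports a syntax error otherwise
    python -m json.tool "a2a_communicating_agents/$agent/agent.json"
  else
    echo "--- $agent: agent.json not found ---"
  fi
done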

Check environment:

echo "=== Environment Variables ==="
env | grep -E "(OPENAI|MODEL)" | sed 's/=.*/=***HIDDEN***/'

Check recent logs:

echo "=== Recent Log Activity ==="
tail -5 logs/*.log 2>/dev/null

Check for errors:

echo "=== Recent Errors ==="
grep -i "error\|exception" logs/*.log | tail -10

Test message sending:

echo "=== Message Transport Test ==="
python .claude/skills/agent-debug/scripts/test_message.py

Provide summary report with:
- Agent status (running/stopped)
- Configuration validity
- Environment completeness
- Recent error count
- Transport test result
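
One way to assemble that summary is a small Python data class that collects the result of each check and renders it as text. The class name, field names, and output layout below are illustrative choices, not part of the agent system.

```python
from dataclasses import dataclass, field


@dataclass
class HealthReport:
    """Collected results of the diagnostic steps (all names are illustrative)."""
    agents: dict = field(default_factory=dict)           # agent name -> running?
    invalid_configs: list = field(default_factory=list)  # agents with bad agent.json
    missing_env: list = field(default_factory=list)      # unset required variables
    error_count: int = 0                                 # recent errors found in logs
    transport_ok: bool = False                           # did the test message succeed?

    def healthy(self) -> bool:
        """True only when every check passed and at least one agent was checked."""
        return (
            bool(self.agents)
            and all(self.agents.values())
            and not self.invalid_configs
            and not self.missing_env
            and self.transport_ok
        )

    def render(self) -> str:
        lines = ["=== Summary ==="]
        for name, running in sorted(self.agents.items()):
            lines.append(f"{name}: {'running' if running else 'stopped'}")
        lines.append(f"Invalid configs: {', '.join(self.invalid_configs) or 'none'}")
        lines.append(f"Missing environment variables: {', '.join(self.missing_env) or 'none'}")
        lines.append(f"Recent errors: {self.error_count}")
        lines.append(f"Transport test: {'passed' if self.transport_ok else 'failed'}")
        return "\n".join(lines)
```

The error count is reported but deliberately left out of healthy(), since old log entries can contain errors that have already been fixed.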

Related Tools

orchestrator_chat.py - Interactive chat interface for testing
send_agent_message.py - Send messages programmatically
Agent start/stop scripts in a2a_communicating_agents/

Summary

This skill provides systematic debugging for the A2A agent communication system. Use it whenever:

Agents aren't communicating
Messages aren't being delivered
Routing is incorrect
System behavior is unexpected

Follow the diagnostic steps in order, checking status → configuration → logs → transport → routing. Most issues fall into one of these categories:

Agent not running
Missing dependencies
Missing API keys
Invalid configurations
Routing logic issues

Start with the Quick Diagnostic Checklist and drill down based on what fails.

