skills/egadams/agent-communication-debugger/SKILL.md
Diagnoses and debugs A2A agent communication issues, including agent status, message routing, transport connectivity, and log analysis. Use when agents aren't responding, messages aren't being delivered, routing is incorrect, or communication between the orchestrator, coder-agent, and tester-agent needs debugging.
npx skillsauth add aiskillstore/marketplace agent-communication-debugger
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Debug and diagnose issues with the A2A (Agent-to-Agent) communication system, including the orchestrator, coder-agent, tester-agent, and message transport layers.
- a2a_communicating_agents/logs/ directory
- agent.json files

First, determine which agents are running:
# Check all agent processes
ps aux | grep -E "(orchestrator|coder|tester)_agent/main\.py|websocket_server\.py" | grep -v grep
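The same check can be scripted so that missing processes are reported explicitly instead of being inferred from absent lines. The following is a minimal sketch, assuming a POSIX system with `ps` available; the helper names `running_agents` and `ps_snapshot` are hypothetical and not part of the agent codebase.

```python
import subprocess

# Script names this document expects to see in the process list.
EXPECTED = [
    "orchestrator_agent/main.py",
    "coder_agent/main.py",
    "tester_agent/main.py",
    "websocket_server.py",
]


def running_agents(ps_output, expected=EXPECTED):
    """Map each expected script name to True if any process line mentions it."""
    lines = ps_output.splitlines()
    return {name: any(name in line for line in lines) for name in expected}


def ps_snapshot():
    """Return the text output of `ps aux` (POSIX systems only)."""
    result = subprocess.run(["ps", "aux"], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    for name, up in running_agents(ps_snapshot()).items():
        print(f"{'RUNNING' if up else 'MISSING':8} {name}")
```

Keeping the parsing in a function that takes text makes it possible to check the logic against saved `ps` output from another machine.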
Look for:
- orchestrator_agent/main.py
- coder_agent/main.py
- tester_agent/main.py
- websocket_server.py

Common issues:
Read the agent configuration files to verify capabilities and topics:
# View orchestrator config
cat a2a_communicating_agents/orchestrator_agent/agent.json
# View coder agent config
cat a2a_communicating_agents/coder_agent/agent.json
# View tester agent config (if exists)
cat a2a_communicating_agents/tester_agent/agent.json
Verify:
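A small script can perform a first pass over the configuration files before you read them by hand. This is a sketch only: the field names `name`, `capabilities`, and `topics` are assumptions based on the description above, and the real `agent.json` schema may differ. The function `check_agent_config` is hypothetical.

```python
import json
from pathlib import Path

# Assumed field names; adjust to the real agent.json schema.
REQUIRED_FIELDS = ("name", "capabilities", "topics")


def check_agent_config(path, required=REQUIRED_FIELDS):
    """Return a list of problems found in one agent.json file (empty if none)."""
    path = Path(path)
    if not path.is_file():
        return [f"{path}: file not found"]
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path}: invalid JSON ({exc.msg}, line {exc.lineno})"]
    if not isinstance(config, dict):
        return [f"{path}: top-level value is not an object"]
    # A field counts as a problem when it is absent or empty.
    return [f"{path}: missing or empty '{key}'" for key in required if not config.get(key)]


if __name__ == "__main__":
    root = Path("a2a_communicating_agents")
    for agent in ("orchestrator_agent", "coder_agent", "tester_agent"):
        for problem in check_agent_config(root / agent / "agent.json"):
            print(problem)
```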
Examine logs for errors and message flow:
# View orchestrator logs (last 50 lines)
tail -50 logs/orchestrator.log
# Follow all logs live (Ctrl+C to stop)
tail -f logs/*.log
# Search for specific errors
grep -i "error\|exception\|failed" logs/*.log
# Check for routing decisions
grep -i "routing to\|routed to" logs/orchestrator.log
Look for:
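When several log files are involved, a per-file count of suspicious lines shows where to start reading. The sketch below applies the same case-insensitive pattern as the `grep` command above; the function name `count_error_lines` is hypothetical, and the `logs` directory is assumed to be relative to the current working directory.

```python
import re
from collections import Counter
from pathlib import Path

# Same case-insensitive pattern as the grep command in this section.
ERROR_PATTERN = re.compile(r"error|exception|failed", re.IGNORECASE)


def count_error_lines(log_dir="logs"):
    """Return a Counter mapping each *.log file name to its number of matching lines."""
    counts = Counter()
    for log_file in sorted(Path(log_dir).glob("*.log")):
        with log_file.open(encoding="utf-8", errors="replace") as handle:
            counts[log_file.name] = sum(1 for line in handle if ERROR_PATTERN.search(line))
    return counts


if __name__ == "__main__":
    for name, count in count_error_lines().most_common():
        print(f"{count:6d}  {name}")
```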
Check if the message transport (WebSocket or RAG board) is working:
# Check if WebSocket server is running
ps aux | grep websocket_server | grep -v grep
netstat -tlnp 2>/dev/null | grep 8765 || ss -tlnp 2>/dev/null | grep 8765
# Check RAG board storage
ls -lh a2a_communicating_agents/storage/
ls -lh storage/
# Check recent messages in message board
tail -20 storage/message_board.jsonl 2>/dev/null || echo "Message board not found"
Expected:
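If `netstat` and `ss` are unavailable, a TCP connection attempt gives a similar answer. This is a minimal sketch assuming the WebSocket server listens on localhost port 8765, the port used in the commands above. It confirms only that something accepts TCP connections on that port, not that a WebSocket handshake would succeed. The function name `port_is_open` is hypothetical.

```python
import socket


def port_is_open(host="127.0.0.1", port=8765, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "listening" if port_is_open() else "not reachable"
    print(f"WebSocket port 8765 is {state}")
```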
Use the provided test script to send a message and verify delivery:
# Send a test message to orchestrator
python .claude/skills/agent-debug/scripts/test_message.py
This script will:
If messages reach orchestrator but route to wrong agent:
Check orchestrator's routing logic:
# View the decide_route method
grep -A 50 "def decide_route" a2a_communicating_agents/orchestrator_agent/main.py
Check priority keyword mappings:
# View fallback routing keywords
grep -A 20 "priority_mappings = {" a2a_communicating_agents/orchestrator_agent/main.py
Verify agent discovery:
# Check discovered agents in logs
grep "Discovered.*agents" logs/orchestrator.log | tail -5
Common routing issues:
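To reason about misrouted messages, it helps to have a mental model of a two-stage router: an LLM decision first, with a keyword heuristic as the fallback. The sketch below is illustrative only. The keyword table, `heuristic_route`, and `route_with_fallback` are hypothetical stand-ins, not the actual `decide_route` or `priority_mappings` from `orchestrator_agent/main.py`, which may behave differently.

```python
import re

# Hypothetical keyword table; the real priority_mappings may look different.
PRIORITY_MAPPINGS = {
    "coder-agent": ["code", "implement", "function", "write"],
    "tester-agent": ["test", "verify", "coverage"],
}


def heuristic_route(message, discovered_agents, mappings=PRIORITY_MAPPINGS):
    """Return the first discovered agent with a keyword in the message, else None."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    for agent, keywords in mappings.items():
        if agent in discovered_agents and words.intersection(keywords):
            return agent
    return None


def route_with_fallback(message, discovered_agents, llm_route=None):
    """Try the LLM router first; use keywords if it is absent, fails, or picks an unknown agent."""
    if llm_route is not None:
        try:
            choice = llm_route(message, discovered_agents)
            if choice in discovered_agents:
                return choice, "llm"
        except Exception as exc:  # e.g. missing API key or network error
            print(f"Error in decision making: {exc}")
    return heuristic_route(message, discovered_agents), "heuristic"
```

Under this model, a message can be misrouted in two distinct ways: the LLM stage returns a poor choice, or the LLM stage fails and the keyword table has no match among the discovered agents. Checking which stage made the decision narrows the search.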
Verify API keys and configuration:
# Check if OPENAI_API_KEY is set (don't display value)
env | grep -E "(OPENAI|API_KEY)" | sed 's/=.*/=***HIDDEN***/'
# Check model configuration
grep -E "(model|MODEL)" .env 2>/dev/null || echo "No .env file or no model settings found"
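The same checks can be done from Python, which also makes the model defaults explicit. This is a sketch under assumptions: it uses the variable names this document lists (`OPENAI_API_KEY`, `ORCHESTRATOR_MODEL`, `OPENAI_MODEL`, `CODER_MODEL`) and assumes that an agent-specific variable takes precedence over `OPENAI_MODEL`. The agents' actual resolution order may differ, and both function names are hypothetical.

```python
import os

DEFAULT_MODEL = "gpt-5-mini"  # default named in this document


def masked_env_report(env=None):
    """Report whether the API key is set without revealing its value."""
    env = os.environ if env is None else env
    status = "set (hidden)" if env.get("OPENAI_API_KEY") else "NOT SET"
    return {"OPENAI_API_KEY": status}


def resolve_models(env=None):
    """Resolve the model per agent; the precedence shown here is an assumption."""
    env = os.environ if env is None else env
    base = env.get("OPENAI_MODEL") or DEFAULT_MODEL
    return {
        "orchestrator": env.get("ORCHESTRATOR_MODEL") or base,
        "coder": env.get("CODER_MODEL") or base,
    }


if __name__ == "__main__":
    print(masked_env_report())
    print(resolve_models())
```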
Required environment variables:
- OPENAI_API_KEY - For LLM-based routing and code generation
- ORCHESTRATOR_MODEL or OPENAI_MODEL - Model to use (default: gpt-5-mini)
- CODER_MODEL - Model for coder agent (optional, defaults to OPENAI_MODEL)

If agents are stuck or not responding:
# Stop all agents
pkill -f "orchestrator_agent/main.py"
pkill -f "coder_agent/main.py"
pkill -f "tester_agent/main.py"
pkill -f "websocket_server.py"
# Wait a moment
sleep 2
# Start WebSocket server (if using)
cd a2a_communicating_agents
nohup python agent_messaging/websocket_server.py > ../logs/websocket.log 2>&1 &
# Start orchestrator
nohup python orchestrator_agent/main.py > ../logs/orchestrator.log 2>&1 &
# Start coder agent
nohup python coder_agent/main.py > ../logs/coder.log 2>&1 &
# Verify they started
sleep 3
ps aux | grep -E "(orchestrator|coder|websocket)" | grep -v grep
See common_issues.md for a detailed troubleshooting guide covering:
Run through this checklist systematically:
User problem:
I'm sending messages to the orchestrator but getting no response
Debug workflow:
Check if orchestrator is running:
ps aux | grep orchestrator_agent | grep -v grep
Result: No process found → Orchestrator isn't running
Check logs for crash:
tail -50 logs/orchestrator.log
Result: ImportError for OpenAI package
Solution: Install missing dependency
pip install openai
Restart orchestrator:
# Run in a subshell so the current directory does not change
(cd a2a_communicating_agents && nohup python orchestrator_agent/main.py > ../logs/orchestrator.log 2>&1 &)
Verify it's running:
ps aux | grep orchestrator_agent | grep -v grep
tail -10 logs/orchestrator.log
User problem:
I asked for code but it routed to dashboard-agent instead of coder-agent
Debug workflow:
Check orchestrator discovered coder-agent:
grep "Discovered.*agents" logs/orchestrator.log | tail -1
Result: Shows coder-agent in list ✓
Check routing decision in logs:
grep -A 5 "please write.*code" logs/orchestrator.log
Result: Shows routing to dashboard-agent
Check routing logic:
grep -A 30 "priority_mappings = {" a2a_communicating_agents/orchestrator_agent/main.py
Result: Keywords look correct
Check LLM routing decision:
grep "Error in decision making" logs/orchestrator.log
Result: LLM routing failed, falling back to heuristic
Check API key:
env | grep OPENAI_API_KEY | sed 's/=.*/=***HIDDEN***/'
Result: Variable not set
Solution: Set API key and restart orchestrator:
export OPENAI_API_KEY="your-key-here"
# Or add to .env file
echo "OPENAI_API_KEY=your-key-here" >> .env
Restart orchestrator to pick up new environment
User problem:
Coder agent receives the message but only acknowledges, doesn't generate code
Debug workflow:
Check coder agent logs:
grep -i "generate\|code" logs/coder.log | tail -20
Result: "OpenAI package not available. Code generation will be limited."
Check if OpenAI is installed:
python -c "import openai; print(openai.__version__)" 2>&1
Result: ModuleNotFoundError
Install OpenAI package:
pip install openai
Restart coder agent:
pkill -f "coder_agent/main.py"
# Run in a subshell so the current directory does not change
(cd a2a_communicating_agents && nohup python coder_agent/main.py > ../logs/coder.log 2>&1 &)
Verify initialization:
grep "Initialized with model" logs/coder.log | tail -1
Result: Should show model name (e.g., gpt-5-mini)
Send test message and verify code generation
User request:
Run a complete diagnostic on the agent system
Complete diagnostic workflow:
Check all agents running:
echo "=== Agent Processes ==="
ps aux | grep -E "(orchestrator|coder|tester|websocket)" | grep -v grep
Check agent configs:
echo "=== Agent Configurations ==="
for agent in orchestrator_agent coder_agent tester_agent; do
  if [ -f "a2a_communicating_agents/$agent/agent.json" ]; then
    echo "--- $agent ---"
    python -m json.tool "a2a_communicating_agents/$agent/agent.json"
  fi
done
Check environment:
echo "=== Environment Variables ==="
env | grep -E "(OPENAI|MODEL)" | sed 's/=.*/=***HIDDEN***/'
Check recent logs:
echo "=== Recent Log Activity ==="
tail -5 logs/*.log 2>/dev/null
Check for errors:
echo "=== Recent Errors ==="
grep -i "error\|exception" logs/*.log | tail -10
Test message sending:
echo "=== Message Transport Test ==="
python .claude/skills/agent-debug/scripts/test_message.py
Provide summary report with:
- orchestrator_chat.py - Interactive chat interface for testing
- send_agent_message.py - Send messages programmatically
- a2a_communicating_agents/

This skill provides systematic debugging for the A2A agent communication system. Use it whenever:
Follow the diagnostic steps in order, checking status → configuration → logs → transport → routing. Most issues are:
Start with the Quick Diagnostic Checklist and drill down based on what fails.