.claude/skills/multi-ai-debugging/SKILL.md
Systematic debugging using Claude, Gemini, and Codex as specialized agents. Multi-agent root cause analysis, log analysis, error classification, and auto-fix generation. Use when debugging production issues, analyzing error logs, performing root cause analysis, troubleshooting complex systems, or implementing self-healing patterns.
npx skillsauth add adaptationio/skrillz multi-ai-debugging
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
multi-ai-debugging provides systematic debugging workflows using multiple AI models as specialized agents. It is based on 2024-2025 best practices for AI-assisted debugging with multi-agent architectures.
Purpose: Systematic root cause analysis and fix generation using AI ensemble
Pattern: Task-based (6 independent debugging operations)
Key Principles (validated by tri-AI research):
Quality Targets:
Use multi-ai-debugging when:
When NOT to Use:
Time: 2-5 minutes
Automation: 70%
Purpose: Fast initial diagnosis for common errors
Process:
Diagnose this error:
Error: [PASTE ERROR MESSAGE]
Stack trace: [PASTE STACK TRACE]
Provide:
1. What type of error is this?
2. Most likely root cause (1-2 sentences)
3. Immediate fix suggestion
4. Prevention recommendation
gemini -p "Search for solutions to this error:
[ERROR MESSAGE]
Find:
- Common causes
- Stack Overflow solutions
- GitHub issues with fixes"
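The diagnosis prompt above can be filled in directly from a caught exception. The following is a minimal sketch, not part of the skill itself: the helper name `build_diagnosis_prompt` is hypothetical, and only the standard-library `traceback` module is used.

```python
import traceback

# Template text copied from the quick-diagnosis prompt above.
DIAGNOSIS_TEMPLATE = """Diagnose this error:
Error: {error}
Stack trace: {stack}
Provide:
1. What type of error is this?
2. Most likely root cause (1-2 sentences)
3. Immediate fix suggestion
4. Prevention recommendation"""


def build_diagnosis_prompt(exc: BaseException) -> str:
    """Fill the quick-diagnosis template from a caught exception (hypothetical helper)."""
    stack = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    error = f"{type(exc).__name__}: {exc}"
    return DIAGNOSIS_TEMPLATE.format(error=error, stack=stack.strip())


# Usage: trigger a simple error and build the prompt from it.
try:
    {}["missing"]
except KeyError as exc:
    prompt = build_diagnosis_prompt(exc)
```

The resulting string can be pasted into any of the three tools; nothing here calls a model.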
Time: 15-45 minutes
Automation: 50%
Purpose: Deep root cause analysis for complex issues
Process:
Step 1: Gather Context (Context Agent)
# Recent changes
git log --oneline -20
git diff HEAD~5..HEAD --stat
# Related logs
grep -r "ERROR\|Exception\|WARN" logs/ | tail -100
# System state
# Check for relevant metrics, traces, etc.
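When the logs are already in memory (or streamed from a file), the `grep ... | tail -100` step can be reproduced in Python. This is a sketch under the assumption that logs are plain text lines; `recent_error_lines` is a hypothetical name and the sample lines are made up for illustration.

```python
import re
from collections import deque
from typing import Iterable, List

# Same alternation as the grep pattern above.
ERROR_PATTERN = re.compile(r"ERROR|Exception|WARN")


def recent_error_lines(lines: Iterable[str], limit: int = 100) -> List[str]:
    """Keep the last `limit` lines that match ERROR, Exception, or WARN."""
    matching = (ln for ln in lines if ERROR_PATTERN.search(ln))
    return list(deque(matching, maxlen=limit))


# Illustrative input only.
sample = [
    "2025-01-01 INFO started",
    "2025-01-01 WARN slow query",
    "2025-01-01 ERROR Connection pool exhausted",
]
matches = recent_error_lines(sample)
```

Using a bounded `deque` keeps memory flat even when the input is a large log file iterated line by line.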
Step 2: Hypothesis Generation (Analysis Agent)
Perform root cause analysis:
**Error/Symptom**: [DESCRIPTION]
**Context**:
- Recent changes: [GIT LOG]
- Logs: [RELEVANT LOG ENTRIES]
- System state: [METRICS/OBSERVATIONS]
- When started: [TIMESTAMP]
- Affected scope: [USERS/SERVICES]
**Tasks**:
1. List 3-5 probable root causes ranked by likelihood
2. For each hypothesis:
- Evidence supporting it
- Evidence against it
- Confidence level (High/Medium/Low)
3. Recommend investigation steps for top hypothesis
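Once the analysis agent returns its hypotheses, it helps to hold them in a structure that can be sorted and passed to the verification step. One possible shape, with hypothetical names (`Hypothesis`, `rank_hypotheses`) and a tie-break rule that is an assumption rather than part of the skill:

```python
from dataclasses import dataclass, field
from typing import List

# Lower number sorts first.
CONFIDENCE_ORDER = {"High": 0, "Medium": 1, "Low": 2}


@dataclass
class Hypothesis:
    cause: str
    confidence: str  # "High", "Medium", or "Low"
    evidence_for: List[str] = field(default_factory=list)
    evidence_against: List[str] = field(default_factory=list)


def rank_hypotheses(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Order by confidence, then by net supporting evidence (assumed tie-break)."""
    return sorted(
        hypotheses,
        key=lambda h: (
            CONFIDENCE_ORDER[h.confidence],
            len(h.evidence_against) - len(h.evidence_for),
        ),
    )
```

The first element of the ranked list is the candidate to hand to the cross-validation prompt.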
Step 3: Cross-Validate (Verification Agent)
Verify this root cause hypothesis:
Hypothesis: [TOP HYPOTHESIS]
Evidence: [SUPPORTING EVIDENCE]
Tasks:
1. What would we expect to see if this is correct?
2. What would disprove this hypothesis?
3. Design a reproduction test
4. Confidence assessment (0-100)
Step 4: Generate RCA Report
## Root Cause Analysis Report
**Incident**: [DESCRIPTION]
**Date**: [DATE]
**Duration**: [DURATION]
**Impact**: [USERS/SERVICES AFFECTED]
### Timeline
- HH:MM - First error observed
- HH:MM - Investigation began
- HH:MM - Root cause identified
- HH:MM - Fix deployed
### Root Cause
[DETAILED EXPLANATION]
### Contributing Factors
1. [FACTOR 1]
2. [FACTOR 2]
### Resolution
[FIX APPLIED]
### Prevention
1. [ACTION ITEM 1]
2. [ACTION ITEM 2]
Time: 5-15 minutes
Automation: 80%
Purpose: Analyze and classify error logs
Process:
Analyze these log entries:
[PASTE 50-100 LOG LINES]
Tasks:
1. Identify unique error patterns (cluster similar errors)
2. Classify each pattern:
- Type: (Bug/Config/Network/Resource/Security/User Error)
- Severity: (Critical/High/Medium/Low)
- Impact: (Data Loss/Service Down/Degraded/Minor)
3. Count occurrences per pattern
4. Identify the root pattern (original error vs cascading)
5. Recommend priority order for investigation
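Tasks 1 and 3 (clustering similar errors and counting occurrences) can be approximated locally before any model is involved, which also shrinks the log excerpt that has to fit in a prompt. A rough sketch, assuming that numbers and hex ids are the only variable parts of a message; `normalize` and `cluster_errors` are hypothetical names:

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def normalize(line: str) -> str:
    """Replace variable parts (hex ids, numbers) so similar errors collapse."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    line = re.sub(r"\d+", "<N>", line)
    return line.strip()


def cluster_errors(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (pattern, count) pairs, most frequent first."""
    return Counter(normalize(ln) for ln in lines).most_common()
```

Classification by type, severity, and impact is still left to the model; this step only deduplicates.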
Perform semantic analysis on these logs:
[LOG ENTRIES]
Looking for:
- Anomalies in timing/sequence
- Correlation between events
- Hidden dependencies
- Patterns a human might miss
Time: 20-60 minutes
Automation: 40%
Purpose: Complex issues requiring multiple perspectives
Process:
Launch Parallel Agents:
Launch 4 debugging agents for this issue:
Issue: [DESCRIPTION]
Code: [RELEVANT CODE]
Logs: [RELEVANT LOGS]
Agent 1 (Code Reviewer):
"Analyze the code for bugs. Focus on:
- Logic errors
- Edge cases
- Race conditions
- Resource leaks"
Agent 2 (Log Analyzer):
"Analyze the logs for clues. Focus on:
- Error sequences
- Timing patterns
- State changes
- External dependencies"
Agent 3 (System Analyst):
"Analyze system context. Focus on:
- Resource constraints
- Configuration issues
- Dependency problems
- Infrastructure state"
Agent 4 (Historical Analyst):
"Analyze history. Focus on:
- Recent changes that could cause this
- Similar past incidents
- Regression indicators
- Pattern matching to known issues"
Council Deliberation:
Synthesize findings from all debugging agents:
Agent 1 (Code): [FINDINGS]
Agent 2 (Logs): [FINDINGS]
Agent 3 (System): [FINDINGS]
Agent 4 (History): [FINDINGS]
Tasks:
1. Find consensus root cause (where 2+ agents agree)
2. Resolve conflicting hypotheses
3. Combine evidence for strongest theory
4. Rate overall confidence (0-100)
5. Propose fix with verification steps
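The first deliberation task, finding a root cause on which two or more agents agree, is simple enough to check mechanically. The sketch below assumes each agent's findings have already been reduced to short cause labels (that reduction is the hard part and is not shown); `consensus_root_cause` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Optional


def consensus_root_cause(
    findings: Dict[str, List[str]], min_agents: int = 2
) -> Optional[str]:
    """Return the cause named by the most agents, if at least `min_agents` agree."""
    votes: Counter = Counter()
    for causes in findings.values():
        votes.update(set(causes))  # one vote per agent per cause
    if not votes:
        return None
    cause, count = votes.most_common(1)[0]
    return cause if count >= min_agents else None
```

A `None` result means there is no consensus yet, which is the cue to move on to resolving the conflicting hypotheses.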
Time: 10-30 minutes
Automation: 60%
Purpose: Generate and verify fixes
Process:
Step 1: Generate Fix (Fixer Agent)
Generate a fix for this issue:
Issue: [ROOT CAUSE]
Code: [AFFECTED CODE]
Requirements:
1. Minimal change (fix only the issue)
2. Include error handling
3. Add comments explaining the fix
4. Suggest test cases to verify
Output format:
- File: [path]
- Before: [original code]
- After: [fixed code]
- Explanation: [why this fixes it]
Step 2: Critique Fix (Critic Agent)
Critique this proposed fix:
Issue: [ORIGINAL ISSUE]
Proposed Fix: [FIX CODE]
Evaluate:
1. Does it actually fix the root cause?
2. Could it introduce new bugs?
3. Edge cases not handled?
4. Performance implications?
5. Security implications?
Verdict: APPROVE / NEEDS_REVISION / REJECT
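If the critic's reply is consumed by a script, the verdict line needs to be parsed defensively, since a model may echo the option list instead of choosing one. A minimal sketch with a hypothetical `parse_verdict` helper; it assumes the reply contains a line beginning with `Verdict:`.

```python
import re
from typing import Optional

VERDICT_TOKEN = re.compile(r"\b(APPROVE|NEEDS_REVISION|REJECT)\b")


def parse_verdict(response: str) -> Optional[str]:
    """Return the critic's verdict, or None if it is missing or ambiguous."""
    for line in response.splitlines():
        if line.strip().startswith("Verdict:"):
            found = VERDICT_TOKEN.findall(line)
            if len(found) == 1:  # exactly one option chosen
                return found[0]
    return None
```

Treating an unparseable reply as `None` rather than as approval keeps an unclear critique from letting a fix through.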
Step 3: Generate Regression Test
Generate a regression test for this fix:
Original Bug: [DESCRIPTION]
Fix Applied: [FIX CODE]
Create test that:
1. Would have caught the original bug
2. Verifies the fix works
3. Tests edge cases
4. Can run in CI/CD
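As an illustration of what such a regression test might look like, the sketch below targets a connection-leak bug, where a handler fails to return a pooled connection when the query raises. Everything here is hypothetical: `FakePool`, `FailingConnection`, and `get_user` are stand-ins written for the example, not a real pool API.

```python
class FailingConnection:
    """Hypothetical connection whose queries always fail."""

    def execute(self, query, params):
        raise RuntimeError("simulated query failure")


class FakePool:
    """Hypothetical stand-in for a connection pool; counts checked-out connections."""

    def __init__(self):
        self.checked_out = 0

    def get_connection(self):
        self.checked_out += 1
        return FailingConnection()

    def return_connection(self, conn):
        self.checked_out -= 1


def get_user(pool, user_id):
    """Code under test: must return the connection even when the query fails."""
    conn = pool.get_connection()
    try:
        return conn.execute("SELECT * FROM users WHERE id = ?", (user_id,))
    finally:
        pool.return_connection(conn)


def test_connection_returned_on_error():
    pool = FakePool()
    try:
        get_user(pool, 1)
    except RuntimeError:
        pass  # the failure itself is expected; the leak is what we test
    assert pool.checked_out == 0
```

The test uses no database or network, so it can run in CI, and it fails against a version of `get_user` that lacks the `finally` block.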
Time: Variable
Automation: 90%
Purpose: Automated detection and remediation
Process:
Define Remediation Playbooks:
# Example: Auto-remediation patterns
PLAYBOOKS = {
    "disk_space_low": {
        "detection": "disk_usage > 90%",
        "actions": [
            "compress_old_logs",
            "clear_temp_files",
            "alert_if_still_high"
        ]
    },
    "memory_leak_detected": {
        "detection": "memory_growth > 10%/hour",
        "actions": [
            "capture_heap_dump",
            "graceful_restart",
            "alert_team"
        ]
    },
    "error_rate_spike": {
        "detection": "error_rate > 5%",
        "actions": [
            "check_recent_deploys",
            "consider_rollback",
            "alert_on_call"
        ]
    }
}
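The playbook table only declares rules; something still has to evaluate them. One way this could be done is sketched below for the simple `metric > N%` form only. The parser, the `triggered_actions` name, and the assumption that metrics arrive as percentages keyed by metric name are all illustrative, and the dictionary repeats a subset of the playbooks so the snippet runs on its own.

```python
import re
from typing import Dict, List

# Subset of the playbooks above, repeated so this sketch is self-contained.
PLAYBOOKS = {
    "disk_space_low": {
        "detection": "disk_usage > 90%",
        "actions": ["compress_old_logs", "clear_temp_files", "alert_if_still_high"],
    },
    "error_rate_spike": {
        "detection": "error_rate > 5%",
        "actions": ["check_recent_deploys", "consider_rollback", "alert_on_call"],
    },
}

# Matches rules of the form "<metric> > <number>%".
RULE = re.compile(r"^(\w+)\s*>\s*(\d+(?:\.\d+)?)%")


def triggered_actions(metrics: Dict[str, float]) -> Dict[str, List[str]]:
    """Return the actions of every playbook whose threshold is exceeded."""
    result: Dict[str, List[str]] = {}
    for name, playbook in PLAYBOOKS.items():
        match = RULE.match(playbook["detection"])
        if not match:
            continue  # unknown rule format: skip rather than guess
        metric, threshold = match.group(1), float(match.group(2))
        if metrics.get(metric, 0.0) > threshold:
            result[name] = playbook["actions"]
    return result
```

The function only reports which actions apply; executing them (and deciding which are safe to run unattended) is deliberately left out.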
Configure Circuit Breakers:
# Intelligent circuit breaking
class AICircuitBreaker:
    def should_open(self, metrics):
        """AI predicts cascading failure risk."""
        prompt = f"""
        Given these metrics:
        - Error rate: {metrics['error_rate']}
        - Latency p99: {metrics['latency_p99']}
        - Dependencies health: {metrics['deps']}
        Should we open the circuit breaker?
        Risk of cascade: (Low/Medium/High)
        Recommendation: (OPEN/CLOSED/HALF_OPEN)
        """
        # analyze() is a placeholder for the model call; it is not defined here.
        return analyze(prompt)
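A circuit breaker that depends on a model call needs a deterministic fallback, because the call can time out or return text that does not contain a usable recommendation. The sketch below shows one possible arrangement: the `analyze` callable is injected by the caller, and the `fallback_error_rate` threshold of 0.5 is an arbitrary assumption chosen for illustration.

```python
import re
from typing import Callable, Dict

RECOMMENDATION = re.compile(r"Recommendation:\s*(HALF_OPEN|OPEN|CLOSED)\b")


def decide_state(
    metrics: Dict[str, float],
    analyze: Callable[[str], str],
    fallback_error_rate: float = 0.5,  # assumed threshold, tune per service
) -> str:
    """Ask the model for a breaker state; fall back to a static rule on failure."""
    prompt = (
        "Given these metrics:\n"
        f"- Error rate: {metrics['error_rate']}\n"
        f"- Latency p99: {metrics.get('latency_p99')}\n"
        "Recommendation: (OPEN/CLOSED/HALF_OPEN)"
    )
    try:
        match = RECOMMENDATION.search(analyze(prompt))
        if match:
            return match.group(1)
    except Exception:
        pass  # model unavailable: use the static rule below
    return "OPEN" if metrics["error_rate"] > fallback_error_rate else "CLOSED"
```

Keeping the model advisory and the static rule authoritative on failure means an outage of the AI service cannot itself hold a breaker closed.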
| Task | Primary | Verification | Strength |
|------|---------|--------------|----------|
| Log analysis | Gemini | Claude | Fast, large context |
| Code analysis | Claude | Codex | Deep understanding |
| Root cause | Claude | Gemini | Reasoning + search |
| Fix generation | Claude | Codex | Code + review |
| Research | Gemini | Claude | Web search |
Gemini for Log Search:
gemini -p "Analyze these logs and identify anomalies:
[LOGS]"
Claude for Root Cause:
Given this debugging context, what's the root cause?
[CONTEXT]
Codex for Fix Validation:
codex "Review this fix for correctness and edge cases:
[FIX]"
Error Type Decision Tree:
1. Is there a stack trace?
├── Yes → Go to Code Error Analysis
└── No → Go to System Error Analysis
2. Code Error Analysis:
├── NullPointer/TypeError → Missing null check
├── IndexOutOfBounds → Boundary condition
├── Timeout → Resource/network issue
├── Permission denied → Auth/authz issue
└── Unknown → Multi-agent analysis
3. System Error Analysis:
├── Connection refused → Service down
├── Disk full → Resource exhaustion
├── Out of memory → Memory leak/sizing
├── CPU spike → Performance issue
└── Unknown → Multi-agent analysis
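The decision tree maps naturally onto a lookup table. The sketch below encodes the branches as substring rules; matching on substrings is a simplifying assumption, and `classify_error` is a hypothetical name.

```python
# (substring to look for, first-pass diagnosis), taken from the tree above.
CODE_RULES = [
    ("NullPointer", "Missing null check"),
    ("TypeError", "Missing null check"),
    ("IndexOutOfBounds", "Boundary condition"),
    ("Timeout", "Resource/network issue"),
    ("Permission denied", "Auth/authz issue"),
]
SYSTEM_RULES = [
    ("Connection refused", "Service down"),
    ("Disk full", "Resource exhaustion"),
    ("Out of memory", "Memory leak/sizing"),
    ("CPU spike", "Performance issue"),
]


def classify_error(message: str, has_stack_trace: bool) -> str:
    """Walk the decision tree: pick a branch, then match the first known pattern."""
    rules = CODE_RULES if has_stack_trace else SYSTEM_RULES
    lowered = message.lower()
    for needle, diagnosis in rules:
        if needle.lower() in lowered:
            return diagnosis
    return "Multi-agent analysis"  # the tree's "Unknown" branch
```

Anything that falls through to the last line is a candidate for the multi-agent workflow rather than a quick fix.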
Severity Decision:
CRITICAL (P1):
- Data loss occurring
- Security breach active
- Service completely down
- Revenue impact immediate
HIGH (P2):
- Service degraded significantly
- Errors affecting >10% users
- Potential data integrity issues
MEDIUM (P3):
- Errors affecting <10% users
- Workaround available
- Non-critical feature broken
LOW (P4):
- Cosmetic issues
- Edge case errors
- No user impact
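The severity levels can be expressed as a small function so that triage is applied consistently. This is a sketch with hypothetical field names; the source does not say how to treat exactly 10% of users, so the code below assumes that case is P3.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    data_loss: bool = False
    security_breach: bool = False
    service_down: bool = False
    immediate_revenue_impact: bool = False
    significantly_degraded: bool = False
    data_integrity_risk: bool = False
    affected_user_pct: float = 0.0  # share of users seeing errors, 0-100


def severity(incident: Incident) -> str:
    """Map incident facts to P1-P4, checking the most severe level first."""
    if (
        incident.data_loss
        or incident.security_breach
        or incident.service_down
        or incident.immediate_revenue_impact
    ):
        return "P1"
    if (
        incident.significantly_degraded
        or incident.data_integrity_risk
        or incident.affected_user_pct > 10
    ):
        return "P2"
    if incident.affected_user_pct > 0:
        return "P3"
    return "P4"
```

Checking from the top down ensures that an incident matching several levels is always assigned the highest one.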
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


@tracer.start_as_current_span("debug_session")
def debug_with_tracing(error):
    """Debug with full trace context."""
    span = trace.get_current_span()
    # Add debugging context to trace
    span.set_attribute("error.type", type(error).__name__)
    span.set_attribute("error.message", str(error))
    # Gather trace context for AI.
    # trace_id is an int; format it as 32 hex characters so it matches log fields.
    # get_parent_spans, get_correlated_logs, and multi_agent_debug are
    # placeholders that the surrounding codebase is expected to provide.
    context = {
        "trace_id": format(span.get_span_context().trace_id, "032x"),
        "parent_spans": get_parent_spans(),
        "related_logs": get_correlated_logs()
    }
    return multi_agent_debug(context)
def correlate_logs_for_debugging(trace_id, time_window="5m"):
    """Gather all logs related to an incident."""
    # Query logs with trace correlation
    logs = query_logs(f"""
        trace_id:{trace_id} OR
        (level:ERROR AND timestamp:[now-{time_window} TO now])
    """)
    # Feed to AI for analysis
    return analyze_logs_with_ai(logs)
User: My API is returning 500 errors intermittently
Claude: Let me run multi-agent debugging to diagnose this.
[Launches Context, Analysis, and Verification agents]
## Debugging Report: Intermittent 500 Errors
### Agent Findings
**Log Analyzer**:
- Pattern: Errors spike every 5 minutes
- Correlation: Occurs during background job runs
- Log cluster: "Connection pool exhausted"
**Code Reviewer**:
- Found: Database connections not returned to pool in /api/users.py:145
- Missing: try/finally block for connection cleanup
**System Analyst**:
- Connection pool max: 20
- Concurrent requests during spike: 25+
- Background job opens 10 connections
**Historical**:
- Started after commit abc123 (2 days ago)
- Commit added new background job without connection limit
### Consensus Root Cause (Confidence: 92%)
Connection pool exhaustion due to:
1. New background job consuming connections
2. Missing connection cleanup in API handler
### Proposed Fix
```python
# /api/users.py:145
# Before
def get_user(id):
    conn = pool.get_connection()
    result = conn.execute("SELECT * FROM users WHERE id = ?", id)
    return result  # Connection never returned!

# After
def get_user(id):
    conn = pool.get_connection()
    try:
        result = conn.execute("SELECT * FROM users WHERE id = ?", id)
        return result
    finally:
        pool.return_connection(conn)  # Always return
```
---
## Related Skills
- **multi-ai-code-review**: Review fixes before deployment
- **multi-ai-verification**: Verify fix effectiveness
- **multi-ai-testing**: Generate regression tests
- **ecs-troubleshooting**: Container-specific debugging
- **railway-troubleshooting**: Railway platform debugging
---
## References
- `references/log-analysis-patterns.md` - Log analysis techniques
- `references/self-healing-playbooks.md` - Auto-remediation patterns