
sawrus/troubleshooting

Name: troubleshooting
Author: sawrus

areas/software/backend/skills/troubleshooting/SKILL.md

npx skillsauth add sawrus/agent-guides troubleshooting


Troubleshooting Skill

Expertise: Systematic debugging, log analysis, query profiling, memory/CPU profiling, regression tests.

Debugging Framework (RRCA)

1. REPRODUCE — make the bug happen reliably before touching code
2. REDUCE    — find the smallest input that triggers the bug
3. CAUSE     — identify the specific code line/condition responsible
4. ADDRESS   — fix + regression test + verify the bug doesn't come back

Never fix what you can't reproduce. A guess-and-check fix is technical debt.
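When the failing input is a sequence (records in a batch, lines of a payload, characters of a string), the REDUCE step can be partly automated. The sketch below is a simplified greedy variant of delta debugging, not something this skill prescribes. `reduce_failing_input` and the `fails` predicate are hypothetical names, and `fails` is assumed to be deterministic, which is exactly what REPRODUCE should give you.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def reduce_failing_input(items: Sequence[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily remove chunks of `items` while `fails(candidate)` still reproduces the bug."""
    current = list(items)
    if not fails(current):
        raise ValueError("input does not reproduce the bug; REPRODUCE before REDUCE")
    chunk = max(len(current) // 2, 1)
    while True:
        removed = False
        start = 0
        while start < len(current):
            # Try the input with current[start:start + chunk] cut out
            candidate = current[:start] + current[start + chunk:]
            if candidate and fails(candidate):
                current = candidate  # smaller input still fails: keep it, retry same position
                removed = True
            else:
                start += chunk  # this chunk is needed to trigger the bug: move on
        if not removed:
            if chunk == 1:
                return current  # no single element can be removed: locally minimal
            chunk //= 2  # nothing removable at this granularity: try finer chunks
```

For example, if the bug only triggers when both 3 and 7 are present, `reduce_failing_input(list(range(10)), lambda xs: 3 in xs and 7 in xs)` shrinks the input to `[3, 7]`. The result is minimal only in the sense that no single element can be dropped; it is not guaranteed to be the globally smallest failing input.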

Log Analysis Patterns

```bash
# Find all errors in the last hour (structured JSON logs; -o cat prints only the message)
journalctl -u myapp --since "1 hour ago" -o cat | jq -c 'select(.level == "error")'

# Count errors by type (skip records that have no error_code)
jq -r 'select(.error_code != null) | .error_code' app.log | sort | uniq -c | sort -rn | head -20

# Find slow requests (duration > 1000, assuming milliseconds)
jq -r 'select(.duration > 1000) | [.method, .path, .duration] | @csv' access.log

# Trace a specific request by request_id
grep "request_id=req_abc123" app.log

# Find N+1 patterns: same query repeated many times in the same request
grep "request_id=req_abc123" app.log | grep -c "db.query"  # > 10 is suspicious
```
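The grep above checks one request at a time. To scan a whole log for N+1 candidates, a small script can count database calls per request. This is a sketch that assumes JSON-lines logs with hypothetical `request_id` and `event` fields, where each database call is logged with `event` set to `"db.query"`; adjust the field names to your log schema. The default threshold of 10 mirrors the rule of thumb above.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def find_n_plus_one(lines: Iterable[str], threshold: int = 10) -> Dict[str, int]:
    """Map request_id -> number of db.query events, for requests above `threshold`."""
    counts: Counter = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as stack traces or startup banners
        if not isinstance(record, dict):
            continue
        # Hypothetical schema: {"request_id": "...", "event": "db.query", ...}
        if record.get("event") == "db.query" and "request_id" in record:
            counts[record["request_id"]] += 1
    # most_common() orders the worst offenders first
    return {rid: n for rid, n in counts.most_common() if n > threshold}
```

Usage: `with open("app.log") as f: print(find_n_plus_one(f))`.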

Database Query Debugging

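```sql
-- Show queries running longer than 5 seconds, and which backends block them (PostgreSQL 9.6+)
SELECT pid, pg_blocking_pids(pid) AS blocked_by,
       now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle' AND query_start < now() - interval '5 seconds'
ORDER BY duration DESC;

-- Stop a runaway or blocking query: cancel the query first,
-- terminate the whole backend (closes its connection) only if cancel doesn't work
SELECT pg_cancel_backend(<pid>);
SELECT pg_terminate_backend(<pid>);

-- Top queries by total execution time
-- (requires the pg_stat_statements extension; *_exec_time column names are PostgreSQL 13+)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 10;

-- Check the dead-tuple ratio, a rough bloat indicator (after many deletes/updates)
SELECT relname, n_dead_tup, n_live_tup,
       round(n_dead_tup * 100.0 / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_pct DESC;
```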
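To see what a missing index looks like in a query plan without a running PostgreSQL server, the sketch below uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` as a stand-in for PostgreSQL's `EXPLAIN ANALYZE`. The `orders` table and the index name are made up for illustration, and SQLite's plan text differs from PostgreSQL's: `SCAN` roughly corresponds to a sequential scan, `SEARCH ... USING INDEX` to an index scan.

```python
import sqlite3
from typing import List


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> List[str]:
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 500, i * 1.5) for i in range(10_000)],
)

lookup = "SELECT * FROM orders WHERE customer_id = ?"
plan_before = query_plan(conn, lookup, (42,))  # full SCAN of orders: no usable index
print(plan_before)

conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
plan_after = query_plan(conn, lookup, (42,))  # now a SEARCH using the new index
print(plan_after)
```

The same before/after comparison applies in PostgreSQL: run `EXPLAIN ANALYZE` on the slow query, add the index, and check that the plan changes from a sequential scan to an index scan.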

Memory Leak Detection (Python)

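```python
# Detect growing memory with tracemalloc: diff two snapshots instead of reading one
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# ... run suspected code (several iterations, so a leak shows up as growth) ...

after = tracemalloc.take_snapshot()
top_stats = after.compare_to(before, "lineno")
for stat in top_stats[:10]:
    print(stat)  # shows file:line, current size, and the size/count change since `before`

# Typical culprits:
# - Unbounded in-memory caches (dict that grows forever)
# - Event listeners/callbacks registered but never removed
# - Reference cycles that keep large objects alive until the cyclic GC runs (forever if gc is disabled)
```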
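The snapshot diff can be wrapped in a reusable helper that runs a suspect function many times and reports which source lines grew. This is a sketch under assumptions: `top_growth` and `leaky_handler` are hypothetical names, one warm-up call is made so one-time allocations are not reported as growth, and allocations made by the `tracemalloc` module itself (such as the first snapshot's data) are filtered out.

```python
import tracemalloc
from typing import Callable, Dict, List


def top_growth(
    fn: Callable[[], None], iterations: int = 1000, limit: int = 5
) -> List[tracemalloc.StatisticDiff]:
    """Call fn repeatedly and return the source lines whose allocations grew the most."""
    tracemalloc.start()
    try:
        fn()  # warm-up: keep one-time allocations (imports, lazy init) out of the diff
        before = tracemalloc.take_snapshot()
        for _ in range(iterations):
            fn()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Ignore memory allocated by tracemalloc itself (e.g. the `before` snapshot)
    ignore_self = [tracemalloc.Filter(False, tracemalloc.__file__)]
    before = before.filter_traces(ignore_self)
    after = after.filter_traces(ignore_self)
    diffs = after.compare_to(before, "lineno")  # largest absolute change first
    return [d for d in diffs if d.size_diff > 0][:limit]


# Hypothetical leak: a module-level cache that is never evicted
_cache: Dict[int, bytes] = {}


def leaky_handler() -> None:
    _cache[len(_cache)] = bytes(1024)  # ~1 KiB retained per call


for stat in top_growth(leaky_handler, iterations=500):
    print(stat)
```

The top entry should point at the `_cache[...] = bytes(1024)` line, since that is where the retained memory is allocated.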

HTTP Client Debugging

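```python
import logging

import httpx

# Enable debug logging: the "httpx" logger records each request line and response status,
# "httpcore" records connection-level events
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("httpx").setLevel(logging.DEBUG)
logging.getLogger("httpcore").setLevel(logging.DEBUG)


# Or manually log request details (`async with` must run inside a coroutine)
async def debug_post(url: str, payload: dict) -> httpx.Response:
    async with httpx.AsyncClient() as client:
        response = await client.post(url, json=payload, timeout=30.0)
        print(f"Request: {response.request.method} {response.request.url}")
        print(f"Request body: {response.request.content}")
        print(f"Status: {response.status_code}")
        print(f"Response: {response.text[:500]}")  # truncate large bodies
        return response

# Usage from sync code: asyncio.run(debug_post(url, payload))
```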

Writing Regression Tests

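```python
import pytest
from pydantic import ValidationError

# Hypothetical import path; assumes CreateOrderRequest is a Pydantic model in your project
from app.schemas import CreateOrderRequest

# Pattern: test must FAIL before the fix, PASS after
# Step 1: reproduce the bug as a test
def test_order_creation_with_zero_quantity_raises_validation_error():
    """Regression: order with zero quantity was silently accepted (bug PROJ-198)"""
    with pytest.raises(ValidationError) as exc_info:
        CreateOrderRequest(items=[{"product_id": "prod_1", "quantity": 0}])

    # Assert on the field and the rule, so an unrelated validation error can't make it pass
    assert "quantity" in str(exc_info.value)
    assert "greater than 0" in str(exc_info.value)  # Pydantic's wording for a gt=0 constraint

# Step 2: run test — it should FAIL on current code
# Step 3: implement fix
# Step 4: run test — it should now PASS
# Step 5: run full suite to confirm no regressions
```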
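The same fail-before, pass-after pattern can be shown without Pydantic or pytest. In this standard-library sketch, a hypothetical `OrderItem` dataclass stands in for an item of `CreateOrderRequest`, and the `quantity <= 0` check plays the role of the fix: delete that check and `test_zero_quantity_is_rejected` fails, which is Step 2 above.

```python
import unittest
from dataclasses import dataclass


@dataclass
class OrderItem:
    """Hypothetical stand-in for one item of CreateOrderRequest."""

    product_id: str
    quantity: int

    def __post_init__(self) -> None:
        # The fix: before it, quantity=0 was silently accepted
        if self.quantity <= 0:
            raise ValueError(f"quantity must be greater than 0, got {self.quantity}")


class TestOrderItemRegression(unittest.TestCase):
    def test_zero_quantity_is_rejected(self) -> None:
        """Regression: fails on the pre-fix code, passes once the check exists."""
        with self.assertRaises(ValueError) as ctx:
            OrderItem(product_id="prod_1", quantity=0)
        self.assertIn("greater than 0", str(ctx.exception))

    def test_positive_quantity_still_accepted(self) -> None:
        # Guard against over-fixing: valid input must keep working
        self.assertEqual(OrderItem(product_id="prod_1", quantity=2).quantity, 2)
```

pytest also collects `unittest.TestCase` classes, so a test written this way runs in the same suite as the pytest-style test above.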

Common Root Cause Patterns

| Symptom | Likely cause | Investigation |
|---|---|---|
| Intermittent 500s after deploy | Exception in new code path | Check error logs at deploy time; git diff |
| Slow requests on a specific endpoint | N+1 query or missing index | EXPLAIN ANALYZE on the DB queries in that request |
| Memory grows over time | Cache without TTL or eviction | tracemalloc; check dict/list sizes in long-running processes |
| Random 429s from upstream | Retry-After header not honored; burst traffic | Check outbound request rate; add backoff |
| Test passes locally, fails in CI | Missing env var; timing issue | Compare env vars; add explicit waits |
| Occasional duplicate records | Missing idempotency key or DB unique constraint | Add unique constraint; add idempotency check |
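For the 429 row, "add backoff" usually means waiting as long as the upstream's `Retry-After` header asks, and falling back to exponential backoff with jitter when the header is absent or unparseable. `Retry-After` may be either a number of seconds or an HTTP date. The helper below is a hypothetical standard-library sketch; `retry_delay` is an illustrative name, and the `base` value and the 30-second `cap` on the fallback delay are arbitrary defaults, not recommendations from this skill.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 0.5,
    cap: float = 30.0,
    now: Optional[datetime] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():
            # Delta-seconds form, e.g. "Retry-After: 120": honor it as given
            return float(value)
        try:
            # HTTP-date form, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT"
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None  # unparseable header: fall through to backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
            current = now if now is not None else datetime.now(timezone.utc)
            return max((when - current).total_seconds(), 0.0)
    # No usable Retry-After: exponential backoff with "full jitter"
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Full jitter (a random delay between zero and the exponential bound) spreads retries from many clients apart, which helps when the 429s come from burst traffic.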


Systematic backend debugging — reproduce, isolate root cause, implement fix with regression test.

12 stars

development

Updated Apr 18, 2026


Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean; the other 6 were skipped.

- Trivy, container and dependency vulnerability scanner: Clean
- Semgrep, static code analysis for vulnerabilities: Clean
- mcp-scan (Snyk), Model Context Protocol security validation: Clean
- Snyk (dep), open source security scanning: Skipped
- Socket.dev, supply chain security analysis: Skipped
- VirusTotal, multi-engine malware detection: Skipped
- CrowdStrike, advanced threat intelligence: Skipped
- OSV-Scanner, Open Source Vulnerability database check: Skipped
- OWASP Dep-Check: Skipped

Last scanned: Apr 18, 2026, 4:25 AM (40.5 s, 1 file scanned)

SKILL.md

name: troubleshooting
type: skill
description: Systematic backend debugging — reproduce, isolate root cause, implement fix with regression test.
allowed-tools: Read, Write, Edit, Bash, Grep



Install manually

```bash
# Clone the repo
git clone https://github.com/sawrus/agent-guides.git

# Copy into the Claude Code skills folder (global); create the folder first if it doesn't exist
mkdir -p ~/.claude/skills
cp -r agent-guides/areas/software/backend/skills/troubleshooting ~/.claude/skills/
```

Repository: sawrus/agent-guides (12 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT