skills/debugging/root-cause-tracing/SKILL.md
Use when symptoms don't reveal the cause. Trace backward through call chains to find where problems originate. Follow: Observe symptom → Find immediate cause → Identify caller → Keep tracing → Locate trigger.
Symptoms appear downstream. Root causes live upstream. Trace backward through the call chain until you find the original trigger.
NEVER STOP AT THE SYMPTOM. Trace backward until you find the ORIGINAL TRIGGER.
Fixing symptoms is temporary. Fixing root causes is permanent.
Benefits:
- ✅ Finds actual cause, not just symptoms
- ✅ Prevents problem from recurring
- ✅ Reveals systemic issues
- ✅ Builds system understanding
- ✅ Fixes multiple symptoms at once

Without root cause tracing:
- ❌ Fix one symptom, three more appear
- ❌ Same bug keeps coming back
- ❌ Waste time on wrong solutions
- ❌ Never understand the real problem
- ❌ Accumulate technical debt
### Step 1: Observe the Symptom
🔍 OBSERVE Phase
Symptom: User profile page shows wrong user data
Specific observation:
- User A logs in
- Views profile page (/profile)
- Sees User B's name and email
- But sees own profile picture
Initial symptom recorded ✅
What to capture:
### Step 2: Find the Immediate Cause
🎯 IMMEDIATE CAUSE Phase
Symptom: Profile page shows wrong user data
Where does profile data come from?
Checking ProfileController:
```php
public function show()
{
    $user = User::find(1); // ⚠️ HARDCODED ID!
    return view('profile', ['user' => $user]);
}
```
Immediate cause found: Hardcoded user ID (1).
But this isn't the root cause. WHY is it hardcoded?
**Investigation techniques:**
- Check the code where symptom appears
- Add logging to see data flow
- Use debugger to inspect state
- Check what calls this code
### Step 3: Identify the Caller
📞 CALLER Phase
Immediate cause: Hardcoded user ID in ProfileController
Who calls this controller? Trace backward:
Route: /profile → ProfileController@show
Who defined this route? routes/web.php:
```php
Route::get('/profile', [ProfileController::class, 'show']);
```
Wait - no authentication middleware! Route is missing `->middleware('auth')`.
But this still might not be the root cause. When was this route added? Check git history:
```bash
git log -p routes/web.php
```
Found: Added in commit abc123 "Quick fix for profile page"
Commit message says "temporary fix"
Tracing deeper...
**Tracing techniques:**
- Check call stack
- Search codebase for callers
- Review git history
- Check when/why code was added
- Look for comments like "TODO" or "FIXME"
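The search-based techniques above can be sketched with plain `grep`. This is a minimal sketch, not a prescribed tool: the helper names `find_callers` and `find_markers` and the example paths are assumptions made for illustration.

```bash
#!/bin/bash
# Sketch: search a codebase for callers of a symbol and for leftover markers.
# find_callers and find_markers are hypothetical helper names.

# List every line (with file name and line number) that mentions a symbol.
find_callers() {
    local symbol=$1; shift
    grep -rnF -- "$symbol" "$@"
}

# List TODO/FIXME comments, which often flag "temporary" fixes.
find_markers() {
    grep -rnE -- 'TODO|FIXME' "$@"
}

# Example usage (paths are assumptions about a Laravel-style layout):
#   find_callers 'ProfileController' routes app
#   find_markers app/Http/Controllers
```

Both helpers exit non-zero when nothing matches, which is useful in scripts: no callers found is itself a finding worth recording.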
### Step 4: Keep Tracing Upstream
⬆️ UPSTREAM TRACING Phase
Current understanding:
Why was quick fix needed? Check related commits:
Previous commit: "Refactor authentication system"
ROOT CAUSE FOUND: During the authentication refactor, the /profile route was accidentally left without auth middleware. A developer then added a "quick fix" that hardcoded the user ID to make the page work temporarily, instead of restoring the middleware.
Original trigger: Incomplete refactoring
**Keep asking:**
- Why does this code exist?
- What was the original requirement?
- When was this pattern established?
- Who made this decision and why?
- What changed to expose this issue?
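Questions such as "when was this pattern established?" can often be answered from history. The sketch below uses git's pickaxe search (`git log -S`), which lists commits that changed the number of occurrences of a string. The helper name `when_introduced` and the example path are assumptions for illustration.

```bash
#!/bin/bash
# Sketch: find the commits that added or removed a given string.
# when_introduced is a hypothetical helper name.

when_introduced() {
    local needle=$1; shift
    # --reverse lists the oldest matching commit first;
    # any remaining arguments restrict the search to those paths.
    git log --reverse --format='%h %ad %s' --date=short -S"$needle" -- "$@"
}

# Example usage (the path is an assumption about a Laravel-style layout):
#   when_introduced 'User::find(1)' app/Http/Controllers/ProfileController.php
```

The first line of output is the commit that introduced the string; its message and neighbouring commits are the next place to keep asking why.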
### Step 5: Verify the Root Cause
✅ VERIFY ROOT CAUSE Phase
Hypothesized root cause: Incomplete refactor left route without auth middleware
Verification:
1. Check if auth middleware works on other routes
   Result: ✅ Yes, /dashboard and /settings work correctly

2. Check if adding auth middleware fixes the issue

   ```php
   Route::get('/profile', [ProfileController::class, 'show'])
       ->middleware('auth');
   ```

   And remove hardcoded ID:

   ```php
   public function show()
   {
       $user = Auth::user();
       return view('profile', ['user' => $user]);
   }
   ```

3. Test the fix
   Result: ✅ Profile page now shows correct user data

4. Check for other routes with same issue
   Result: Found 2 more routes also missing middleware

Root cause verified ✅
Systematic fix: Add middleware to all user-specific routes
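The check for other routes with the same issue can be roughly automated. The sketch below is a heuristic, not a Laravel feature: `routes_without_middleware` is a hypothetical helper that scans a routes file for statements that never mention middleware. It only understands standalone statements and will misreport routes that inherit middleware from a surrounding group, so treat its output as a list of candidates to review.

```bash
#!/bin/bash
# Sketch: flag route definitions that never mention middleware.
# routes_without_middleware is a hypothetical helper name.

routes_without_middleware() {
    awk '
        /Route::/  { stmt = ""; collecting = 1 }   # a new route statement starts
        collecting { stmt = stmt $0 }              # gather its lines
        collecting && /;/ {                        # statement ends at ";"
            if (stmt !~ /middleware/) print stmt
            collecting = 0
        }
    ' "$1"
}

# Example usage:
#   routes_without_middleware routes/web.php
```

Where the framework is available, `php artisan route:list` is the more reliable source, since it reports the routes the application actually registers.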
## Advanced Tracing: Test Pollution
### The find-polluter.sh Pattern
When one test affects another (test pollution):

🔍 Problem: TestUserLogin passes alone, fails in suite
Symptom: Test expects clean database, finds existing user

Backward trace:
1. Which test leaves data behind?
2. Use binary search to find polluter

Script concept:
```bash
#!/bin/bash
# find-polluter.sh
# Concept sketch: find-all-tests.sh and run-tests.sh are placeholders for
# your project's test discovery and test runner.

FAILING_TEST="TestUserLogin"
ALL_TESTS=($(./find-all-tests.sh))

# Run the first $1 tests followed by the failing test in a single suite run.
# Succeeds when the failing test still passes, meaning the polluter is not
# among the first $1 tests.
passes_after_first() {
    ./run-tests.sh "${ALL_TESTS[@]:0:$1}" "$FAILING_TEST"
}

# Binary search for the index of the polluting test
low=0
high=${#ALL_TESTS[@]}

while [ "$low" -lt "$high" ]; do
    mid=$(( (low + high) / 2 ))
    if passes_after_first "$((mid + 1))"; then
        low=$((mid + 1))   # tests 0..mid are clean, look later
    else
        high=$mid          # polluter is at index mid or earlier
    fi
done

echo "Polluter found: ${ALL_TESTS[$low]}"
```
Result: TestUserRegistration doesn't clean up test data
Root cause: Missing database rollback in tearDown()

Fix:
```php
protected function tearDown(): void
{
    DB::rollBack();
    parent::tearDown();
}
```
## Real-World Root Cause Tracing Examples
### Example 1: Performance Degradation
Symptom: Dashboard loads in 15 seconds (was 2 seconds)
Immediate: Database query takes 14 seconds
```sql
SELECT * FROM orders WHERE user_id = 123;
-- Takes 14 seconds
```
Trace backward:
But keep tracing...
ROOT CAUSE: Missing process for migration reviews
FIXES:
```sql
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Now takes 0.1 seconds ✅
```
### Example 2: Data Corruption
Symptom: User balance shows negative value (-$50)
Immediate: Balance calculated incorrectly
```php
$balance = $income - $expenses; // Results in -50
```
Trace backward:
Keep tracing:
ROOT CAUSE: Misunderstood payment API behavior + missing safeguards
FIXES:
```php
// Add idempotency
public function processPayment($paymentId, $idempotencyKey)
{
    if (Transaction::where('idempotency_key', $idempotencyKey)->exists()) {
        return ['status' => 'already_processed'];
    }
    // Process payment...
    Transaction::create([
        'payment_id' => $paymentId,
        'idempotency_key' => $idempotencyKey,
        // ...
    ]);
}
```
### Example 3: Intermittent Test Failures
Symptom: Test fails randomly (1 in 20 runs)
Immediate: Assertion fails on expected value
```php
public function test_order_total_calculation()
{
    $order = Order::factory()->create();
    $order->addItem(['price' => 10.00, 'qty' => 2]);
    $this->assertEquals(20.00, $order->total());
    // Sometimes fails: Expected 20.00, got 0.00
}
```
Trace backward:
```php
public function addItem($item)
{
    // Async save?
    dispatch(new SaveOrderItemJob($this->id, $item));
}

public function total()
{
    // Price times quantity per item, so 10.00 x 2 gives the expected 20.00
    return $this->items->sum(fn ($item) => $item->price * $item->qty);
    // Might run before job completes!
}
```
Keep tracing:
ROOT CAUSE: Premature optimization introduced race condition
FIXES:
```php
public function addItem($item)
{
    if (app()->environment('testing')) {
        // Synchronous in tests
        $this->items()->create($item);
    } else {
        // Async in production
        dispatch(new SaveOrderItemJob($this->id, $item));
    }
}
```
## Root Cause Tracing Patterns
### Pattern 1: The Five Whys
Problem: User logout fails
Why? → Token not invalidated
Why? → Logout method doesn't call revokeTokens()
Why? → Developer didn't know about revokeTokens()
Why? → No documentation on authentication system
Why? → No process for documenting architectural decisions
Root cause: Missing architectural decision records (ADRs)
Fix: Implement ADR process for all major decisions
### Pattern 2: The Timeline Analysis
Problem: Search feature broken
Timeline:
What changed between Jan 15-30?
```bash
git log --since="Jan 15" --until="Jan 30" --oneline
```
Found:
Check breaking changes in Elasticsearch v8:
Root cause: Breaking changes in dependency upgrade
Fix: Update query syntax for v8 compatibility
### Pattern 3: The Dependency Chain
Problem: Email sending fails
Dependency chain: EmailController → EmailService → QueueManager → RedisConnection → Redis Server
Trace backward:
Why Redis timeout?
Root cause: Incomplete upgrade process
Fix: Add .env.example updates to upgrade checklist
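A dependency chain like the one above can be checked link by link, starting from the deepest dependency, so that the first failure reported is the one furthest upstream. This is a minimal sketch: `first_failing_link` and the `check_*` functions are hypothetical names, and the example checks assume `redis-cli` is installed and that the application exposes a health endpoint.

```bash
#!/bin/bash
# Sketch: walk a dependency chain from the deepest dependency upward and
# report the first link whose health check fails.
# first_failing_link and the check_* functions are hypothetical names.

first_failing_link() {
    local check
    for check in "$@"; do
        if ! "$check" >/dev/null 2>&1; then
            echo "$check"
            return 1
        fi
    done
    echo "all links healthy"
}

# Example checks (assumptions: redis-cli is installed and the app exposes
# a /health endpoint on localhost):
#   check_redis_server() { redis-cli ping | grep -q PONG; }
#   check_app()          { curl -fsS http://localhost/health; }
#   first_failing_link check_redis_server check_app
```

Ordering the checks from the bottom of the chain upward keeps a downstream failure from masking the upstream one that caused it.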
## Tracing Tools and Techniques
### Tool 1: Git Bisect (Find Breaking Commit)
```bash
# Feature worked last week, broken now
git bisect start
git bisect bad HEAD        # Current (broken)
git bisect good v1.2.0     # Last known good

# Git checks out middle commit
# Test if bug exists
./run-test.sh
git bisect bad    # if broken
# or
git bisect good   # if works

# Repeat until found
# Git identifies exact breaking commit
git bisect reset

# Now trace backward from that commit
```
### Tool 2: Call Stack Logging
```php
// Add to debug code
Log::debug('Call stack', [
    'trace' => debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS)
]);

// Shows exact call chain:
// ProfileController@show
// ← Route::dispatch
// ← Kernel@handle
// ← index.php

// Reveals: ProfileController called without auth middleware
```
### Tool 3: Query Logging
```php
// Enable query log
DB::enableQueryLog();

// Run problematic code
$user->orders()->get();

// Check queries
$queries = DB::getQueryLog();
Log::debug('Queries executed', $queries);

// Reveals:
// SELECT * FROM orders (no WHERE clause!)
// Missing user_id filter
// Trace backward to find why WHERE clause missing
```
### Tool 4: Binary Search (Find Test Polluter)
```bash
# Test passes alone, fails in suite
# Run first half of tests + failing test
./run-tests.sh "tests/Unit/Test*.php tests/Feature/FailingTest.php"

# Passes? Polluter in second half
# Fails? Polluter in first half
# Repeat, narrowing down to single polluter
```
Use with:
- systematic-debugging - After isolating problem location
- test-driven-development - Write test that exposes root cause
- code-review - Review fixes for root causes, not symptoms
- git-workflow - Use git history to trace origins

Leads to:
- writing-plans - Plan systematic fix for root cause
- executing-plans - Implement comprehensive solution
- verification-before-completion - Verify root cause fixed

❌ BAD:
Symptom: Query slow
Fix: Add LIMIT 100 to query
✅ GOOD:
Symptom: Query slow
Trace: Why slow? → Missing index
Trace: Why missing index? → Not in migration
Trace: Why not in migration? → No review checklist
Fix: Add index + create migration review checklist
❌ BAD:
Problem: User sees 500 error
Fix: Add try/catch to suppress error
✅ GOOD:
Problem: User sees 500 error
Trace: What causes error? → Null pointer
Trace: Why null? → Database query returns nothing
Trace: Why no results? → Wrong table name in query
Trace: Why wrong table? → Copy/paste error
Trace: How to prevent? → Add test coverage
Fix: Correct table name + add test
❌ BAD:
"This fails because of X" → Fix X
✅ GOOD:
"This fails because of X"
- Why X?
- What caused X?
- How did X get into this state?
- When was X introduced?
- Why didn't we catch X sooner?
→ Fix root cause of X
For each bug:
This skill is based on:
Research: Studies show fixing root causes prevents 5-10 related bugs from occurring.
Social Proof: Many mature engineering organizations require root cause analysis for critical bugs.
When investigating bugs:
Bottom Line: Symptoms lie downstream. Root causes live upstream. Trace backward until you find the original trigger. Fix the cause, not the symptom, or the bug will return.