.claude/skills/sherlock-review/SKILL.md
Evidence-based investigative code review using deductive reasoning to determine what actually happened versus what was claimed. Use when verifying implementation claims, investigating bugs, validating fixes, or conducting root cause analysis. Elementary approach to finding truth through systematic observation.
npx skillsauth add proffesor-for-testing/agentic-qe sherlock-review

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
<default_to_action> When investigating code claims:
The 3-Step Investigation:
# 1. OBSERVE: Gather evidence
git diff <commit>
npm test -- --coverage
# 2. DEDUCE: Compare claim vs reality
# Does code match description?
# Do tests prove the fix/feature?
# 3. CONCLUDE: Verdict with evidence
# TRUE / PARTIALLY TRUE / FALSE / NONSENSICAL
Holmesian Principles:
| Category | What to Check | How |
|----------|---------------|-----|
| Claim | PR description, commit messages | Read thoroughly |
| Code | Actual file changes | git diff |
| Tests | Coverage, assertions | Run independently |
| Behavior | Runtime output | Execute locally |
| Timeline | When things happened | git log, git blame |
| Verdict | Meaning |
|---------|---------|
| ✓ TRUE | Evidence fully supports claim |
| ⚠ PARTIALLY TRUE | Claim accurate but incomplete |
| ✗ FALSE | Evidence contradicts claim |
| ? NONSENSICAL | Claim doesn't apply to context |
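
The verdict table can be read as a small decision procedure. The sketch below is one possible encoding, not part of the skill itself: `classifyClaim` and its three boolean fields (`appliesToContext`, `contradicted`, `fullySupported`) are hypothetical names introduced only for illustration, and the order of the checks is an assumption.

```javascript
// Verdict labels copied from the table above.
const VERDICTS = Object.freeze({
  TRUE: "✓ TRUE",
  PARTIALLY_TRUE: "⚠ PARTIALLY TRUE",
  FALSE: "✗ FALSE",
  NONSENSICAL: "? NONSENSICAL",
});

// Hypothetical helper: the input fields are assumptions made for this sketch.
// A claim that does not apply to the context is judged before anything else,
// then contradiction, then completeness of the supporting evidence.
function classifyClaim({ appliesToContext, contradicted, fullySupported }) {
  if (!appliesToContext) return VERDICTS.NONSENSICAL;
  if (contradicted) return VERDICTS.FALSE;
  return fullySupported ? VERDICTS.TRUE : VERDICTS.PARTIALLY_TRUE;
}
```

Checking applicability first means a claim such as "100% thread safe" on single-threaded code is never scored as true or false by accident.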
## Sherlock Investigation: [Claim]
### The Claim
"[What PR/commit claims to do]"
### Evidence Examined
- Code changes: [files, lines]
- Tests added: [count, coverage]
- Behavior observed: [what actually happens]
### Deductive Analysis
**Claim**: [specific assertion]
**Evidence**: [what you found]
**Deduction**: [logical conclusion]
**Verdict**: ✓/⚠/✗
### Findings
- What works: [with evidence]
- What doesn't: [with evidence]
- What's missing: [gaps in implementation/testing]
### Recommendations
1. [Action based on findings]
Every investigation MUST surface at least 3 weighted observations (CRITICAL=3, HIGH=2, MEDIUM=1, LOW=0.5). Elementary observations count at INFORMATIONAL=0.25 weight. A Sherlock investigation that finds nothing is a failed investigation -- Holmes always finds clues.
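
The weighting rule above can be checked mechanically. The following is a minimal sketch that assumes "at least 3" refers to the weighted total rather than a raw count of observations; the rule could be read either way. The weights come from the rule itself, while `scoreObservations`, `MINIMUM_SCORE`, and the `severity` field are hypothetical names.

```javascript
// Weights taken from the rule above.
const SEVERITY_WEIGHTS = {
  CRITICAL: 3,
  HIGH: 2,
  MEDIUM: 1,
  LOW: 0.5,
  INFORMATIONAL: 0.25,
};

// Assumption: the minimum is a weighted total, not a number of observations.
const MINIMUM_SCORE = 3;

// Hypothetical helper: sums the weights of all observations and reports
// whether the investigation reaches the assumed minimum.
function scoreObservations(observations) {
  let total = 0;
  for (const { severity } of observations) {
    const weight = SEVERITY_WEIGHTS[severity];
    if (weight === undefined) {
      throw new Error(`Unknown severity: ${severity}`);
    }
    total += weight;
  }
  return { total, meetsMinimum: total >= MINIMUM_SCORE };
}
```

Under this reading, one HIGH plus one MEDIUM observation reaches the threshold, while a LOW and an INFORMATIONAL observation together do not.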
Steps:
Red Flags:
Steps:
Red Flags:
Steps:
Red Flags:
catch {} swallowing errors

## Case: PR #123 "Fix race condition in async handler"
### Claims Examined:
1. "Eliminates race condition"
2. "Adds mutex locking"
3. "100% thread safe"
### Evidence:
- File: src/handlers/async-handler.js
- Changes: Added `async/await`, removed callbacks
- Tests: 2 new tests for async flow
- Coverage: 85% (was 75%)
### Analysis:
**Claim 1: "Eliminates race condition"**
Evidence: Added `await` to sequential operations. No actual mutex.
Deduction: Race avoided by removing concurrency, not synchronization.
Verdict: ⚠ PARTIALLY TRUE (solved differently than claimed)
**Claim 2: "Adds mutex locking"**
Evidence: No mutex library, no lock variables, no sync primitives.
Verdict: ✗ FALSE
**Claim 3: "100% thread safe"**
Evidence: JavaScript is single-threaded. No worker threads used.
Verdict: ? NONSENSICAL (meaningless in this context)
### Conclusion:
Fix works but not for reasons claimed. Race condition avoided by
making operations sequential, not by adding synchronization.
### Recommendations:
1. Update PR description to accurately reflect solution
2. Add test for concurrent request handling
3. Remove incorrect technical claims
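
Claim 2 in the case above rests on the absence of synchronization primitives in the diff. A first-pass check of that kind might look like the sketch below. It assumes unified `git diff` output as input; the function name `findSyncPrimitives` and the keyword list are illustrative assumptions, and a keyword scan only narrows down where to read, it does not prove a lock is used correctly or missing.

```javascript
// Assumed keyword list for this sketch; extend it for the codebase under review.
const SYNC_PATTERN = /\b(mutex|semaphore|lock|atomics)\b/i;

// Hypothetical helper: returns the added lines of a unified diff that
// mention a synchronization keyword. File header lines ("+++ b/...") are skipped.
function findSyncPrimitives(diffText) {
  return diffText
    .split("\n")
    .filter((line) => line.startsWith("+") && !line.startsWith("+++"))
    .map((line) => line.slice(1).trim())
    .filter((line) => SYNC_PATTERN.test(line));
}
```

An empty result is consistent with the ✗ FALSE verdict for "Adds mutex locking", but the diff should still be read by hand before the verdict is recorded.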
// Evidence-based code review
await Task("Sherlock Review", {
prNumber: 123,
claims: [
"Fixes memory leak",
"Improves performance 30%"
],
verifyReproduction: true,
testEdgeCases: true
}, "qe-code-reviewer");
// Bug fix verification
await Task("Verify Fix", {
bugCommit: 'abc123',
fixCommit: 'def456',
reproductionSteps: steps,
testBoundaryConditions: true
}, "qe-code-reviewer");
aqe/sherlock/
├── investigations/* - Investigation reports
├── evidence/* - Collected evidence
├── verdicts/* - Claim verdicts
└── patterns/* - Common deception patterns
const investigationFleet = await FleetManager.coordinate({
strategy: 'evidence-investigation',
agents: [
'qe-code-reviewer', // Code analysis
'qe-security-auditor', // Security claim verification
'qe-performance-validator' // Performance claim verification
],
topology: 'parallel'
});
"It is a capital mistake to theorize before one has data." Trust only reproducible evidence. Don't trust commit messages, documentation, or "works on my machine."
The Sherlock Standard: Every claim must be verified empirically. What does the evidence actually show?