santosomar/test-guided-bug-detector

Name: test-guided-bug-detector
Author: santosomar

skills/debugging/test-guided-bug-detector/SKILL.md

npx skillsauth add santosomar/general-secure-coding-agent-skills test-guided-bug-detector

Test-Guided Bug Detector

When tests fail, the failure set itself is a signal. One failure tells you where to look; the pattern across many failures tells you what kind of thing broke.

Step 1 — Triage by failure shape

| Failure pattern | Most likely cause | First move |
| --- | --- | --- |
| One test fails | Localized bug in the code that test covers | Read the assertion; → bug-localization |
| Many tests fail with the same error | Shared dependency broke (fixture, helper, import) | Find the shared thing, not the individual tests |
| Many tests fail with different errors | Environment/infra (DB down, fixture not loading) | Check setup/teardown logs, not test bodies |
| All tests in one file fail | Module-level import/fixture in that file | Check the file's top level, not the tests |
| Tests fail only in CI, not locally | Env difference: version, path, timezone, locale, parallelism | Diff CI env vs local env, not the code |
| Tests fail only when run together | Test pollution: one test mutates shared state | Bisect the test order; find the polluter |
| Same tests intermittently fail | Flake: timing, network, randomness | Do NOT chase the code; stabilize the test |
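The rows of this table that can be decided from a single run could be sketched roughly as below. This is only an illustration: the `Failure` record, its field names, and the `tests_per_file` mapping are assumptions made for the example, not part of the skill. The CI-only, order-dependent, and intermittent rows need data from several runs and are left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    # Hypothetical record for one failing test; field names are assumptions.
    test_id: str  # e.g. "tests/test_billing.py::test_total"
    file: str     # test file the failure came from
    error: str    # exception type plus message, as reported by the runner


def triage(failures: list[Failure], tests_per_file: dict[str, int]) -> str:
    """Return a coarse label for the failure shape of a single run."""
    if not failures:
        return "no failures"
    if len(failures) == 1:
        return "one test fails: read the assertion"

    # Every failure is in one file, and every test in that file failed.
    files = Counter(f.file for f in failures)
    if len(files) == 1:
        only_file, count = next(iter(files.items()))
        if count == tests_per_file.get(only_file):
            return "all tests in one file fail: check the file's top level"

    # One error string shared by every failure points at a shared dependency.
    errors = Counter(f.error for f in failures)
    if len(errors) == 1:
        return "same error everywhere: find the shared dependency"
    return "different errors: check setup/teardown and environment"
```

The label is a starting point only. A mixed run (one dominant error plus a few unrelated ones) falls into the last branch here, which is why clustering is still worth doing before acting on it.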

Step 2 — Spectrum-based fault localization (when many tests fail)

The classic move: code executed by failing tests but not by passing tests is suspicious.

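1. Run with coverage. Record which lines each test hits.
2. For each line, compute suspiciousness (Ochiai): `fail_hits / sqrt(total_fails × (fail_hits + pass_hits))`, where `fail_hits` and `pass_hits` count the failing and passing tests that execute the line and `total_fails` is the number of failing tests.
3. Sort descending. The top lines are where the fault probably lives.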

This is mechanical but surprisingly effective. You need ≥3 failing and ≥3 passing tests for the signal to separate from noise.
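The scoring step can be sketched in a few lines of Python. The sketch assumes per-test coverage has already been collected (for instance with coverage.py's dynamic contexts) and loaded into a plain dictionary; the `coverage` and `failed` parameter shapes and the `"file:line"` key format are assumptions made for the example.

```python
import math


def ochiai(coverage: dict[str, set[str]], failed: set[str]) -> list[tuple[str, float]]:
    """Rank lines by Ochiai suspiciousness, most suspicious first.

    coverage maps a test name to the set of lines it executed
    (for example "billing.py:42"); failed is the set of failing test names.
    """
    total_fails = len(failed)
    fail_hits: dict[str, int] = {}
    pass_hits: dict[str, int] = {}
    for test, lines in coverage.items():
        bucket = fail_hits if test in failed else pass_hits
        for line in lines:
            bucket[line] = bucket.get(line, 0) + 1

    scores = []
    for line in set(fail_hits) | set(pass_hits):
        f = fail_hits.get(line, 0)
        p = pass_hits.get(line, 0)
        denom = math.sqrt(total_fails * (f + p))
        # A zero denominator means no failing tests at all; score such lines 0.
        scores.append((line, f / denom if denom else 0.0))
    # Highest score first; ties broken by line name for stable output.
    return sorted(scores, key=lambda item: (-item[1], item[0]))
```

A line hit by every failing test and no passing test scores 1.0; a line hit only by passing tests scores 0.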

Step 3 — Cluster failures

Before debugging, group failures that share a root cause. Debugging 20 failures that are secretly 1 bug is 19× wasted effort.

Cluster by, in order:

1. Identical exception type + message → almost certainly same bug
2. Identical top-of-stack frame → very likely same bug
3. Same file-under-test → likely
4. Same fixture used → possible (if the fixture is the bug)

Pick the largest cluster. Fix it. Re-run. Repeat.
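One possible reading of the first two clustering keys, as a sketch: group by exception type and message first, then regroup whatever is left over by top-of-stack frame, and return the clusters largest first. The `Failure` record and its fields are hypothetical; the file-under-test and fixture keys are omitted because they need information a stack trace alone does not carry.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    # Hypothetical record for one failing test; field names are assumptions.
    test_id: str
    exc_type: str
    message: str
    top_frame: str  # "file:function" of the innermost stack frame


def cluster(failures: list[Failure]) -> list[list[Failure]]:
    """Group failures that probably share a root cause, largest cluster first."""
    # Pass 1: identical exception type + message.
    by_error: dict[tuple[str, str], list[Failure]] = defaultdict(list)
    for f in failures:
        by_error[(f.exc_type, f.message)].append(f)

    clusters = [group for group in by_error.values() if len(group) > 1]
    leftovers = [group[0] for group in by_error.values() if len(group) == 1]

    # Pass 2: failures with a unique error are regrouped by top-of-stack frame.
    by_frame: dict[str, list[Failure]] = defaultdict(list)
    for f in leftovers:
        by_frame[f.top_frame].append(f)
    clusters.extend(by_frame.values())

    return sorted(clusters, key=len, reverse=True)
```

The first element of the result is the cluster to fix first; after the fix, re-run the tests and cluster again, since some groups may dissolve.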

Worked example

Input: 47 tests failing after a merge.

Triage:

- 41 fail with KeyError: 'tenant_id' → same error → one cluster
- 5 fail with various assertion mismatches in test_billing.py → file-local → one cluster
- 1 fails with ConnectionRefused → infra → ignore for now

Cluster 1 (41 tests): All 41 use @with_authenticated_user fixture. Fixture source: creates a User dict. Grep the diff: tenant_id was added as a required field in User.__init__ but the fixture wasn't updated.

Root cause: One line in conftest.py. 41 failures → 1 bug.
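To make the shape of this bug concrete, here is a purely hypothetical reconstruction. The source does not show the fixture or the product code, so every name and value below (`stale_user_fixture`, `fixed_user_fixture`, `tenant_of`, the dictionary contents) is invented for illustration only.

```python
def stale_user_fixture() -> dict:
    """Hypothetical fixture as it might have looked before the fix: no tenant_id."""
    return {"id": 1, "name": "test-user"}


def fixed_user_fixture() -> dict:
    """The same hypothetical fixture after a one-line fix adds the new field."""
    return {"id": 1, "name": "test-user", "tenant_id": "tenant-1"}


def tenant_of(user: dict) -> str:
    """Stand-in for product code that started requiring tenant_id after the merge."""
    return user["tenant_id"]
```

Calling `tenant_of(stale_user_fixture())` raises `KeyError: 'tenant_id'`, and every test built on the stale fixture fails the same way, which is why a single change to the fixture clears the whole cluster.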

Cluster 2 (5 tests): After fixing cluster 1, re-run. 3 of the 5 now pass (they were also blocked by the fixture). 2 remain. Both assert on a dollar amount that's off by exactly the tax rate. The merge also changed tax calculation.

47 failures → 2 root causes in the code (44 from the fixture, 2 from the tax change), plus 1 infra failure set aside.

Edge cases

- Every single test fails: Your test runner is broken, not your code. Import error at conftest.py/setup.py level, or the test DB didn't come up.
- The failing assertion is `True == True` or a similar tautology: The test itself is broken: pytest collected an accidentally-named non-test function, or someone committed an `assert True  # TODO` placeholder.
- New tests fail, old tests pass: The new tests might be wrong. Don't assume the product code is at fault until you've read the new test.

Do not

- Do not debug failures one at a time without clustering first. You will fix the same bug five times.
- Do not assume the test is right. The test is a claim; verify the claim against spec before chasing the code.
- Do not re-run a flaky test until it passes and call it fixed. Mark it, quarantine it, move on, but don't lie to yourself.
- Do not skip spectrum analysis because it sounds fancy. It is a short script, and it can narrow the search considerably.
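For the flaky-test rule, one way to keep the decision honest is to classify each test from repeated runs on the same commit instead of re-running until green. A minimal sketch, assuming a hypothetical `history` mapping from test name to a list of pass/fail outcomes gathered however your runner allows:

```python
def classify(outcomes: list[bool]) -> str:
    """Label a test from repeated runs on one commit (True means the run passed)."""
    if not outcomes:
        raise ValueError("need at least one recorded run")
    if all(outcomes):
        return "pass"
    if not any(outcomes):
        return "fail"
    # Mixed outcomes on identical code: treat as flaky, not as fixed.
    return "flaky"


def to_quarantine(history: dict[str, list[bool]]) -> list[str]:
    """Return the tests whose outcomes disagree across runs, sorted by name."""
    return sorted(name for name, runs in history.items() if classify(runs) == "flaky")
```

A test that passes on the third attempt still has mixed outcomes, so it lands in the quarantine list rather than being counted as fixed.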

Output format

## Clusters
1. <N> failures — <shared root: exception/fixture/file>
   Suspected fault: <file:line>  (<how you narrowed it>)
2. ...

## Recommended order
Fix cluster <N> first (<reason: biggest / blocks others / fastest>)

## Quarantine
- <test name>: flaky, <mechanism> — do not chase
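If the report is produced by a script, the template above could be filled in along these lines. The `Cluster` record and the `render_report` function are hypothetical names for this sketch, and the clusters are assumed to arrive already sorted in the order they should be fixed.

```python
from dataclasses import dataclass

DASH = "\u2014"  # the separator character used in the template


@dataclass
class Cluster:
    # Hypothetical record; one entry per group of failures with a shared root.
    count: int
    shared_root: str
    suspected_fault: str
    narrowed_by: str


def render_report(clusters: list[Cluster], reason: str, quarantine: dict[str, str]) -> str:
    """Render clusters (already in fix order) in the template's layout."""
    lines = ["## Clusters"]
    for i, c in enumerate(clusters, start=1):
        lines.append(f"{i}. {c.count} failures {DASH} {c.shared_root}")
        lines.append(f"   Suspected fault: {c.suspected_fault}  ({c.narrowed_by})")
    lines += ["", "## Recommended order"]
    if clusters:
        lines.append(f"Fix cluster 1 first ({reason})")
    lines += ["", "## Quarantine"]
    # quarantine maps a flaky test name to its suspected mechanism.
    for test, mechanism in sorted(quarantine.items()):
        lines.append(f"- {test}: flaky, {mechanism} {DASH} do not chase")
    return "\n".join(lines)
```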

Uses failing test results as signals to guide bug search and narrow down candidate fault locations. Use when one or more tests are failing and the user wants to understand what's broken, when CI reports failures, or when triaging a batch of test failures after a change.

testing

Updated Apr 13, 2026

npx skillsauth add santosomar/general-secure-coding-agent-skills test-guided-bug-detector

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | What it checks | Status | Percentage |
| --- | --- | --- | --- |
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 13, 2026, 4:17 AM · 45.7s · 1 file scanned

SKILL.md

name: test-guided-bug-detector
description: Uses failing test results as signals to guide bug search and narrow down candidate fault locations. Use when one or more tests are failing and the user wants to understand what's broken, when CI reports failures, or when triaging a batch of test failures after a change.
license: Apache-2.0
category: debugging
suite: general-secure-coding-agent-skills
version: 0.3.0
related: bug-localization, regression-root-cause-analyzer

Related Skills

santosomar/verified-pseudocode-extractor

development

Extracts human-readable pseudocode from a verified formal artifact (Dafny, Lean, TLA+) while preserving the verified properties as annotations, so the proof-carrying logic can be reimplemented in a production language. Use when porting verified code to an unverified target, when documenting what a formal spec actually does, or when handing a verified algorithm to an implementer.

SKILL.md · Updated Apr 13, 2026

santosomar/tlaplus-spec-generator

development

Translates natural-language or pseudocode descriptions of concurrent and distributed systems into TLA+ specifications ready for the TLC model checker. Identifies state variables, actions, type invariants, safety properties, and liveness properties from the description. Use when formalizing a protocol, when the user describes a distributed algorithm to verify, when designing a consensus or locking scheme, or when starting formal verification of a concurrent system.

SKILL.md · Updated Apr 13, 2026

santosomar/tlaplus-model-reduction

testing

Reduces a TLA+ model so TLC can actually check it (by shrinking constants, adding state constraints, abstracting data, or applying symmetry) when the state space is too large to enumerate. Use when TLC runs out of memory, when checking takes hours, or when a spec works at N=2 and you need confidence at larger scale.

SKILL.md · Updated Apr 13, 2026

santosomar/tlaplus-guided-code-repair

development

TLA+-specific instance of model-guided repair: reads a TLC error trace, identifies the enabling condition that should have been false, strengthens the corresponding action, and maps the fix to source code. Use when TLC reports an invariant violation or deadlock and you have the code-to-TLA+ mapping from extraction.

SKILL.md · Updated Apr 13, 2026

Install manually

# Clone the repo
git clone https://github.com/santosomar/general-secure-coding-agent-skills.git

# Make sure the global Claude Code skills folder exists
mkdir -p ~/.claude/skills

# Copy into Claude Code skills folder (global)
cp -r general-secure-coding-agent-skills/skills/debugging/test-guided-bug-detector ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

santosomar/general-secure-coding-agent-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT