skills/debugging/szz-bug-identifier/SKILL.md
Applies the SZZ algorithm to VCS history to identify which commits introduced bugs by correlating bug-fix commits with earlier changes. Use when mining a repository for bug-introducing commits, when building a defect-prediction dataset, or when the user asks which commit introduced a given fixed bug.
SZZ (Śliwerski, Zimmermann, Zeller, 2005) answers: given a bug-fix commit, which earlier commit introduced the bug? It works by blaming the lines the fix touched.
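1. Identify bug-fix commits by an issue-tracker reference (Fixes #1234, BUG-567) or a message keyword (fix, bug, patch).
2. For each fix commit, collect the lines it modified or deleted.
3. Run git blame <fix>^ -- <file> on each modified line to find the commit that last touched it before the fix.
4. Treat each blamed commit as a bug-introducing candidate.

That's the whole algorithm. The rest is noise filtering.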
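The fix-identification signals above (an issue-tracker reference or a message keyword) could be checked with something like the following. This is a minimal sketch, not a reference implementation: the function name `classify_fix`, the closing-verb list, and the `tracker_keys` allowlist are assumptions made for illustration. The allowlist exists because a generic `ABC-123` pattern would also match strings such as `UTF-8`.

```python
import re
from typing import Optional, Sequence

# Assumed closing verbs followed by "#<number>", e.g. "Fixes #1234" or "closes #892".
CLOSES_ISSUE = re.compile(
    r"\b(?:fix(?:e[sd])?|close[sd]?|resolve[sd]?)\s+#\d+", re.IGNORECASE
)
# Weak signal: bare keywords in the commit message.
KEYWORD = re.compile(r"\b(?:fix(?:e[sd])?|bug|patch)\b", re.IGNORECASE)


def classify_fix(message: str, tracker_keys: Sequence[str] = ("BUG",)) -> Optional[str]:
    """Return 'issue-link', 'keyword', or None for a commit message.

    tracker_keys is a hypothetical allowlist of issue-key prefixes (BUG-567).
    """
    key_pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in tracker_keys) + r")-\d+\b"
    )
    if CLOSES_ISSUE.search(message) or key_pattern.search(message):
        return "issue-link"  # strong signal: points at a tracked bug
    if KEYWORD.search(message):
        return "keyword"  # weak signal: expect false positives
    return None
```

Returning the signal type, rather than a plain boolean, lets a mining script keep keyword-only matches apart from issue-linked fixes and treat them with lower confidence.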
| Noise source | Why it's wrong | Filter |
| --------------------------------- | -------------------------------------------------------- | ------------------------------------------------------ |
| Whitespace / formatting changes | Blame hits a prettier run, not the real introducer | git blame -w; ignore commits that only touch formatting |
| Comment-only changes | The fix edited a comment too — that line is not the bug | Strip comment lines before blaming |
| Large refactor commits | Every line blames to the Great Refactor of 2019 | git blame --ignore-rev with a curated ignore-list |
| The line was added by the fix | No blame target — added lines didn't exist before | Only blame deleted/modified lines, not added |
| Bug predates the repo | Blame hits the initial import commit | Flag — can't attribute |
| Moved file | Blame stops at the git mv | git blame -C -M to follow moves/copies |
| Blamed commit is newer than bug report | The bug existed before that commit; blame is wrong | Discard candidates with commit-date > bug-report-date |
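Several of the filters in the table map directly onto git blame options, and the date filter is a plain comparison. The sketch below assembles the command line and applies the date rule. The names `blame_command`, `drop_too_new`, and `Candidate` are hypothetical, and the code only builds the argument list; running it (for example with `subprocess.run`) and parsing the porcelain output are left out.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional


class Candidate(NamedTuple):
    sha: str
    commit_date: datetime


def blame_command(fix_sha: str, path: str, line: int,
                  ignore_revs_file: Optional[str] = None) -> List[str]:
    """Build a git blame invocation for one line of the fix's parent."""
    cmd = [
        "git", "blame",
        "-w",          # ignore whitespace-only changes
        "-M", "-C",    # follow lines moved or copied within and across files
        "--porcelain",
        "-L", f"{line},{line}",
    ]
    if ignore_revs_file:
        # Curated list of refactor/formatting commits to skip.
        cmd += ["--ignore-revs-file", ignore_revs_file]
    cmd += [f"{fix_sha}^", "--", path]  # blame the parent of the fix
    return cmd


def drop_too_new(candidates: Iterable[Candidate],
                 bug_reported_at: datetime) -> List[Candidate]:
    """Discard candidates committed after the bug was reported.

    Both datetimes must be timezone-aware (or both naive) to be comparable.
    """
    return [c for c in candidates if c.commit_date <= bug_reported_at]
```

Passing the arguments as a list, not a shell string, avoids quoting problems with file paths that contain spaces.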
Fix commit: c4a9f1b — Fix: null check in getUserEmail (closes #892)
public String getUserEmail(long id) {
User u = repo.find(id);
- return u.getEmail();
+ if (u == null) return null;
+ return u.getEmail();
}
Step 2: Modified line is return u.getEmail(); (the old version).
Step 3: git blame c4a9f1b^ -- UserService.java at that line → a17d3e0 — Add UserService (Jane, 2021-03-04).
Step 4: Candidate = a17d3e0.
Filters:
a17d3e0 is from 2021. ✓

Verdict: a17d3e0 introduced the bug. The null-check was never there.
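The lines to blame in a worked example like this one can be recovered mechanically from the fix's unified diff: only lines prefixed with `-` exist in the parent, so only those get a blame target. A minimal sketch follows, assuming git-style unified diff input. The function name `deleted_lines` is hypothetical, and the hunk header line numbers in `DIFF` are assumed for illustration, since the example above does not give them.

```python
import re
from typing import List, Optional, Tuple

# Hunk header, e.g. "@@ -1,4 +1,5 @@"; group 1 is the first old-side line number.
HUNK = re.compile(r"^@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@")


def deleted_lines(diff_text: str) -> List[Tuple[int, str]]:
    """Return (parent line number, text) for every line the diff removes."""
    found: List[Tuple[int, str]] = []
    old_line: Optional[int] = None  # None while reading file headers
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            old_line = None  # a new file section starts; skip its headers
            continue
        match = HUNK.match(line)
        if match:
            old_line = int(match.group(1))
            continue
        if old_line is None or line.startswith("\\"):
            continue  # header line or "\ No newline at end of file"
        if line.startswith("-"):
            found.append((old_line, line[1:]))
            old_line += 1
        elif not line.startswith("+"):
            old_line += 1  # context line: present in both versions
    return found


# The fix from the example, with an assumed hunk position.
DIFF = """\
--- a/UserService.java
+++ b/UserService.java
@@ -1,4 +1,5 @@
 public String getUserEmail(long id) {
     User u = repo.find(id);
-    return u.getEmail();
+    if (u == null) return null;
+    return u.getEmail();
 }
"""

print(deleted_lines(DIFF))
```

Added lines are skipped on purpose, matching the table's rule that only deleted or modified lines are blamed.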
- … semantic-szz-analyzer to distinguish.
- The fix is a git revert: the reverted commit IS the bug-introducer, definitionally. Shortcut: check if the fix is a revert before running SZZ.
- … (cherry picked from commit …) trailers and follow the chain.
- … .git-blame-ignore-revs. Without it, every result blames the last formatting pass.
- Don't rely on message keywords (fix, bug) as your only fix-identification signal. False-positive rate is brutal. Prefer issue-tracker links.
- … regression-root-cause-analyzer is for (bisect is more precise). SZZ is for batch mining.

fix: <sha> — <subject>
candidates:
<sha> — <subject> (<date>, <author>)
blamed from: <file>:<line>
filters passed: whitespace ✓ comment ✓ date ✓
confidence: <high|medium|low>
...
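
The report format above could be produced by a small renderer like the one below. This is a sketch under assumptions: the `Candidate` fields, the `render_report` name, the two-space indentation, and the `✗` mark for a failed filter are all illustrative choices, since the template only shows passing filters.

```python
from dataclasses import dataclass
from typing import Dict, List

DASH = "\u2014"  # separator used between sha and subject in the report format


@dataclass
class Candidate:
    sha: str
    subject: str
    date: str
    author: str
    path: str
    line: int
    filters: Dict[str, bool]  # e.g. {"whitespace": True, "comment": True, "date": True}
    confidence: str

    def __post_init__(self) -> None:
        if self.confidence not in ("high", "medium", "low"):
            raise ValueError(f"unknown confidence: {self.confidence!r}")


def render_report(fix_sha: str, fix_subject: str, candidates: List[Candidate]) -> str:
    """Render one fix commit and its bug-introducing candidates as text."""
    lines = [f"fix: {fix_sha} {DASH} {fix_subject}", "candidates:"]
    for c in candidates:
        marks = " ".join(
            f"{name} {'✓' if ok else '✗'}" for name, ok in c.filters.items()
        )
        lines += [
            f"  {c.sha} {DASH} {c.subject} ({c.date}, {c.author})",
            f"    blamed from: {c.path}:{c.line}",
            f"    filters passed: {marks}",
            f"    confidence: {c.confidence}",
        ]
    return "\n".join(lines)
```

Keeping the filter results as a dictionary, not a single pass/fail flag, preserves which check a candidate failed when results are reviewed later.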