santosomar/code-pattern-extractor

Name: code-pattern-extractor
Author: santosomar

skills/code-analysis/code-pattern-extractor/SKILL.md

npx skillsauth add santosomar/general-secure-coding-agent-skills code-pattern-extractor

Code Pattern Extractor

Every codebase has patterns — some intentional (house idioms), some accidental (copy-paste). Finding them tells you how code gets written here, and which duplication should be consolidated.

Three pattern types

| Type | What it is | What to do with it |
| ------------ | ---------------------------------- | -------------------------------------------------- |
| Idiom | The team's standard way of doing X | Document it. New code should follow it. |
| Clone | Copy-pasted code with minor tweaks | Extract to a function. → code-refactoring-assistant |
| Anti-pattern | A recurring mistake | Flag it. → code-smell-detector |

The same structural pattern can be any of the three — it depends on whether the repetition is good, accidental, or bad.

Finding clones — structural similarity

Exact-text clones are easy (rg + sort + uniq -c). Near-clones — same structure, different variable names — need normalization:

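1. Tokenize each function.
2. Normalize: replace variable and parameter names with placeholders ($1, $2, ...) and literals with type tags (42 → NUM, "foo" → STR). Leave keywords and called names such as db.get untouched.
3. Hash the normalized token stream.
4. Group by hash. Functions that share a hash are clone candidates.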

Original A:   user = db.get(user_id);  if user is None: raise NotFound("user")
Original B:   order = db.get(order_id); if order is None: raise NotFound("order")

Normalized:   $1 = db.get($2);         if $1 is None: raise NotFound(STR)
              $1 = db.get($2);         if $1 is None: raise NotFound(STR)

→ Same hash. Clone pair found.

Extractable as: def get_or_404(model, id, name): ...
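The normalize-hash-group steps above can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: it works on the AST rather than a raw token stream, and the helper names (`bound_names`, `fingerprint`, `clone_candidates`) are hypothetical. It assumes that only parameters and assigned names are replaced, so called names such as `db.get` survive normalization.

```python
import ast
import copy
import hashlib
from collections import defaultdict


def bound_names(func):
    """Parameters plus every name the function assigns to."""
    names = [n.arg for n in ast.walk(func) if isinstance(n, ast.arg)]
    names += [n.id for n in ast.walk(func)
              if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)]
    return names


class Normalizer(ast.NodeTransformer):
    """Swap bound names for placeholders and literals for NUM / STR."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            node.value = "STR"
        elif isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            node.value = "NUM"
        return node


def fingerprint(func):
    """Hash the normalized body; the function's own name is ignored."""
    func = copy.deepcopy(func)  # keep the caller's tree untouched
    mapping = {}
    for name in bound_names(func):
        mapping.setdefault(name, f"${len(mapping) + 1}")
    normalizer = Normalizer(mapping)
    text = "\n".join(ast.dump(normalizer.visit(stmt)) for stmt in func.body)
    return hashlib.sha256(text.encode()).hexdigest()


def clone_candidates(source):
    """Group function names by fingerprint; groups of 2+ are candidates."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            groups[fingerprint(node)].append(node.name)
    return [names for names in groups.values() if len(names) > 1]


SAMPLE = '''
def get_user(user_id):
    user = db.get(user_id)
    if user is None:
        raise NotFound("user")
    return user

def get_order(order_id):
    order = db.get(order_id)
    if order is None:
        raise NotFound("order")
    return order

def total(items):
    return sum(i.price for i in items)
'''

print(clone_candidates(SAMPLE))  # [['get_user', 'get_order']]
```

Hashing the whole body only finds functions that are clones end to end. Catching a cloned fragment inside otherwise different functions would need hashing over statement windows instead.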

Finding idioms — frequency of small patterns

Idioms are short (a few lines at most) and appear everywhere. Mine them by n-gram frequency on normalized AST nodes:

| Normalized n-gram | Count | Interpretation |
| -------------------------------------------- | ----- | ----------------------------------------------------- |
| if $1 is None: return None | 89 | Null-propagation idiom: this codebase uses it heavily |
| with self._lock: $BODY | 34 | Locking idiom: _lock is the house convention |
| logger.info(f"{$1.__class__.__name__}: ...") | 27 | Logging idiom: always includes class name |
| try: $X except Exception: pass | 11 | Anti-pattern: swallowing all exceptions |

The count tells you: high-count idioms are conventions to follow; high-count anti-patterns are systemic problems.
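One way to produce such counts is sketched below. It is a simplified illustration under stated assumptions: every variable name in a window is replaced by a placeholder (so `self._lock` would come out as `$1._lock`), windows are runs of `n` consecutive statements in a block, and the names `Placeholders`, `normalize`, and `idiom_counts` are hypothetical. It needs Python 3.9+ for `ast.unparse`.

```python
import ast
import copy
from collections import Counter


class Placeholders(ast.NodeTransformer):
    """Number variable names $1, $2, ... in order of first appearance."""

    def __init__(self):
        self.seen = {}

    def visit_Name(self, node):
        node.id = self.seen.setdefault(node.id, f"${len(self.seen) + 1}")
        return node


def normalize(stmts):
    """Render a window of statements with names replaced by placeholders."""
    renamer = Placeholders()  # shared across the window so $1 stays $1
    renamed = [renamer.visit(copy.deepcopy(s)) for s in stmts]
    return "\n".join(ast.unparse(s) for s in renamed)


def idiom_counts(source, n=1):
    """Count normalized n-grams of consecutive statements in every block."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        for attr in ("body", "orelse", "finalbody"):
            block = getattr(node, attr, None)
            if not isinstance(block, list):
                continue  # e.g. a lambda body is an expression, not a block
            for i in range(len(block) - n + 1):
                counts[normalize(block[i:i + n])] += 1
    return counts


SAMPLE = '''
def a(x):
    if x is None:
        return None
    return x.name

def b(y):
    if y is None:
        return None
    return y.size
'''

for pattern, count in idiom_counts(SAMPLE).most_common(3):
    print(count, repr(pattern))
```

In this sample the null-propagation check is counted twice, once per function, even though the variable is named differently each time. Deciding whether a high-count pattern is an idiom or an anti-pattern still takes a human or a separate rule.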

Worked example — extracting a house idiom

Observed in 23 places:

def get_foo(self, foo_id):
    resp = self._client.get(f"/foos/{foo_id}")
    resp.raise_for_status()
    data = resp.json()
    return Foo.from_dict(data)

def get_bar(self, bar_id):
    resp = self._client.get(f"/bars/{bar_id}")
    resp.raise_for_status()
    data = resp.json()
    return Bar.from_dict(data)
# ... ×21 more

Pattern template:

def get_$RESOURCE(self, ${RESOURCE}_id):
    resp = self._client.get(f"/${RESOURCE}s/{${RESOURCE}_id}")
    resp.raise_for_status()
    return $MODEL.from_dict(resp.json())

Parameters: $RESOURCE (string, e.g. foo), $MODEL (class, e.g. Foo).

Verdict: This is a clone, not an idiom. 23 copies of the same 4 lines with 2 parameters → extract:

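from typing import TypeVar

T = TypeVar("T")  # the model class returned by the helper

def _get_resource(self, path: str, model: type[T]) -> T:
    # Shared fetch-and-parse logic used by every get_* method.
    resp = self._client.get(path)
    resp.raise_for_status()
    return model.from_dict(resp.json())

def get_foo(self, foo_id):
    return self._get_resource(f"/foos/{foo_id}", Foo)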

23 × 4 = 92 lines become 23 × 1 + 4 = 27 lines, and error handling now changes in one place.

Distinguishing idiom from clone

| Signal | Points to idiom | Points to clone |
| ---------------------------------------- | -------------------- | ---------------------------- |
| Pattern length | Short (2–3 lines) | Long (5+ lines) |
| Parameter count | 0–1 | 2+ |
| Repetition within one file | Rare | Common |
| Language/framework requires this shape | Yes: it's idiom | No: it's duplication |
| Would extraction make callsites clearer? | No: idiom reads fine | Yes: callsite becomes a name |

with self._lock: (1 line, 0 params, language-required shape) → idiom. The get_$RESOURCE block above (4 lines, 2 params, nothing requires this shape) → clone.
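The signals can be combined into a rough majority vote, sketched below. The `PatternStats` fields, the four-line threshold, and the three-vote cutoff are all assumptions made for illustration; the source gives the signals but no weights or cutoffs.

```python
from dataclasses import dataclass


@dataclass
class PatternStats:
    lines: int                  # length of the repeated pattern
    params: int                 # parts that vary between occurrences
    repeats_within_file: bool   # does it recur inside a single file?
    required_by_language: bool  # does the language/framework force this shape?
    clearer_if_extracted: bool  # would a named helper read better at callsites?


def classify(p: PatternStats) -> str:
    """Majority vote over the five signals (thresholds are assumed)."""
    clone_votes = sum([
        p.lines >= 4,
        p.params >= 2,
        p.repeats_within_file,
        not p.required_by_language,
        p.clearer_if_extracted,
    ])
    return "clone" if clone_votes >= 3 else "idiom"


lock_block = PatternStats(lines=1, params=0, repeats_within_file=False,
                          required_by_language=True, clearer_if_extracted=False)
get_resource = PatternStats(lines=4, params=2, repeats_within_file=True,
                            required_by_language=False, clearer_if_extracted=True)

print(classify(lock_block))    # idiom
print(classify(get_resource))  # clone
```

A vote like this is only a first pass. Borderline patterns, such as a three-line block with two parameters, are worth a manual look.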

Do not

Do not extract every 2-line pattern. if x is None: return None appearing 89 times isn't duplication — it's an idiom, and extracting it to propagate_none(x) makes code less readable.
Do not report clone pairs without the extraction proposal. "These two functions are similar" is not actionable. "Extract this helper" is.
Do not ignore the parameter count. A pattern with 6 parameters that differ each time isn't extractable — the "common" part is tiny.
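Do not miss semantic clones that differ textually. if not user and if user is None are different text but usually the same pattern, so normalize aggressively. The two are not strictly equivalent (not user is also true for 0, "", and empty containers), so confirm the intent before merging them.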

Output format

## Idioms (follow these)
| Pattern | Count | Example location |
| ------- | ----- | ---------------- |

## Clones (extract these)
### <pattern name>
Occurrences: <N>
Template:
<normalized pattern with $PARAMS>
Parameters: <list — what varies>
Proposed extraction:
<function signature + body>
Affected files: <list>

## Anti-patterns (fix these)
| Pattern | Count | Why bad | Locations |
| ------- | ----- | ------- | --------- |
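As an illustration of filling in the Clones entry of this format, here is a small sketch. The `CloneFinding` class, the `render_clone` function, and the file path in the example are hypothetical and not part of the skill.

```python
from dataclasses import dataclass, field


@dataclass
class CloneFinding:
    name: str
    occurrences: int
    template: str
    parameters: list[str]
    extraction: str
    files: list[str] = field(default_factory=list)


def render_clone(f: CloneFinding) -> str:
    """Render one entry of the '## Clones (extract these)' section."""
    return "\n".join([
        f"### {f.name}",
        f"Occurrences: {f.occurrences}",
        "Template:",
        f.template,
        f"Parameters: {', '.join(f.parameters)}",
        "Proposed extraction:",
        f.extraction,
        f"Affected files: {', '.join(f.files)}",
    ])


finding = CloneFinding(
    name="get_resource",
    occurrences=23,
    template="def get_$RESOURCE(self, ${RESOURCE}_id): ...",
    parameters=["$RESOURCE", "$MODEL"],
    extraction="def _get_resource(self, path, model): ...",
    files=["api/client.py"],  # hypothetical path
)
print(render_clone(finding))
```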

Identifies recurring structural patterns in a codebase — idioms, copy-paste clones, homegrown abstractions — and characterizes each as a reusable template. Use when learning a codebase's conventions, when hunting for copy-paste that should be a function, or when documenting how this team does things.

development

Updated Apr 13, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add santosomar/general-secure-coding-agent-skills code-pattern-extractor

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Status | Scanner | Description | Percentage |
| ------- | --------------- | ---------------------------------------------- | ---------- |
| Clean | Trivy | Container and dependency vulnerability scanner | 95% |
| Clean | Semgrep | Static code analysis for vulnerabilities | 95% |
| Clean | mcp-scan (Snyk) | Model Context Protocol security validation | 95% |
| Skipped | Snyk (dep) | Open source security scanning | 50% |
| Skipped | Socket.dev | Supply chain security analysis | 50% |
| Skipped | VirusTotal | Multi-engine malware detection | 50% |
| Skipped | CrowdStrike | Advanced threat intelligence | 50% |
| Skipped | OSV-Scanner | Open Source Vulnerability database check | 50% |
| Skipped | OWASP Dep-Check | | 50% |

Last scanned: Apr 13, 2026, 3:54 AM (33.4s, 1 file scanned)

SKILL.md

name: code-pattern-extractor
description: Identifies recurring structural patterns in a codebase (idioms, copy-paste clones, homegrown abstractions) and characterizes each as a reusable template. Use when learning a codebase's conventions, when hunting for copy-paste that should be a function, or when documenting how this team does things.
license: Apache-2.0
category: code-analysis
suite: general-secure-coding-agent-skills
version: 0.3.0
related: code-search-assistant, code-smell-detector, code-refactoring-assistant

Related Skills

santosomar/verified-pseudocode-extractor

development

Extracts human-readable pseudocode from a verified formal artifact (Dafny, Lean, TLA+) while preserving the verified properties as annotations, so the proof-carrying logic can be reimplemented in a production language. Use when porting verified code to an unverified target, when documenting what a formal spec actually does, or when handing a verified algorithm to an implementer.

SKILL.md, updated Apr 13, 2026

santosomar/tlaplus-spec-generator

development

Translates natural-language or pseudocode descriptions of concurrent and distributed systems into TLA+ specifications ready for the TLC model checker. Identifies state variables, actions, type invariants, safety properties, and liveness properties from the description. Use when formalizing a protocol, when the user describes a distributed algorithm to verify, when designing a consensus or locking scheme, or when starting formal verification of a concurrent system.

SKILL.md, updated Apr 13, 2026

santosomar/tlaplus-model-reduction

testing

Reduces a TLA+ model so TLC can actually check it (by shrinking constants, adding state constraints, abstracting data, or applying symmetry) when the state space is too large to enumerate. Use when TLC runs out of memory, when checking takes hours, or when a spec works at N=2 and you need confidence at larger scale.

SKILL.md, updated Apr 13, 2026

santosomar/tlaplus-guided-code-repair

development

TLA+-specific instance of model-guided repair: reads a TLC error trace, identifies the enabling condition that should have been false, strengthens the corresponding action, and maps the fix to source code. Use when TLC reports an invariant violation or deadlock and you have the code-to-TLA+ mapping from extraction.

SKILL.md, updated Apr 13, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

# Clone the repo
git clone https://github.com/santosomar/general-secure-coding-agent-skills.git

# Copy into the Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r general-secure-coding-agent-skills/skills/code-analysis/code-pattern-extractor ~/.claude/skills/

Repository

santosomar/general-secure-coding-agent-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT