
oaustegard/searching-codebases

Name: searching-codebases
Author: oaustegard

searching-codebases/SKILL.md

npx skillsauth add oaustegard/claude-skills searching-codebases


Searching Codebases

Find code in any codebase by pattern or concept. One entry point, two search strategies, automatic routing.

Prerequisites

uv tool install ripgrep

tree-sitting installs automatically when needed — for --expand context expansion and for the binding-resolved --refs/--def/--hover tier, which uses it to resolve symbol positions. Only the bare tree-sitter package is fetched; the language grammars ship bundled.

Primary Command

SKILL_DIR=/mnt/skills/user/searching-codebases

python3 $SKILL_DIR/scripts/search.py SOURCE "query1" ["query2" ...] [OPTIONS]

SOURCE is any of:

Local directory path
GitHub URL (downloads tarball automatically)
uploads (uses /mnt/user-data/uploads/)
project (uses /mnt/project/)
Path to a .zip or .tar.gz archive
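The SOURCE forms listed above can be told apart with a few string checks. The sketch below is only an illustration: `classify_source` and `KEYWORD_PATHS` are hypothetical names, and the real logic in scripts/resolve.py may differ.

```python
# Hypothetical helper: label a SOURCE argument by the forms listed above.
# Fixed locations for the two keyword sources, as stated in the list.
KEYWORD_PATHS = {
    "uploads": "/mnt/user-data/uploads/",
    "project": "/mnt/project/",
}


def classify_source(source: str) -> str:
    """Return one of: 'uploads', 'project', 'github', 'archive', 'directory'."""
    if source in KEYWORD_PATHS:
        return source
    if source.startswith(("https://github.com/", "http://github.com/")):
        return "github"
    if source.endswith((".zip", ".tar.gz")):
        return "archive"
    # Anything else is treated as a local directory path.
    return "directory"


for example in ["./repo", "https://github.com/psf/requests", "uploads", "bundle.tar.gz"]:
    print(example, "->", classify_source(example))
```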

Search Modes

Regex mode (patterns, identifiers, literal text):

python3 $SKILL_DIR/scripts/search.py ./repo "def handle_error"
python3 $SKILL_DIR/scripts/search.py ./repo "class.*Exception" --regex
python3 $SKILL_DIR/scripts/search.py ./repo "TODO|FIXME|HACK"

Semantic mode (concepts, natural language):

python3 $SKILL_DIR/scripts/search.py ./repo "retry logic with backoff" --semantic
python3 $SKILL_DIR/scripts/search.py ./repo "authentication flow"
python3 $SKILL_DIR/scripts/search.py ./repo "error handling strategy"

Auto-detection: short queries and code-like tokens → regex. Multi-word natural language → semantic. Override with --regex or --semantic.
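One way to picture the auto-detection rule is a small heuristic router. This is a sketch under assumptions, not the routing code in scripts/search.py: the `route_query` name, the keyword list, and the character set treated as code-like are all invented for illustration, chosen so that the example queries in this section route the way they are grouped.

```python
import re
from typing import Optional

# Assumed heuristic: characters and patterns that suggest code or a regex.
CODE_LIKE = re.compile(r"[_(){}\[\]*|\\^$=<>]|[a-z][A-Z]")
# Assumed heuristic: leading words that suggest a code fragment.
CODE_KEYWORDS = {"def", "class", "import", "function", "return"}


def route_query(query: str, force: Optional[str] = None) -> str:
    """Return 'regex' or 'semantic' for a single query string."""
    if force in ("regex", "semantic"):
        return force  # mirrors the --regex / --semantic override
    words = query.split()
    if len(words) <= 1:
        return "regex"  # short queries
    if CODE_LIKE.search(query) or words[0] in CODE_KEYWORDS:
        return "regex"  # code-like tokens
    return "semantic"  # multi-word natural language


for q in ["def handle_error", "TODO|FIXME|HACK", "authentication flow"]:
    print(q, "->", route_query(q))
```

Running with -v is the way to see the routing decisions the tool actually makes.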

Binding-resolved mode (Python only — pyright via the python-lsp skill):

python3 $SKILL_DIR/scripts/search.py ./repo --refs SYMBOL    # find all real uses
python3 $SKILL_DIR/scripts/search.py ./repo --def SYMBOL     # go-to-definition
python3 $SKILL_DIR/scripts/search.py ./repo --hover SYMBOL   # inferred type/signature

Regex mode matches text, so a cross-reference search for a function also returns shadowed symbols and same-named but unrelated symbols as false positives. --refs is binding-resolved: pyright excludes the unrelated same-named symbols and follows imports. Use it when you need a true "find all callers/users" for a .py symbol, not a text grep.

The tier is engaged lazily — pyright's index cost is paid only when you ask for --refs/--def/--hover, never on ordinary searches. It is Python-only; for non-.py sources, or when pyright/node is unavailable, it prints a one-line degradation note and falls back to the regex text path. Each takes a single bare symbol name and is mutually exclusive with the other two and with text queries.
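The exclusivity rules in the last sentence can be expressed with argparse. The sketch below assumes nothing about how scripts/search.py parses its arguments; `build_parser` and `parse` are hypothetical names, and only the flag names come from this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="search.py")
    parser.add_argument("source")
    parser.add_argument("queries", nargs="*")
    # At most one of the three binding-resolved flags may be given.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--refs", metavar="SYMBOL")
    group.add_argument("--def", dest="definition", metavar="SYMBOL")
    group.add_argument("--hover", metavar="SYMBOL")
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    symbol = args.refs or args.definition or args.hover
    # Symbol queries are also exclusive with ordinary text queries.
    if symbol and args.queries:
        parser.error("--refs/--def/--hover cannot be combined with text queries")
    return args


print(parse(["./repo", "--refs", "get"]))
```

Invalid combinations exit with a usage error, which is argparse's default behavior for `parser.error`.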

Options

--regex / --semantic: Force search mode
--refs SYMBOL / --def SYMBOL / --hover SYMBOL: Binding-resolved Python queries via pyright (see Binding-resolved mode above)
--expand: Return full function bodies via tree-sitting AST context
--benchmark: Compare indexed regex vs brute-force ripgrep
--branch NAME: Git branch for GitHub URLs (default: main)
--skip DIRS: Comma-separated directories to skip
--json: Machine-readable output
-v: Show index stats and query routing decisions

How It Works

Regex search builds a sparse n-gram inverted index over all files. Queries are decomposed into literal fragments, looked up in the index to identify candidate files (typically 90-99% reduction), then verified with ripgrep. Frequency-weighted n-grams make rare character sequences more selective.
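The narrow-then-verify idea can be sketched in a few lines. This is a deliberately simplified stand-in: it uses plain trigrams instead of the sparse, frequency-weighted n-grams described above, it verifies with Python's `re` instead of ripgrep, and it only decomposes patterns joined by `.*` or `.+` (anything more complex is not narrowed at all). All function names are hypothetical.

```python
import re
from collections import defaultdict


def trigrams(text):
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files):
    """Map each trigram to the set of file paths that contain it."""
    index = defaultdict(set)
    for path, content in files.items():
        for gram in trigrams(content):
            index[gram].add(path)
    return index


def literal_fragments(pattern):
    """Literal runs that every match must contain (conservative)."""
    parts = re.split(r"\.[*+]", pattern)
    if any(re.search(r"[.*+?|()\[\]{}^$\\]", part) for part in parts):
        return []  # too complex for this sketch: do not narrow
    return [part for part in parts if len(part) >= 3]


def candidates(index, all_paths, pattern):
    """Files that contain every trigram of every literal fragment."""
    result = set(all_paths)
    for fragment in literal_fragments(pattern):
        for gram in trigrams(fragment):
            result &= index.get(gram, set())
    return result


def search(files, index, pattern):
    """Verify the regex only against candidate files."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(candidates(index, files, pattern)):
        for line_no, line in enumerate(files[path].splitlines(), start=1):
            if regex.search(line):
                hits.append((path, line_no, line))
    return hits


files = {
    "a.py": "class MyException(Exception):\n    pass\n",
    "b.py": "def handle_error():\n    return 1\n",
    "c.py": "classic example\n",
}
index = build_index(files)
print(candidates(index, files, "class.*Exception"))
print(search(files, index, "class.*Exception"))
```

The index lookup is only a necessary condition for a match, so the verification pass is what makes the final results exact.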

Semantic search builds a TF-IDF index over code chunks (functions, classes, structural entries). Queries are ranked by cosine similarity.
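TF-IDF ranking by cosine similarity can be illustrated without any dependencies. The skill itself relies on scikit-learn for this; the sketch below is a pure-Python approximation with a smoothed IDF, and the chunk names and contents are invented examples.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphabetic runs; splits snake_case identifiers on "_".
    return re.findall(r"[a-z]+", text.lower())


def build_tfidf(chunks):
    """Return (idf, vectors) for a dict of chunk name -> chunk text."""
    docs = {name: Counter(tokenize(text)) for name, text in chunks.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for counts in docs.values() for term in counts)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in counts.items()}
        for name, counts in docs.items()
    }
    return idf, vectors


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, idf, vectors):
    """Return (score, chunk name) pairs, best match first."""
    counts = Counter(tokenize(query))
    query_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scored = [(cosine(query_vec, vec), name) for name, vec in vectors.items()]
    return sorted(scored, reverse=True)


chunks = {  # hypothetical code chunks
    "retry_request": "def retry_request(url): # retry with exponential backoff",
    "parse_config": "def parse_config(path): return json.load(open(path))",
    "login": "def login(user, password): authenticate user session token",
}
idf, vectors = build_tfidf(chunks)
for score, name in rank("retry logic with backoff", idf, vectors):
    print(f"{score:.3f}  {name}")
```

Because the ranking depends on shared vocabulary, a query only scores against chunks that contain at least one of its terms.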

Context expansion (--expand) uses tree-sitting's AST cache to identify function/class boundaries, returning complete structural units rather than line fragments. On first use, tree-sitting scans the repo (~700ms for 250 files); subsequent expansions are sub-millisecond.
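The effect of context expansion, turning a matched line into its enclosing structural unit, can be sketched with Python's standard `ast` module. This is a swapped-in technique for illustration only: the skill uses tree-sitting, works across languages, and caches its results, none of which this Python-only sketch does. `expand_to_enclosing` is a hypothetical name.

```python
import ast


def expand_to_enclosing(source: str, line_no: int) -> str:
    """Return the innermost function or class containing line_no (1-based).

    Falls back to the single matched line when nothing encloses it.
    """
    tree = ast.parse(source)
    best = None
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.lineno <= line_no <= node.end_lineno:
                span = node.end_lineno - node.lineno
                # Prefer the smallest enclosing unit.
                if best is None or span < best.end_lineno - best.lineno:
                    best = node
    lines = source.splitlines()
    if best is None:
        return lines[line_no - 1]
    return "\n".join(lines[best.lineno - 1:best.end_lineno])


SOURCE = """import time

def retry(fn):
    for attempt in range(3):
        try:
            return fn()
        except Exception:
            time.sleep(1)

X = 1
"""

# A hit on line 6 ("return fn()") expands to the whole retry function.
print(expand_to_enclosing(SOURCE, 6))
```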

Small codebases (< 20 files) skip indexing entirely — direct ripgrep is faster when there's nothing to narrow.

Mixed Queries

Multiple queries can use different modes in a single invocation. Each query is auto-routed independently, and indexes are built once per mode:

python3 $SKILL_DIR/scripts/search.py ./repo \
  "class.*Error" \
  "error recovery strategy" \
  "def retry"

Dependencies

tree-sitting: Provides AST context expansion for --expand and the symbol→position resolution that seeds the binding-resolved tier (--refs/--def/--hover). Auto-installs the bare tree-sitter package when either is used (grammars are bundled). Regex and semantic search work without it.
ripgrep: Required for regex verification. Install via uv tool install ripgrep.
scikit-learn: Required for semantic mode. Installs automatically.
python-lsp: Provides the binding-resolved tier (--refs/--def/--hover). Self-bootstraps pyright on first use and requires system node (v18+). Not required — without it those flags degrade to the regex text path.
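Before relying on a given tier, it can help to check which external tools are on PATH. The sketch below uses only `shutil.which` from the standard library; `check_tools` is a hypothetical helper, and it does not check the node version (v18+) mentioned above.

```python
import shutil


def check_tools(tools=("rg", "node")):
    """Report which external executables can be found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


status = check_tools()
for tool, found in status.items():
    print(f"{tool}: {'found' if found else 'missing'}")
```

A missing rg affects regex verification, while a missing node only means the --refs/--def/--hover flags degrade to the regex text path.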

When to Use — narrow, by design

The ONE recommended use: binding-resolved Python symbol queries.

"find all callers of X" / "where is X really defined" for a .py symbol, when same-named-but-unrelated symbols would pollute a text grep. Empirical basis: rg get on psf/requests returned 232 hits, 224 of them false; --refs get excluded all 224 (2026-06-15).

When NOT to Use — which is most of the time

Everything else. Measured head-to-head on real issue-localization tasks (7 scikit-learn issues with merged fix-PRs, gold = PR diff files, 2026-07-04, replicating the file-discovery metric of arXiv:2602.11988):

Literal tokens / identifiers: naive rg -l tied or beat the indexed tier on recall@10 in every instance, at 0.4s for rg vs 25s for the indexed tier.
Concept / natural-language search: the TF-IDF semantic tier never beat identifier grep — not even on identifier-poor issues, which are themselves rare (~0.3% of merged-PR traffic in the sample).
First encounter / "what is this repo": use exploring-codebases.
Repos under ~20 files: read them.

The self-test before invoking: would plain rg return the same answer? If yes, use rg. The indexed-regex and semantic tiers are retained for completeness and for corpora where they may yet earn their cost (very large repos, non-code document collections), but they carry the burden of proof.
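For readers unfamiliar with the recall@10 figure used in the comparison above, here is a minimal sketch of the metric: the fraction of gold files (the files touched by the fix PR) that appear in the top k ranked results. The function name and the file paths are hypothetical, and the original measurement script may differ in details such as tie handling.

```python
def recall_at_k(ranked_files, gold_files, k=10):
    """Fraction of gold files that appear among the first k ranked files."""
    gold = set(gold_files)
    if not gold:
        return 0.0
    top_k = set(ranked_files[:k])
    return len(top_k & gold) / len(gold)


# Hypothetical example: two gold files, one of them ranked in the top 3.
ranked = ["pkg/a.py", "pkg/b.py", "pkg/c.py", "pkg/d.py"]
gold = ["pkg/b.py", "pkg/z.py"]
print(recall_at_k(ranked, gold, k=3))
```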

Files

scripts/search.py — Entry point, query routing, output formatting
scripts/resolve.py — Input source resolution (GitHub, uploads, archives)
scripts/context.py — tree-sitting-based AST context expansion
scripts/ngram_index.py — Sparse n-gram inverted index, regex decomposition
scripts/sparse_ngrams.py — Core n-gram algorithms, frequency weights
scripts/code_rag.py — TF-IDF semantic search over code chunks
scripts/lsp_refs.py — Binding-resolved Python tier: symbol→position resolution (tree-sitting), pyright queries (python-lsp), soft fallback


Binding-resolved Python symbol queries — find all true callers (--refs), go-to-definition (--def), or inferred signature (--hover) of a .py symbol via pyright, excluding same-named false positives that text grep cannot. Use when a task needs ALL callers/users of a Python symbol or its real definition and text matching would over-match. For everything else — literal tokens, regex patterns, concept/natural-language search, any repo size — plain ripgrep is faster and equally accurate: measured 2026-07-04 on real issue-localization tasks, the semantic and indexed-regex tiers tied or lost against naive rg at 4-60x the wall-clock cost. Those tiers remain available below but are NOT recommended as a default.

127 stars

development

Updated Jul 5, 2026

Install globally

npx skillsauth add oaustegard/claude-skills searching-codebases

Installs this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Jul 5, 2026, 5:46 AM · 132.2s · 14 files scanned

SKILL.md

name: searching-codebases
version: 2.2.0




Install manually


# Clone the repo
git clone https://github.com/oaustegard/claude-skills.git

# Copy into Claude Code skills folder (global)
cp -r claude-skills/searching-codebases ~/.claude/skills/


Repository

oaustegard/claude-skills


Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT