searching-codebases/SKILL.md
Find code by regex pattern or natural language concept in any codebase. Auto-routes between n-gram indexed regex search (2-20x faster than ripgrep) and TF-IDF semantic search. Expands results to full functions via tree-sitting AST data. Accepts GitHub URLs, local directories, uploaded files/archives, or project knowledge. Use when asked to find implementations, search for patterns, or answer "where is X" / "how does Y work" about code. Triggers on "search this repo", "find where X is", "grep for", "what handles Y", regex patterns, or natural-language questions about code. This is the convergent "find X" skill — for first-encounter orientation, use exploring-codebases instead.
npx skillsauth add oaustegard/claude-skills searching-codebases
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Find code in any codebase by pattern or concept. One entry point, two search strategies, automatic routing.
uv tool install ripgrep
tree-sitting (for structural context expansion) installs automatically when
the --expand flag is used.
SKILL_DIR=/mnt/skills/user/searching-codebases
python3 $SKILL_DIR/scripts/search.py SOURCE "query1" ["query2" ...] [OPTIONS]
SOURCE is any of:
- A GitHub URL
- A local directory path
- uploads (uses /mnt/user-data/uploads/)
- project (uses /mnt/project/)
- A .zip or .tar.gz archive

Regex mode (patterns, identifiers, literal text):
python3 $SKILL_DIR/scripts/search.py ./repo "def handle_error"
python3 $SKILL_DIR/scripts/search.py ./repo "class.*Exception" --regex
python3 $SKILL_DIR/scripts/search.py ./repo "TODO|FIXME|HACK"
Semantic mode (concepts, natural language):
python3 $SKILL_DIR/scripts/search.py ./repo "retry logic with backoff" --semantic
python3 $SKILL_DIR/scripts/search.py ./repo "authentication flow"
python3 $SKILL_DIR/scripts/search.py ./repo "error handling strategy"
Auto-detection: short queries and code-like tokens → regex. Multi-word
natural language → semantic. Override with --regex or --semantic.
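The routing rule above can be sketched as a small heuristic. This is only an illustration of the idea, not the logic in scripts/search.py: the function name `route_query`, the keyword set, and the punctuation set are all assumptions.

```python
import re
from typing import Optional

# Hypothetical signals that a query is code-like rather than natural language.
CODE_PUNCT = re.compile(r"[|*+\\()\[\]{}^$_=<>]")
CAMEL_CASE = re.compile(r"[a-z][A-Z]")
CODE_KEYWORDS = {"def", "class", "import", "function", "return", "struct", "fn"}


def route_query(query: str, force: Optional[str] = None) -> str:
    """Return "regex" or "semantic" for a query (illustrative heuristic only)."""
    # An explicit --regex / --semantic style override always wins.
    if force in ("regex", "semantic"):
        return force
    words = query.split()
    # A single token (or nothing) is treated as an identifier or literal.
    if len(words) <= 1:
        return "regex"
    # Regex metacharacters, underscores, or camelCase suggest code.
    if CODE_PUNCT.search(query) or CAMEL_CASE.search(query):
        return "regex"
    # A leading language keyword such as "def" or "class" suggests code.
    if words[0] in CODE_KEYWORDS:
        return "regex"
    # Otherwise assume multi-word natural language.
    return "semantic"
```

Under these assumptions, "def handle_error" and "TODO|FIXME|HACK" route to regex, while "authentication flow" routes to semantic, matching the examples above.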
- --regex / --semantic: Force search mode
- --expand: Return full function bodies via tree-sitting AST context
- --benchmark: Compare indexed regex vs brute-force ripgrep
- --branch NAME: Git branch for GitHub URLs (default: main)
- --skip DIRS: Comma-separated directories to skip
- --json: Machine-readable output
- -v: Show index stats and query routing decisions

Regex search builds a sparse n-gram inverted index over all files. Queries are decomposed into literal fragments, looked up in the index to identify candidate files (typically 90-99% reduction), then verified with ripgrep. Frequency-weighted n-grams make rare character sequences more selective.
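To make the index-then-verify idea concrete, here is a minimal sketch. It deliberately simplifies: it uses plain trigrams rather than sparse, frequency-weighted n-grams, it verifies candidates with Python's `re` module instead of ripgrep, and it only orders lookups by posting-list size as a rough stand-in for frequency weighting. All function names are hypothetical and not taken from scripts/ngram_index.py.

```python
import re
from collections import defaultdict


def trigrams(text):
    """All overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files):
    """files: dict of path -> content. Returns trigram -> set of paths."""
    index = defaultdict(set)
    for path, content in files.items():
        for gram in trigrams(content):
            index[gram].add(path)
    return index


def literal_fragments(pattern):
    """Literal runs that every match must contain (conservative sketch)."""
    # Alternation, escapes, groups, classes and counted repeats are not
    # analysed here; returning no fragments means "scan every file".
    if re.search(r"[|\\(\[{]", pattern):
        return []
    # A character followed by ? or * may be absent from a match, so drop it.
    pattern = re.sub(r".[?*]", "\n", pattern)
    return [f for f in re.split(r"[^A-Za-z0-9_ ]+", pattern) if len(f) >= 3]


def candidate_files(index, all_paths, pattern):
    """Narrow all_paths to files containing every required trigram."""
    grams = set()
    for frag in literal_fragments(pattern):
        grams |= trigrams(frag)
    result = set(all_paths)
    # Intersect the rarest trigrams first so the set shrinks quickly.
    for gram in sorted(grams, key=lambda g: len(index.get(g, ()))):
        result &= index.get(gram, set())
        if not result:
            break
    return result


def search(files, pattern):
    """Index, narrow, then verify each candidate line with the real regex."""
    index = build_index(files)
    regex = re.compile(pattern)
    hits = []
    for path in sorted(candidate_files(index, files, pattern)):
        for lineno, line in enumerate(files[path].splitlines(), 1):
            if regex.search(line):
                hits.append((path, lineno, line.strip()))
    return hits
```

The index can only rule files out, never confirm a match, which is why the verification pass with the full regex is still needed.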
Semantic search builds a TF-IDF index over code chunks (functions, classes, structural entries). Chunks are ranked by cosine similarity to the query.
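A minimal sketch of TF-IDF ranking with cosine similarity, using only the standard library, might look like the following. The tokenizer, the smoothed IDF formula, and the function names are assumptions chosen for illustration; scripts/code_rag.py may chunk and weight differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; underscores split identifiers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_tfidf(chunks):
    """chunks: dict of name -> source text. Returns (vectors, idf)."""
    counts = {name: Counter(tokenize(text)) for name, text in chunks.items()}
    n = len(chunks)
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    # Smoothed inverse document frequency: rare terms get higher weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in c.items()}
        for name, c in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, vectors, idf):
    """Return (name, score) pairs with score > 0, best match first."""
    # Query terms never seen in the corpus carry no weight.
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items()}
    scored = [(cosine(q_vec, vec), name) for name, vec in vectors.items()]
    return [(name, s) for s, name in sorted(scored, reverse=True) if s > 0]
```

Because TF-IDF matches on shared vocabulary, a query only finds chunks whose identifiers or comments use at least one of its terms.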
Context expansion (--expand) uses tree-sitting's AST cache to
identify function/class boundaries, returning complete structural units
rather than line fragments. On first use, tree-sitting scans the repo
(~700ms for 250 files); subsequent expansions are sub-millisecond.
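The expansion step can be illustrated for Python sources with the standard library's `ast` module. This is a stand-in for tree-sitting, which is not used here, and the helper names are hypothetical. The `lru_cache` mirrors the parse-once, reuse-afterwards behaviour described above.

```python
import ast
from functools import lru_cache


@lru_cache(maxsize=None)
def parse(source):
    """Parse once per distinct source string; later calls hit the cache."""
    return ast.parse(source)


def enclosing_unit(source, lineno):
    """Return (start, end, text) of the innermost function or class that
    contains the 1-based line number, or None if the line is top-level."""
    best = None
    for node in ast.walk(parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.lineno <= lineno <= node.end_lineno:
                # Among containing nodes, the latest start is the innermost.
                if best is None or node.lineno > best.lineno:
                    best = node
    if best is None:
        return None
    lines = source.splitlines()
    text = "\n".join(lines[best.lineno - 1:best.end_lineno])
    return best.lineno, best.end_lineno, text
```

Given a match on a single line inside a method, this returns the whole method rather than the one matching line, which is the effect --expand is described as providing.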
Small codebases (< 20 files) skip indexing entirely — direct ripgrep is faster when there's nothing to narrow.
Multiple queries can use different modes in a single invocation. Each query is auto-routed independently, and indexes are built once per mode:
python3 $SKILL_DIR/scripts/search.py ./repo \
"class.*Error" \
"error recovery strategy" \
"def retry"
- tree-sitting: installs automatically on first use of --expand. Not required: search works without it, just with less structural context in results.
- ripgrep: uv tool install ripgrep.

- find_symbol('ClassName') via tree-sitting is simpler
- tree_overview() / dir_overview()

- scripts/search.py: Entry point, query routing, output formatting
- scripts/resolve.py: Input source resolution (GitHub, uploads, archives)
- scripts/context.py: tree-sitting-based AST context expansion
- scripts/ngram_index.py: Sparse n-gram inverted index, regex decomposition
- scripts/sparse_ngrams.py: Core n-gram algorithms, frequency weights
- scripts/code_rag.py: TF-IDF semantic search over code chunks