oaustegard/exploring-codebases

Name: exploring-codebases
Author: oaustegard

exploring-codebases/SKILL.md

npx skillsauth add oaustegard/claude-skills exploring-codebases

Exploring Codebases

Exploratory code analysis for unfamiliar repositories. Orchestrates tree-sitting (structural) and featuring (semantic) over a local copy.

Workflow

Five numbered steps, in order. Do not skip step 0.

0. Setup (once per session)

uv venv /home/claude/.venv 2>/dev/null
uv pip install tree-sitter --python /home/claude/.venv/bin/python
export PYTHON=/home/claude/.venv/bin/python
export TREESIT=/mnt/skills/user/tree-sitting/scripts/treesit.py
export GATHER=/mnt/skills/user/featuring/scripts/gather.py

If step 2's --stats later reports Scanned 0 files ... Errors: 1, the tree-sitter core package isn't installed — come back here and install it (the engine bundles its own grammars and does NOT use tree-sitter-language-pack). Treesit fails silently on missing deps; it does not raise a useful error.
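Because the failure is silent, it can help to test the step 2 output mechanically instead of eyeballing it. The following is a minimal sketch, assuming the failure line contains the literal text "Scanned 0 files" as quoted above. `scan_is_empty` is a hypothetical helper name, not something treesit provides.

```shell
# Hypothetical helper: exits 0 when the given --stats output contains the
# "Scanned 0 files" marker described above, non-zero otherwise.
scan_is_empty() {
  printf '%s\n' "$1" | grep -q 'Scanned 0 files'
}

# Possible use after step 2 (commented out so the sketch has no side effects):
# out=$("$PYTHON" "$TREESIT" "/tmp/$REPO" --stats 2>&1)
# if scan_is_empty "$out"; then
#   "$PYTHON" -c 'import tree_sitter' || echo "tree-sitter missing: rerun step 0" >&2
# fi
```

The `import tree_sitter` probe checks the venv directly, so an empty scan caused by something else (for example, a wrong path) is not misread as a missing dependency.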

1. Get the repo (tarball, not per-file)

OWNER=...
REPO=...
REF=main                    # branch name, tag, or SHA. For a PR: pull/N/head
curl -sL -H "Authorization: Bearer $GH_TOKEN" \
  "https://api.github.com/repos/$OWNER/$REPO/tarball/$REF" -o /tmp/$REPO.tar.gz
mkdir -p /tmp/$REPO && tar -xzf /tmp/$REPO.tar.gz -C /tmp/$REPO --strip-components=1
ls /tmp/$REPO | head        # sanity check — did extraction land?

One HTTP call gets the whole repo. Do NOT curl README, cat files, or fetch via contents/PATH first — they're in the tarball. The Authorization header is only needed for private repos; public repos work without it.

Ref selection matters. If exploring a feature branch, PR, or tag, set REF accordingly. The default main will silently give you stale code if the question is about an unmerged branch.
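The ref forms and the optional header can be wrapped in two small helpers. This is a sketch under stated assumptions: `tarball_url` and `fetch_tarball` are hypothetical names, the URL shape is the one used in the step above, and sending the Authorization header only when GH_TOKEN is non-empty is a choice made for this example, not part of the original recipe.

```shell
# Build the tarball URL for a branch, tag, SHA, or PR ref (pull/N/head).
# Arguments: owner, repo, ref.
tarball_url() {
  printf 'https://api.github.com/repos/%s/%s/tarball/%s\n' "$1" "$2" "$3"
}

# Download the tarball to the path given as the fourth argument.
# The Authorization header is sent only when GH_TOKEN is set and non-empty;
# -f makes curl exit non-zero on an HTTP error instead of saving the error body.
fetch_tarball() {
  url=$(tarball_url "$1" "$2" "$3")
  if [ -n "${GH_TOKEN:-}" ]; then
    curl -sfL -H "Authorization: Bearer $GH_TOKEN" "$url" -o "$4"
  else
    curl -sfL "$url" -o "$4"
  fi
}
```

For example, `fetch_tarball "$OWNER" "$REPO" pull/N/head "/tmp/$REPO.tar.gz"` would request the head of pull request N rather than main.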

2. Structural scan

$PYTHON $TREESIT /tmp/$REPO --stats

Read the output. It gives file counts, symbol counts, languages, and per-directory symbol density. This IS the orienting artifact — treat it as the product of this step, not warm-up.

Drill only if you have a specific question. For pure "what is this repo" exploration, skip drilling and go to step 3 — featuring surfaces the interesting paths for you. Drill when a user asked about a specific subsystem, or when step 3's output raises a question that needs source.

When you do drill, batch queries in one invocation. Every treesit call pays the full scan cost. Multiple queries added to the same command share that scan and each additional query adds ~0ms. If you're about to make a second treesit call on the same path, fold it into the first.

# GOOD — one scan, three answers
$PYTHON $TREESIT /tmp/$REPO --path=SUBDIR --detail=full \
  'find:*Handler*:function' 'source:main' 'refs:Config'

# BAD: three scans, three answers (3× the cost for the same information)
$PYTHON $TREESIT /tmp/$REPO --path=SUBDIR --detail=full 'find:*Handler*:function'
$PYTHON $TREESIT /tmp/$REPO --path=SUBDIR --detail=full 'source:main'
$PYTHON $TREESIT /tmp/$REPO --path=SUBDIR --detail=full 'refs:Config'

3. Feature synthesis

$PYTHON $GATHER /tmp/$REPO \
  --skip tests,.github,node_modules --source-budget 8000

Output includes a "Candidate areas for sub-files (by symbol density)" list near the top — that's your drill-target picker, ranked.

4. Reason about the combined output

Synthesize the output of steps 2 and 3: capabilities, feature groups, architecture, entry points, anomalies. Produce _FEATURES.md when warranted. This is the LLM step; everything before was mechanical.

When to Use This vs Other Skills

| Situation | Use |
|-----------|-----|
| "I just cloned this, what is it?" | exploring-codebases (this skill) |
| "Where is the retry logic?" | searching-codebases |
| "Find all files matching class.*Error" | searching-codebases |
| "Show me the symbols in auth.py" | tree-sitting directly |
| "Which files are most about CSRF / sessions / queryset filtering?" | bm25 |
| "Rank these docs by relevance to a multi-word concept" | bm25 |
| "Document what this codebase does" | featuring directly |

Exploring is the divergent skill — you don't know what you're looking for yet. Searching is the convergent skill — you know what you want.

Pairing bm25 with this workflow

Once steps 2–3 have surfaced the rough shape of the repo, bm25 is the natural complement when you want ranked content search beyond grep and beyond exact-symbol lookup. It ranks files by lexical relevance to a multi-word query, which is useful for "what's this codebase actually about when I search for X?" — particularly when you don't yet know the symbol name to feed to tree-sitting.

BM25=/mnt/skills/user/bm25/scripts/bm25.py

# Pass multiple queries — index builds once, all queries reuse it
python3 $BM25 /tmp/$REPO 'auth flow' 'session backend' 'middleware pipeline' \
  --exclude 'tests/*' --exclude '*/tests/*' --top-k 5

Two patterns that pair especially well:

- bm25 → tree-sitting. Use bm25 to find the top-ranked files for a concept; then tree-sitting source:Symbol:path/to/file.py to read the actual implementation.
- bm25 with --exclude 'tests/*'. Test directories tend to dominate keyword queries because test names redundantly mention domain terms. Excluding them up front lands you on implementation files.

bm25 is corpus-agnostic — it'll also work on project knowledge stores or uploads/ if your exploration spans docs, transcripts, or PDFs.

Delegating to subagents (large repos only, and only if subagents exist)

Gate first: does this environment expose a subagent tool (Agent/Task in Claude Code and CCotw)? Claude.ai chat and bare-skill runs have none — run steps 2–4 inline and skip this section entirely. Never simulate fan-out by other means when the tool is absent.

When the tool exists and the repo is large (>1000 files or several distinct subsystems), keep steps 2–3 inline — they're mechanical and cheap — and fan out only step 4's judgment work, one agent per subsystem.

Subagents inherit nothing. Not your conversation, not this SKILL.md, not the knowledge that scan artifacts exist on disk. An agent prompted only with "explore crates/foo" will re-derive structure by ls/glob crawling at full tier cost. (Observed 2026-07-16: four Sonnet agents launched onto a 2,300-file repo without the handoff spent their opening turns running ls, with the full symbol index already on disk.)

Every subagent prompt must therefore carry:

- Its structure slice, pre-computed. Partition the gather output's ## Public API section by subsystem path prefix, write each slice to a file, and point the agent at its file: "grep/Read this instead of listing directories." Small slices (<50KB) can be pasted inline instead.
- The treesit recipe verbatim: the full command including the venv python path, plus the batch rule (one invocation, many queries; each invocation pays the scan, extra queries are free).
- Anti-crawl instructions: no ls/Glob for discovery; Read only to confirm or expand a line range the slice or treesit already located.
- An output spec: what to report, a line budget, and "file paths + line refs" so results are verifiable.
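The slice pre-computation in the first item might look like the sketch below. The layout of the gather output is not shown in this document, so this assumes the section starts at a line beginning with "## Public API", ends at the next "## " heading, and that each line inside it mentions the file path. `public_api_section` and `slice_for_prefix` are hypothetical helper names.

```shell
# Print the body of the "## Public API" section of a saved gather report:
# every line after that heading, up to (not including) the next "## " heading.
public_api_section() {
  awk '
    /^## Public API/ { inside = 1; next }
    /^## /           { inside = 0 }
    inside           { print }
  ' "$1"
}

# Keep only the section lines that mention a given subsystem path prefix.
# Arguments: report file, path prefix (matched as a fixed string).
slice_for_prefix() {
  public_api_section "$1" | grep -F -- "$2"
}
```

With step 3's output saved to a file (say /tmp/gather.md, a hypothetical path), `slice_for_prefix /tmp/gather.md crates/foo/ > /tmp/slice-foo.md` would produce the file to hand to the crates/foo agent.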

Routing (see the agent-routing skill): multi-turn exploration is outside Haiku's calibrated zone, so use Sonnet for subsystem agents and keep the final cross-cluster synthesis in the orchestrator.

Notes

- Large repos (>100 files): use --skip tests,vendored,docs,... in step 2 to focus the scan.
- Monorepos: treat each package/service as a separate exploration. Generate per-subsystem _FEATURES.md files linked from a root index.
- Drill heuristics (if step 2 drilling is warranted): directories with high symbol-to-file ratio (dense logic), entry-point names (main, cli, app, server, routes), files with many imports (integration points).

First-encounter codebase orientation. Chains tree-sitting (structural inventory) and featuring (feature synthesis) into an EDA workflow for unfamiliar repositories. Use when someone says "explore this repo", "what does this do", "I just cloned this", "help me understand this codebase", or when starting work on an unfamiliar repository. This is the divergent "what's here?" skill — for targeted "where is X?" queries, use searching-codebases instead.

132 stars

development

Updated Jul 19, 2026

npx skillsauth add oaustegard/claude-skills exploring-codebases

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status |
|---------|-------------|--------|
| Trivy | Container and dependency vulnerability scanner | Clean (95%) |
| Semgrep | Static code analysis for vulnerabilities | Clean (95%) |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean (95%) |
| Snyk (dep) | Open source security scanning | Skipped (50%) |
| Socket.dev | Supply chain security analysis | Skipped (50%) |
| VirusTotal | Multi-engine malware detection | Skipped (50%) |
| CrowdStrike | Advanced threat intelligence | Skipped (50%) |
| OSV-Scanner | Open Source Vulnerability database check | Skipped (50%) |
| OWASP Dep-Check | | Skipped (50%) |

Last scanned: Jul 19, 2026, 6:48 AM; 182.6s; 3 files scanned

SKILL.md

name: exploring-codebases
version: 2.4.0

Related Skills

oaustegard/writing-instructions

development

Verified · Trusted · Community

Write effective instructions for Claude: project instructions, standalone prompts, and skill content. Use when users need help writing prompts, setting up project instructions, choosing between instruction formats, or improving how they communicate with Claude. Covers writing principles, model-aware calibration, and format selection. For building and testing complete skills, use skill-creator instead.

134 · SKILL.md · Updated Jul 26, 2026

oaustegard/finding-skills

data-ai

Verified · Trusted · Community

Discover and load skills on demand from /mnt/skills/user/. Use when you need a capability but don't know which skill provides it, when the boot-emitted skill list is names-only and you need a full description, or when you want to list the catalog. Verbs are list (names only), search (rank by name/description match against a query), and show (emit the full SKILL.md for a named skill).

134 · SKILL.md · Updated Jul 26, 2026

oaustegard/transcribing-images

documentation

Verified · Trusted · Community

Reads the visual content of slides, pages, and images the way a human would, not just their embedded text. Use when a PPTX or PDF has image slides, screenshots, charts, scanned figures, or flattened-to-image layouts that the built-in pptx/pdf skills read as empty; when asked to transcribe, describe, OCR, or extract what is shown in an image, slide deck, or document page; or when embedded-text extraction returned little or nothing from a visually rich file. Triggers on 'read this deck', 'what's on these slides', 'transcribe', 'OCR', 'extract text from image', 'describe this chart/diagram', .pptx/.pdf/.png/.jpg with visual content.

134 · SKILL.md · Updated Jul 26, 2026

oaustegard/svg-portrait-mode

development

Verified · Trusted · Community

Portrait Mode for SVGs — foveated vectorization with 4-zone selective detail. Combines vision annotations, MediaPipe segmentation/landmarks, and optional saliency. Like phone portrait mode, but vectorized. Use when vectorizing a portrait or photo where subject detail should outrank background detail.

134 · SKILL.md · Updated Jul 26, 2026

Install manually

# Clone the repo
git clone https://github.com/oaustegard/claude-skills.git

# Copy into Claude Code skills folder (global)
cp -r claude-skills/exploring-codebases ~/.claude/skills/

Repository

oaustegard/claude-skills

132 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT