building-github-index/SKILL.md
Generate progressive disclosure indexes for GitHub repositories to use as Claude project knowledge. Use when setting up projects referencing external documentation, creating searchable indexes of technical blogs or knowledge bases, combining multiple repos into one index, or when user mentions "index", "github repo", "project knowledge", or "documentation reference".
npx skillsauth add oaustegard/claude-skills building-github-index
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Create markdown indexes of GitHub repositories optimized for Claude project knowledge. Indexes enable retrieval via GitHub API with semantic descriptions for effective matching.
# Documentation repos (markdown/notebooks)
python scripts/github_index.py owner/repo -o index.md
# Code repos (extract symbols via tree-sitter)
python scripts/github_index.py owner/repo --code-symbols -o index.md
# Multiple repos combined
python scripts/github_index.py owner/repo1 owner/repo2 -o combined.md
| Flag | Description |
|------|-------------|
| -o, --output | Output file (default: github_index.md) |
| --token | GitHub PAT; also reads GITHUB_TOKEN env |
| --include-patterns | Only index matching globs: "docs/**" "src/**" |
| --exclude-patterns | Skip matching globs: "test/**" |
| --max-files | Cap files per repo (default: 200) |
| --skip-fetch | Tree only, no content fetch (fast, filename-only descriptions) |
| --code-symbols | Include code files, extract function/class names via tree-sitter |
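The interaction between `--include-patterns`, `--exclude-patterns`, and `--max-files` can be pictured with a small sketch. This is not the script's actual implementation: the function name `select_paths`, the use of `fnmatchcase`, and the choice to apply excludes after includes are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def select_paths(paths, include=None, exclude=None, max_files=200):
    """Filter repo paths by include/exclude globs, then cap the count.

    Assumed semantics (not confirmed by the script): a path must match at
    least one include glob when any are given, must match no exclude glob,
    and the cap is applied last.
    """
    selected = []
    for path in paths:
        # With no include patterns, every path is a candidate.
        if include and not any(fnmatchcase(path, pat) for pat in include):
            continue
        if exclude and any(fnmatchcase(path, pat) for pat in exclude):
            continue
        selected.append(path)
    return selected[:max_files]


paths = ["docs/guide/intro.md", "src/app.py", "test/test_app.py", "README.md"]
print(select_paths(paths, include=["docs/**", "src/**"], exclude=["test/**"]))
# prints ['docs/guide/intro.md', 'src/app.py']
```

In `fnmatch`-style matching, `*` also matches `/`, so `docs/**` covers nested directories. The real script may use different glob semantics.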
title: and description: fields
--code-symbols)
Some repos have stub files (links to external docs, empty readmes). In these cases:
Manual curation recommended. Use the tree output and domain knowledge:
# Get tree structure only (fast)
python scripts/github_index.py owner/repo --skip-fetch -o skeleton.md
# Then manually enhance descriptions based on domain knowledge
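A rough first draft of each description can be derived from the filename before the manual pass. The sketch below is only a starting point under stated assumptions: the helper `draft_description`, the `EXPANSIONS` table, and the example path are hypothetical and not part of the skill's scripts.

```python
from pathlib import PurePosixPath

# Hypothetical abbreviation table; extend it with terms from the repo's domain.
EXPANSIONS = {"acc": "ACC", "wav": "waveform", "gen": "generation"}


def draft_description(path):
    """Turn a file or directory name into a draft human-readable description."""
    stem = PurePosixPath(path).stem
    words = stem.replace("-", "_").split("_")
    # Expand known abbreviations; leave unknown words unchanged.
    return " ".join(EXPANSIONS.get(w.lower(), w) for w in words if w)


print(draft_description("apps/acc_wav_gen.c"))
# prints ACC waveform generation
```

Anything the table does not cover passes through unchanged, so the output still needs review against domain knowledge.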
For code-heavy repos with embedded apps:
acc_wav_gen → "ACC waveform generation"
# {Repo} - Content Index
**Repository:** {url}
**Branch:** `{branch}`
## Retrieval Method
{API curl commands}
---
## {Category}
| Description | Path |
|-------------|------|
| {What this covers} | `{path/file.md}` |
Description column leads (relevance matching), path follows (retrieval key).
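The template above can be filled in programmatically. The following is a minimal sketch, assuming the index is built from `(description, path)` pairs grouped by category. The function `render_index` and the example repository URL are illustrative, and the Retrieval Method section is omitted for brevity.

```python
def render_index(repo_url, branch, categories):
    """Render a markdown index.

    `categories` maps a category heading to a list of (description, path) rows.
    """
    name = repo_url.rstrip("/").split("/")[-1]
    lines = [
        f"# {name} - Content Index",
        f"**Repository:** {repo_url}",
        f"**Branch:** `{branch}`",
        "---",
    ]
    for category, rows in categories.items():
        lines += [f"## {category}", "| Description | Path |", "|-------------|------|"]
        for description, path in rows:
            # Escape pipes so a description cannot break the table row.
            safe = description.replace("|", "\\|")
            lines.append(f"| {safe} | `{path}` |")
    return "\n".join(lines)


index = render_index(
    "https://github.com/owner/repo",
    "main",
    {"Guides": [("Installation and first run", "docs/setup.md")]},
)
print(index)
```

Keeping the description in the first column mirrors the template: the text used for relevance matching comes first, and the path used for retrieval follows.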
Enumerate files:
curl -sL "https://api.github.com/repos/OWNER/REPO/git/trees/BRANCH?recursive=1"
Fetch content:
curl -s "https://api.github.com/repos/OWNER/REPO/contents/PATH?ref=BRANCH" \
-H "Accept: application/vnd.github+json" | \
python3 -c "import sys,json,base64; print(base64.b64decode(json.load(sys.stdin)['content']).decode())"
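The same two steps can be handled in Python once the JSON responses are in hand. This sketch covers only the parsing side: the helper names `blob_paths` and `decode_content` are illustrative, and the payloads are synthetic stand-ins shaped like the API responses, so no network call is made.

```python
import base64
import json


def blob_paths(tree_response):
    """Return file paths from a git/trees response (entries of type 'blob')."""
    return [entry["path"] for entry in tree_response["tree"] if entry["type"] == "blob"]


def decode_content(contents_response):
    """Decode the base64 'content' field of a contents response to text."""
    return base64.b64decode(contents_response["content"]).decode("utf-8")


# Synthetic payloads for illustration only.
tree = json.loads(
    '{"tree": [{"path": "docs", "type": "tree"}, {"path": "docs/a.md", "type": "blob"}]}'
)
encoded = base64.encodebytes(b"# Title\nBody text\n").decode("ascii")

print(blob_paths(tree))
print(decode_content({"content": encoded, "encoding": "base64"}))
```

Directories appear in the tree listing as entries of type `tree`, so filtering on `blob` leaves only retrievable files.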
Both scripts (github_index.py and pk_index.py) download a repo tarball (a single HTTP request, so no per-file rate limits), then process files locally.
Allowlist: api.github.com (tarball redirects via this endpoint)
accessing-github-repos - Private repos, PAT setup, tarball download
mapping-codebases - Detailed code structure (methods, imports, line numbers)
For token-constrained project knowledge, use the condensed script:
python scripts/pk_index.py owner/repo -o repo_pk.md
Produces ~80% smaller output:
`path — description`
Ideal when adding multiple repo indexes to project knowledge.