.claude/skills/ingest/SKILL.md
Ingest a source into an LLM Wiki vault. Detects source type (URL, PDF, YouTube, tweet, gist, text), extracts content, creates wiki pages, updates index and log. Use when user wants to ingest, add, import, or process a source into their wiki.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add RonanCodes/llm-wiki ingest
Process any source into wiki pages. This is the main entry point for adding knowledge to a vault.
/ingest https://some-article.com --vault my-research
/ingest raw/paper.pdf --vault my-research
/ingest https://youtube.com/watch?v=abc123 --vault my-research
/ingest https://x.com/user/status/123 --vault my-research
/ingest https://gist.github.com/user/abc123 --vault my-research
/ingest https://github.com/owner/repo/discussions/123 --vault my-research
/ingest "Some pasted text or notes" --vault my-research
- `$ARGUMENTS` contains the source and optional flags.
- The vault is chosen with the `--vault <name>` flag.
- If no `--vault` is specified, check whether only one vault exists and use it. Otherwise ask.

Identify the source type and delegate to the appropriate extraction method:
| Pattern | Source Type | Extraction Method |
|---------|-----------|-------------------|
| `https://x.com/...` or `https://twitter.com/...` | Tweet | Use the FXTwitter API: `curl -s "https://api.fxtwitter.com/{user}/status/{id}"`, then parse the JSON for `tweet.text`, `tweet.author`, `tweet.created_at`. See the `ingest-tweet` skill. |
| `https://youtube.com/...` or `https://youtu.be/...` | YouTube | Extract the transcript via `yt-dlp` subtitles. Check `which yt-dlp` first; install with `brew install yt-dlp` if missing. See the `ingest-youtube` skill. |
| `https://gist.github.com/...` | Gist | Fetch the raw content: `curl -sL "https://gist.githubusercontent.com/{user}/{id}/raw/"`. See the `ingest-gist` skill. |
| `https://news.ycombinator.com/item?id=...` | Hacker News | Fetch the thread via the Algolia API: `curl -s "https://hn.algolia.com/api/v1/items/{id}"`. See the `ingest-hackernews` skill. |
| `https://www.reddit.com/r/.../comments/...` or `https://old.reddit.com/...` | Reddit | Append `.json` to the URL, then parse the post and comment tree. See the `ingest-reddit` skill. |
| `https://github.com/.../discussions/...` | GitHub Discussion | Fetch via `gh api graphql`. See the `ingest-github-discussions` skill. |
| `https://linkedin.com/...` or `https://www.linkedin.com/...` | LinkedIn | Extract the post content via cookie auth or pasted text. See the `ingest-linkedin` skill. |
| `https://dev.to/...` | Dev.to Article | Fetch via the dev.to public API: `curl -s "https://dev.to/api/articles/{username}/{slug}"`, which returns `body_markdown`, `title`, `tags`, `user`, `published_at`. See the `ingest-devto` skill. |
| `https://...` (other URLs) | Web Article | Fetch with `curl -sL` and extract the readable content. For better extraction, use `@mozilla/readability` if available. See the `ingest-web` skill. |
| `*.mp4`, `*.mkv`, `*.mov`, `*.avi`, `*.webm` (file path) | Video | Transcribe the audio and extract keyframes. Requires FFmpeg plus whisper or an API key. See the `ingest-video` skill. |
| `*.pdf` (file path) | PDF | Extract text with `pdftotext`. Check `which pdftotext` first; install with `brew install poppler` if missing. See the `ingest-pdf` skill. |
| `*.docx`, `*.xlsx`, `*.pptx` (file path) | Office | Convert with `pandoc`. Check `which pandoc` first; install with `brew install pandoc` if missing. See the `ingest-office` skill. |
| `*.md` (file path) | Markdown | Read the file directly. |
| Everything else | Text | Treat as pasted text/notes. |
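The vault rule and the detection table above could be sketched in POSIX shell roughly as follows. This is an illustration under stated assumptions, not the skill's actual implementation: the function names `resolve_vault` and `detect_source_type` and the short type labels they print are hypothetical, and vaults are assumed to be subdirectories of a `vaults/` root. The patterns are copied from the table as written, so variants such as `https://www.youtube.com/...` fall through to the generic web-article branch.

```sh
# Hypothetical helper: pick the vault to use.
# $1: value of --vault (may be empty), $2: vaults root (assumed default: vaults)
resolve_vault() {
  name=$1
  root=${2:-vaults}
  if [ -n "$name" ]; then
    echo "$name"
    return 0
  fi
  # No --vault given: use the only vault if exactly one exists.
  count=0
  only=
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    count=$((count + 1))
    only=$(basename "$dir")
  done
  if [ "$count" -eq 1 ]; then
    echo "$only"
    return 0
  fi
  return 1  # zero or several vaults: the caller should ask the user
}

# Hypothetical helper: map a source string to a type label.
# Order matters: specific hosts come before the generic https:// branch.
detect_source_type() {
  case "$1" in
    https://x.com/*|https://twitter.com/*)               echo tweet ;;
    https://youtube.com/*|https://youtu.be/*)            echo youtube ;;
    https://gist.github.com/*)                           echo gist ;;
    https://news.ycombinator.com/item\?id=*)             echo hackernews ;;
    https://www.reddit.com/r/*/comments/*|https://old.reddit.com/*) echo reddit ;;
    https://github.com/*/discussions/*)                  echo github-discussion ;;
    https://linkedin.com/*|https://www.linkedin.com/*)   echo linkedin ;;
    https://dev.to/*)                                    echo devto ;;
    https://*)                                           echo web ;;
    *.mp4|*.mkv|*.mov|*.avi|*.webm)                      echo video ;;
    *.pdf)                                               echo pdf ;;
    *.docx|*.xlsx|*.pptx)                                echo office ;;
    *.md)                                                echo markdown ;;
    *)                                                   echo text ;;
  esac
}
```

For example, `detect_source_type "raw/paper.pdf"` prints `pdf`, and `detect_source_type "Some pasted text or notes"` prints `text`. A `case` statement keeps the first-match-wins ordering of the table explicit.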
Save the source material to the vault's raw/ directory:
VAULT="vaults/<vault-name>"
- URLs: save the extracted content as `$VAULT/raw/<descriptive-name>.md`. Download referenced images to `$VAULT/raw/assets/` and replace remote URLs with local paths (see the `ingest-web` skill for image handling details).
- Files: copy to `$VAULT/raw/` if not already there.
- Pasted text: save as `$VAULT/raw/<topic-slug>-notes.md`.
- Images: save to `$VAULT/raw/assets/` with descriptive filenames. This lets the LLM view images directly for additional context.

Use descriptive, kebab-case filenames: `karpathy-llm-wiki-gist.md`, `react-server-components-paper.pdf`
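One way to derive a kebab-case filename from a source title is sketched below. The helper name `slugify` is hypothetical and not part of the skill; it simply lowercases the input and collapses every run of non-alphanumeric characters into a single hyphen, which is one reasonable reading of "descriptive, kebab-case".

```sh
# Hypothetical helper: turn a free-form title into a kebab-case slug.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Example: build the raw path for a pasted-notes source.
# VAULT follows the document's convention; "my-research" is a sample name.
VAULT="vaults/my-research"
raw_file="$VAULT/raw/$(slugify "Karpathy LLM Wiki Gist").md"
echo "$raw_file"
```

Non-ASCII characters are dropped by this sketch, so titles in other scripts would need a different transliteration step.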
Read the vault's CLAUDE.md to get:
Read wiki-templates skill (at .claude/skills/wiki-templates/SKILL.md) for page type definitions and frontmatter requirements.
Based on the extracted content, create or update pages following the wiki-templates:
Create wiki/sources/<source-name>.md with:
For each notable person, organization, tool, or framework mentioned:
- Check whether `wiki/entities/<entity-name>.md` exists, then update the existing page or create a new one.

For each key idea, pattern, or technique discussed:
- Check whether `wiki/concepts/<concept-name>.md` exists, then update the existing page or create a new one.

If the source compares approaches, tools, or ideas:
- Create a comparison page in `wiki/comparisons/`.

Important: Don't over-create pages. Only create entity/concept pages for things that are meaningfully discussed in the source, not every passing mention.
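The existence check for entity and concept pages might look like the sketch below. The function name `page_action` and its output format are hypothetical; it only decides between creating and updating, and leaves the page content to the `wiki-templates` skill.

```sh
# Hypothetical helper: decide whether a wiki page should be created or updated.
# $1: vault directory, $2: page kind (entities or concepts), $3: page slug
page_action() {
  page="$1/wiki/$2/$3.md"
  if [ -f "$page" ]; then
    echo "update $page"
  else
    echo "create $page"
  fi
}
```

Calling `page_action "$VAULT" entities some-tool` prints either `create ...` or `update ...` followed by the page path, which a caller could then act on.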
Add new entries to wiki/index.md in the appropriate table:
| [[source-name]] | One-line summary | domain-tag | YYYY-MM-DD |
For entity and concept pages, add to their respective tables too.
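A small helper can keep index rows in the column order shown above. This is a sketch only: `format_index_row` is a hypothetical name, and locating the right table inside `wiki/index.md` is deliberately left out. Escaping `|` in the summary is an added precaution so a stray pipe does not split the row into extra columns.

```sh
# Hypothetical helper: format one index row.
# $1: page slug, $2: one-line summary, $3: domain tag, $4: date (YYYY-MM-DD)
format_index_row() {
  # Escape literal pipes so the summary stays inside a single table cell.
  summary=$(printf '%s' "$2" | sed 's/|/\\|/g')
  printf '| [[%s]] | %s | %s | %s |\n' "$1" "$summary" "$3" "$4"
}
```

For instance, `format_index_row my-source "A summary" ai "$(date +%Y-%m-%d)"` prints a row ready to paste under the sources table.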
Append to log.md:
## [YYYY-MM-DD] ingest | <Source Title>
- Source type: <type>
- Raw file: raw/<filename>
- Pages created: [[source-name]], [[entity-1]], [[concept-1]]
- Pages updated: [[existing-entity-2]]
---
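The log entry above can be generated with a short function, sketched here under the assumption that `log.md` lives at the vault root. `format_log_entry` is a hypothetical name; the field order follows the template exactly.

```sh
# Hypothetical helper: print one log entry in the template's format.
# $1: date, $2: source title, $3: source type, $4: raw filename,
# $5: pages created (wikilinks), $6: pages updated (wikilinks)
format_log_entry() {
  printf '%s\n' "## [$1] ingest | $2"
  printf '%s\n' "- Source type: $3"
  printf '%s\n' "- Raw file: raw/$4"
  printf '%s\n' "- Pages created: $5"
  printf '%s\n' "- Pages updated: $6"
  printf '%s\n' "---"
}
```

A caller could append the entry with `format_log_entry "$(date +%Y-%m-%d)" "Some Title" tweet some-title.md "[[some-title]]" "[[existing-entity]]" >> "$VAULT/log.md"`. Passing each line as a `%s` argument avoids the shell's `printf` treating a leading `-` as an option.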
cd $VAULT
git add .
git commit -m "✨ feat: ingest <source-title>"
Show the user: