agentic/code/addons/doc-intelligence/skills/llms-txt-support/SKILL.md
Detect and use llms.txt files for LLM-optimized documentation. Use when checking if a site has LLM-ready docs before scraping.
Single responsibility: Detect, fetch, and use llms.txt files that provide LLM-optimized documentation, enabling documentation ingestion up to roughly 10x faster than scraping the full site. (BP-4)
The llms.txt proposal (https://llmstxt.org/) defines a convention for websites to expose LLM-friendly documentation. Instead of scraping an entire site, check for llms.txt first.
File hierarchy (check in order):

- llms-full.txt: Complete documentation (largest)
- llms.txt: Standard documentation
- llms-small.txt: Condensed documentation (smallest)

Before executing, VERIFY:
DO NOT assume llms.txt exists. Always probe first.
ASK USER instead of guessing when:
NEVER assume llms.txt quality without verification.
| Context Type | Included | Excluded |
|--------------|----------|----------|
| RELEVANT | Target base URL, llms.txt content | Full site scraping |
| PERIPHERAL | llms.txt spec reference | Other sites' llms.txt |
| DISTRACTOR | Previous scraping attempts | Unrelated documentation |
```bash
# Check for llms.txt variants (in order of preference)
curl -I https://example.com/llms-full.txt
curl -I https://example.com/llms.txt
curl -I https://example.com/llms-small.txt

# Check common alternate locations
curl -I https://example.com/.well-known/llms.txt
curl -I https://docs.example.com/llms.txt

# Fetch and inspect first 100 lines
curl -s https://example.com/llms.txt | head -100

# Check file size
curl -sI https://example.com/llms.txt | grep -i content-length

# Verify it's not an error page
curl -s https://example.com/llms.txt | grep -Eqi 'not found|error|404' && echo "WARNING: May be error page"
```
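The probe commands above can be wrapped in a loop. The following is a minimal bash sketch that assumes curl is installed. `http_status`, `probe_variants`, and the `PROBE_TIMEOUT` variable (default 10 seconds, an arbitrary choice) are hypothetical names. The sketch issues HEAD requests, so a server that rejects HEAD may report a misleading status. The docs. subdomain is not probed automatically; pass it as a separate base URL.

```bash
# Print the final HTTP status for a URL using a HEAD request, following redirects.
# curl reports "000" when it cannot connect or the request times out.
http_status() {
  curl -s -o /dev/null -I -L --max-time "${PROBE_TIMEOUT:-10}" -w '%{http_code}' "$1" || true
}

# Probe each llms.txt location under a base URL, in order of preference,
# and print one "<result> <path>" line per candidate.
probe_variants() {
  local base=${1%/}   # drop a trailing slash so paths join cleanly
  local path status
  for path in llms-full.txt llms.txt llms-small.txt .well-known/llms.txt; do
    status=$(http_status "$base/$path")
    case $status in
      200) echo "found $path" ;;
      404) echo "missing $path" ;;       # try the next variant or location
      403) echo "forbidden $path" ;;     # may need authentication or a User-Agent
      000) echo "unreachable $path" ;;   # timeout or connection failure
      *)   echo "status-$status $path" ;;
    esac
  done
}
```

Usage: `probe_variants https://example.com`, then `probe_variants https://docs.example.com`.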
| Variant | Size | Use Case |
|---------|------|----------|
| llms-full.txt | Large (1MB+) | Complete documentation, full API reference |
| llms.txt | Medium | Standard use, balanced coverage |
| llms-small.txt | Small (<100KB) | Quick reference, limited context windows |
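The size guidance in this table can be written as a selection rule. This is a minimal sketch with the following assumptions:

- Sizes come from Content-Length, and 0 means the variant was not found.
- The context limit is approximated as a byte budget.
- `select_variant` and its NEED_FULL flag are hypothetical.

```bash
# select_variant BUDGET NEED_FULL SIZE_FULL SIZE_STD SIZE_SMALL
# Sizes are in bytes (0 = variant not found); BUDGET is a byte budget used as
# a rough stand-in for the context window; NEED_FULL is "yes" or "no".
select_variant() {
  local budget=$1 need_full=$2 full=$3 std=$4 small=$5
  if [ "$need_full" = yes ] && [ "$full" -gt 0 ] && [ "$full" -le "$budget" ]; then
    echo llms-full.txt      # complete docs requested and they fit
  elif [ "$std" -gt 0 ] && [ "$std" -le "$budget" ]; then
    echo llms.txt           # balanced default
  elif [ "$small" -gt 0 ] && [ "$small" -le "$budget" ]; then
    echo llms-small.txt     # limited context: condensed docs
  else
    return 1                # nothing fits: ask the user or fall back to scraping
  fi
}
```

For example, suppose llms-full.txt is 1523456 bytes, llms.txt is 245678 bytes, and the budget is 500000 bytes. Then `select_variant 500000 no 1523456 245678 0` prints llms.txt.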
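Decision tree:

- Limited context window → llms-small.txt
- Need complete documentation or a full API reference → llms-full.txt
- Otherwise (standard use) → llms.txt

```bash
# Download llms.txt (-f: fail on HTTP errors instead of saving an error page)
mkdir -p docs
curl -fsSL -o docs/llms.txt https://example.com/llms.txt

# Convert to skill format (if using skill-seekers)
skill-seekers scrape --llms-txt docs/llms.txt --name myskill

# Or process manually
# llms.txt is already LLM-optimized markdown
mkdir -p output/myskill/references
cp docs/llms.txt output/myskill/references/complete.md
```

```bash
# Check content structure
head -50 output/myskill/references/complete.md

# Verify sections (also matches "#" comment lines inside code blocks)
grep "^#" output/myskill/references/complete.md | head -20

# Check for code examples (counts fence lines, roughly two per code block)
grep -c '```' output/myskill/references/complete.md
```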
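The grep checks above also count `#` comment lines that sit inside code blocks. The awk sketch below is slightly more careful. It assumes code fences start at the beginning of a line, and `summarize_reference` is a hypothetical name.

```bash
# Count Markdown headings and fenced code blocks in a reference file,
# ignoring "#" lines that sit inside code fences (e.g. shell comments).
summarize_reference() {
  awk '
    /^```/ { fences++; in_fence = !in_fence; next }   # fence opens or closes
    !in_fence && /^#+ / { headings++ }                 # heading outside code
    END { printf "headings=%d code_blocks=%d\n", headings, int(fences / 2) }
  ' "$1"
}
```

Usage: `summarize_reference output/myskill/references/complete.md`.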
On error:

- 404 Not Found → Try next variant or alternate location
- 403 Forbidden → May need authentication or user-agent
- Timeout → Retry with longer timeout
- Invalid content → Fall back to traditional scraping

State saved to: .aiwg/working/checkpoints/llms-txt-support/

```
checkpoints/llms-txt-support/
├── detection_results.json   # Which variants found
├── selected_variant.txt     # Which was chosen
└── content_hash.txt         # For cache validation
```
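The selected_variant.txt and content_hash.txt checkpoint files could be produced as follows. This minimal sketch assumes content_hash.txt stores a SHA-256 hex digest; the document does not specify the hash. `sha256_of`, `save_checkpoint`, and `cache_is_fresh` are hypothetical helpers, and detection_results.json is not written here.

```bash
# Print the SHA-256 hex digest of a file (GNU sha256sum, or shasum on macOS).
# Reading from stdin avoids filename escaping in the checksum output.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum < "$1" | cut -d' ' -f1
  else
    shasum -a 256 < "$1" | cut -d' ' -f1
  fi
}

# save_checkpoint DIR VARIANT FILE: record the chosen variant and its hash
save_checkpoint() {
  mkdir -p "$1"
  printf '%s\n' "$2" > "$1/selected_variant.txt"
  sha256_of "$3" > "$1/content_hash.txt"
}

# cache_is_fresh DIR FILE: succeed only if FILE still matches the stored hash
cache_is_fresh() {
  [ -f "$1/content_hash.txt" ] && [ "$(sha256_of "$2")" = "$(cat "$1/content_hash.txt")" ]
}
```

Usage: `save_checkpoint .aiwg/working/checkpoints/llms-txt-support llms.txt docs/llms.txt`. On a later run, call `cache_is_fresh` with the same directory and file to decide whether reprocessing can be skipped.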
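Typical llms.txt structure. Per llmstxt.org, only the H1 project name is required; it is usually followed by a blockquote summary and H2 sections. In the spec, those H2 sections are lists of links to more detailed Markdown pages, but many sites inline the content instead, as in this layout:

```markdown
# Project Name

> Brief description of the project

## Overview
[High-level explanation]

## Installation
[Setup instructions]

## Quick Start
[Getting started guide]

## API Reference
[Detailed API documentation]

## Examples
[Code examples]

## FAQ
[Common questions]
```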
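An error page or redirect can be served with a 200 status, so a downloaded file can be checked against this structure before use. This minimal sketch assumes the llmstxt.org rule that the first content line is an H1 title. `validate_llms_txt` and its status words are hypothetical.

```bash
# validate_llms_txt FILE: print a status word; return non-zero on failure.
validate_llms_txt() {
  local file=$1 first
  [ -s "$file" ] || { echo "empty"; return 1; }
  # First line that is not blank (GNU and BSD grep both support -m)
  first=$(grep -m1 -v '^[[:space:]]*$' "$file")
  case $first in
    '<!DOCTYPE'* | '<!doctype'* | '<html'*) echo "html"; return 1 ;;
    '# '*) ;;                                    # H1 title present
    *) echo "no-h1"; return 1 ;;
  esac
  # A "> " blockquote summary is recommended but optional
  if grep -q '^> ' "$file"; then echo "ok"; else echo "ok-no-summary"; fi
}
```

Usage: `validate_llms_txt docs/llms.txt || echo "fall back to scraping"`.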
Example detection_results.json:

```json
{
  "base_url": "https://example.com",
  "detected": {
    "llms-full.txt": {
      "found": true,
      "url": "https://example.com/llms-full.txt",
      "size": 1523456,
      "last_modified": "2025-01-15T10:30:00Z"
    },
    "llms.txt": {
      "found": true,
      "url": "https://example.com/llms.txt",
      "size": 245678,
      "last_modified": "2025-01-15T10:30:00Z"
    },
    "llms-small.txt": {
      "found": false
    }
  },
  "recommended": "llms.txt",
  "reason": "Standard size, good for most use cases"
}
```
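Per-variant entries like these can be built from `curl -sIL` response headers. This is a minimal bash and awk sketch, and `variant_entry` is a hypothetical helper. It differs from the example above in three ways:

- `last_modified` keeps the raw HTTP date instead of converting it to ISO 8601.
- The URL is not JSON-escaped.
- `size` is omitted when the server sends no Content-Length.

```bash
# Read raw response headers (e.g. from: curl -sIL URL) on stdin and print a
# detection_results.json-style entry for URL. With -L, several responses may
# appear; only the last one counts.
variant_entry() {
  awk -v url="$1" '
    { sub(/\r$/, "") }                                   # drop CR line endings
    /^HTTP\// { status = $2; size = ""; modified = ""; next }
    tolower($1) == "content-length:" { size = $2 }
    tolower($1) == "last-modified:" { modified = substr($0, index($0, ":") + 2) }
    END {
      if (status + 0 != 200) { print "{\"found\": false}"; exit }
      printf "{\"found\": true, \"url\": \"%s\"", url
      if (size != "") printf ", \"size\": %s", size
      if (modified != "") printf ", \"last_modified\": \"%s\"", modified
      print "}"
    }
  '
}
```

Usage: `curl -sIL https://example.com/llms.txt | variant_entry https://example.com/llms.txt`.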
Sites known to support llms.txt (verify before use):
Always verify; this list may be outdated.
| Issue | Diagnosis | Solution |
|-------|-----------|----------|
| No llms.txt found | Site doesn't support | Fall back to doc-scraper |
| Content seems wrong | Error page or redirect | Check actual content, verify URL |
| File too large | llms-full.txt overwhelming | Use llms.txt or llms-small.txt |
| Outdated content | llms.txt not maintained | Consider scraping + llms.txt merge |
If llms.txt is incomplete or outdated, combine approaches:
```bash
# 1. Fetch llms.txt as base
curl -o base.md https://example.com/llms.txt

# 2. Scrape for additional/updated content
skill-seekers scrape --config config.json --skip-covered-by base.md

# 3. Merge results
# llms.txt provides structure, scraping fills gaps
```
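One way to find the gaps that scraping should fill is to compare section titles. This minimal sketch assumes both files use `## ` headings and treats an exact title match as meaning a section is covered. `missing_sections` and `scraped.md` are hypothetical.

```bash
# missing_sections BASE OTHER: print "## " headings found in OTHER but not in BASE.
missing_sections() {
  awk '
    FILENAME == ARGV[1] { if (/^## /) seen[$0] = 1; next }   # collect base titles
    /^## / && !($0 in seen) { print }                        # report uncovered ones
  ' "$1" "$2"
}
```

Usage: `missing_sections base.md scraped.md`. Exact matching is deliberately simple; renamed sections will show up as gaps and may need manual review.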