skills/seo/technical/robots/SKILL.md
When the user wants to configure, audit, or optimize robots.txt. Also use when the user mentions "robots.txt," "crawler rules," "block crawlers," "AI crawlers," "GPTBot," "allow/disallow," "disallow path," "crawl directives," "user-agent," "block Googlebot," "fix robots.txt," "robots.txt blocking," or "search engine crawling." For indexing, use indexing.
Guides configuration and auditing of robots.txt for search engine and AI crawler control.
When invoking: On first use, if helpful, open with 1–2 sentences on what this skill covers and why it matters, then provide the main output. On subsequent use or when the user asks to skip, go directly to the main output.
Check for project context first: If .claude/project-context.md or .cursor/project-context.md exists, read it for site URL and indexing goals.
Identify: the site URL (e.g. https://example.com).
| Point | Note |
|-------|------|
| Purpose | Controls crawler access; does NOT prevent indexing (disallowed URLs may still appear in search without a snippet) |
| Advisory | Rules are advisory; malicious crawlers may ignore them |
| Public | robots.txt is publicly readable; use noindex or auth for sensitive content. See indexing |

| Tool | Controls | Prevents indexing? |
|------|----------|-------------------|
| robots.txt | Crawl (path-level) | No; blocked URLs may still appear in SERP |
| noindex (meta / X-Robots-Tag) | Index (page-level) | Yes. See indexing |
| nofollow | Link equity only | No; does not control indexing |

| Use | Tool | Example |
|-----|------|---------|
| Path-level (whole directory) | robots.txt | Disallow: /admin/, Disallow: /api/, Disallow: /staging/ |
| Page-level (specific pages) | noindex meta / X-Robots-Tag | Login, signup, thank-you, 404, legal. See indexing for full list |
| Critical | Do NOT block in robots.txt | Pages that use noindex—crawlers must access the page to read the directive |
Paths to block in robots.txt: /admin/, /api/, /staging/, temp files. Paths to use noindex (allow crawl): /login/, /signup/, /thank-you/, etc.—see indexing.
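
The rule that noindex pages must stay crawlable can be checked mechanically. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the URL list, and the helper name `blocked_noindex_pages` are hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not taken from a real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /thank-you/
"""

# Hypothetical pages assumed to carry a noindex meta tag or X-Robots-Tag header.
NOINDEX_URLS = [
    "https://example.com/login/",
    "https://example.com/thank-you/",
]


def blocked_noindex_pages(robots_txt, noindex_urls, user_agent="Googlebot"):
    """Return noindex URLs that the given crawler is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(user_agent, url)]


conflicts = blocked_noindex_pages(ROBOTS_TXT, NOINDEX_URLS)
print(conflicts)  # URLs whose noindex directive the crawler cannot read
```

Any URL in the result is a conflict: the crawler cannot reach the page, so it never sees the noindex directive. In this sample, /thank-you/ should be removed from the Disallow rules.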
| Item | Requirement |
|------|-------------|
| Path | Site root: https://example.com/robots.txt |
| Encoding | UTF-8 plain text |
| Standard | RFC 9309 (Robots Exclusion Protocol) |
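
As a rough illustration of the location and encoding requirements, the sketch below derives the robots.txt URL from any page URL and checks that a payload decodes as UTF-8. The helper names `robots_txt_url` and `is_utf8_text` are assumptions made for this example, not part of any tool.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Build the robots.txt URL at the root of the page's scheme and host."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_utf8_text(raw_bytes):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(robots_txt_url("https://example.com/blog/post?utm_source=x"))
print(is_utf8_text("User-agent: *\nDisallow: /admin/\n".encode("utf-8")))
print(is_utf8_text(b"\xff\xfeU\x00s\x00"))  # UTF-16 bytes with a BOM fail the check
```

Each scheme and host combination has its own robots.txt, so a file at https://example.com/robots.txt does not cover a subdomain such as https://blog.example.com.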
| Directive | Purpose | Example |
|-----------|---------|---------|
| User-agent: | Target crawler | User-agent: Googlebot, User-agent: * |
| Disallow: | Block path prefix | Disallow: /admin/ |
| Allow: | Allow path (can override Disallow) | Allow: /public/ |
| Sitemap: | Declare sitemap absolute URL | Sitemap: https://example.com/sitemap.xml |
| Clean-param: | Strip query params (Yandex) | See below |
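
The directives above can be exercised with Python's standard `urllib.robotparser`. This is a sketch under stated assumptions: the sample file and the `ExampleBot` user-agent are hypothetical. One caveat: the standard-library parser applies the first matching rule in file order, while RFC 9309 specifies that the most specific (longest) match wins, so the sample lists the Allow line before the broader Disallow to get the same result under both interpretations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample file covering User-agent, Allow, Disallow, and Sitemap.
SAMPLE = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/

User-agent: Googlebot
Disallow: /staging/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# Generic crawlers fall under the "*" group.
generic_admin = parser.can_fetch("ExampleBot", "https://example.com/admin/users")
generic_help = parser.can_fetch("ExampleBot", "https://example.com/admin/help/faq")

# Googlebot has its own group, so only that group's rules apply to it.
google_staging = parser.can_fetch("Googlebot", "https://example.com/staging/new")
google_admin = parser.can_fetch("Googlebot", "https://example.com/admin/users")

sitemaps = parser.site_maps()  # available in Python 3.8+

print(generic_admin, generic_help, google_staging, google_admin, sitemaps)
```

Note the last check: because a crawler follows only the most specific group that names it, Googlebot is not bound by the /admin/ rule in the `*` group here. Repeat shared rules inside each named group if they should apply everywhere.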
| Do not block | Reason |
|--------------|--------|
| CSS, JS, images | Google needs them to render pages; blocking them can break rendering and hurt indexing |
| /_next/ (Next.js) | Blocking breaks CSS/JS loading; seeing static assets under "Crawled - currently not indexed" in GSC is expected. See indexing |
| Pages that use noindex | Crawlers must access the page to read the noindex directive; blocking in robots.txt prevents that |
Only block paths that don't need crawling: /admin/, /api/, /staging/, temp files.
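
An audit can flag Disallow rules that look like they block render-critical resources. The sketch below is a simple text lint, not a full parser; the pattern list in `RISKY_PATTERNS` and the helper name `risky_disallows` are assumptions to adjust per site.

```python
import re

# Assumed patterns for render-critical resources; extend or trim per site.
RISKY_PATTERNS = [
    re.compile(r"\.(css|js|png|jpe?g|gif|svg|webp)\b", re.IGNORECASE),
    re.compile(r"^/_next/"),
    re.compile(r"^/(static|assets|css|js|images)/", re.IGNORECASE),
]


def risky_disallows(robots_txt):
    """Return Disallow paths that look like they block CSS, JS, or images."""
    findings = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() != "disallow":
            continue
        path = value.strip()
        if any(pattern.search(path) for pattern in RISKY_PATTERNS):
            findings.append(path)
    return findings


# Hypothetical file with one safe rule and three risky ones.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /_next/
Disallow: /*.js$
Disallow: /assets/
"""

findings = risky_disallows(SAMPLE)
print(findings)
```

A flagged rule is a prompt for review, not proof of a problem: some sites legitimately keep private files under an /assets/ path.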
Most AI crawlers listed below respect robots.txt; ChatGPT-User is the exception. Set rules per user-agent and check each vendor's docs for current tokens.
| User-agent | Purpose | Typical | Notes |
|------------|---------|---------|-------|
| OAI-SearchBot | ChatGPT search | Allow | Respects robots.txt |
| GPTBot | OpenAI training | Disallow | Respects robots.txt; shares crawl data with OAI-SearchBot if both allowed |
| ChatGPT-User | User-initiated browsing | N/A | No longer respects robots.txt (Dec 2025); use server-side controls instead |
| Claude-SearchBot | Claude search | Allow | Respects robots.txt |
| ClaudeBot | Anthropic training | Disallow | Respects robots.txt |
| PerplexityBot | Perplexity search | Allow | Respects robots.txt |
| Google-Extended | Gemini training | Disallow | Respects robots.txt |
| CCBot | Common Crawl (LLM training) | Disallow | Respects robots.txt |
| Bytespider | ByteDance | Disallow | Respects robots.txt |
| Meta-ExternalAgent | Meta | Disallow | Respects robots.txt |
| AppleBot | Apple (Siri, Spotlight); renders JS | Allow for indexing | Respects robots.txt |
Allow vs Disallow: Allow search/indexing bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot); Disallow training-only bots (GPTBot, ClaudeBot, CCBot) if you don't want content used for model training.
Important — ChatGPT-User exemption: As of December 2025, ChatGPT-User no longer respects robots.txt directives. OpenAI considers it a proxy for human-initiated browsing. If you need to block it, use server-side controls (WAF rules, IP rate-limiting), not robots.txt. See site-crawlability for AI crawler optimization (SSR, URL management).
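
One way to apply the allow-search, disallow-training split is to generate one group per crawler from a small policy map and then verify the result. This is a sketch only: the tokens are copied from this document and should be checked against vendor docs, and `AI_POLICY` and `build_ai_rules` are hypothetical names. ChatGPT-User is left out because, per the note above, robots.txt does not govern it.

```python
from urllib.robotparser import RobotFileParser

# True = allow crawling, False = disallow. Tokens copied from this document;
# verify them against each vendor's current documentation before use.
AI_POLICY = {
    "OAI-SearchBot": True,
    "Claude-SearchBot": True,
    "PerplexityBot": True,
    "GPTBot": False,
    "ClaudeBot": False,
    "CCBot": False,
}


def build_ai_rules(policy):
    """Render one robots.txt group per crawler: allow all or disallow all."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    return "\n\n".join(groups) + "\n"


robots_txt = build_ai_rules(AI_POLICY)
print(robots_txt)

# Verify the generated rules with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for agent in AI_POLICY:
    print(agent, parser.can_fetch(agent, "https://example.com/blog/"))
```

The generated groups are meant to be appended to an existing robots.txt; crawlers not named in any group remain governed by the `*` group, or are allowed if none exists.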
Clean-param example (Yandex only; other crawlers ignore this directive):
Clean-param: utm_source&utm_medium&utm_campaign&utm_term&utm_content&ref&fbclid&gclid
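
To see which URLs the Clean-param line above collapses together, the following sketch removes the listed parameters from a URL. It only approximates the effect for auditing purposes and makes no claim about how Yandex implements the directive; `strip_clean_params` is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameter list copied from the Clean-param line in this document.
CLEAN_PARAM = "utm_source&utm_medium&utm_campaign&utm_term&utm_content&ref&fbclid&gclid"


def strip_clean_params(url, clean_param=CLEAN_PARAM):
    """Drop the listed query parameters, keeping all others in their order."""
    ignored = set(clean_param.split("&"))
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


cleaned = strip_clean_params(
    "https://example.com/shoes?color=red&utm_source=news&gclid=abc"
)
print(cleaned)
```

URLs that reduce to the same cleaned form are the ones the directive asks Yandex to treat as a single page. For other search engines, canonical tags serve a similar purpose.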