kostja94/robots-txt

Name: robots-txt
Author: kostja94

skills/seo/technical/robots/SKILL.md

npx skillsauth add kostja94/marketing-skills robots-txt

SEO Technical: robots.txt

Guides configuration and auditing of robots.txt for search engine and AI crawler control.

When invoking: On first use, if helpful, open with 1–2 sentences on what this skill covers and why it matters, then provide the main output. On subsequent use or when the user asks to skip, go directly to the main output.

Scope (Technical SEO)

Robots.txt: Configure Disallow/Allow, Sitemap, Clean-param; audit for accidental blocks
Crawler access: Path-level crawl control; AI crawler allow/block strategy
Differentiation: robots.txt = crawl control (who accesses what paths); noindex = index control (what gets indexed). See indexing for page-level exclusions.

Initial Assessment

Check for project context first: If .claude/project-context.md or .cursor/project-context.md exists, read it for site URL and indexing goals.

Identify:

Site URL: Base domain (e.g., https://example.com)
Indexing scope: Full site, partial, or specific paths to exclude
AI crawler strategy: Allow search/indexing vs. block training data crawlers

Best Practices

Purpose and Limitations

| Point | Note |
|-------|------|
| Purpose | Controls crawler access; does NOT prevent indexing (disallowed URLs may still appear in search without snippet) |
| Advisory | Rules are advisory; malicious crawlers may ignore |
| Public | robots.txt is publicly readable; use noindex or auth for sensitive content. See indexing |

Crawl vs Index vs Link Equity (Quick Reference)

| Tool | Controls | Prevents indexing? |
|------|----------|-------------------|
| robots.txt | Crawl (path-level) | No; blocked URLs may still appear in SERP |
| noindex (meta / X-Robots-Tag) | Index (page-level) | Yes. See indexing |
| nofollow | Link equity only | No; does not control indexing |

When to Use robots.txt vs noindex

| Use | Tool | Example |
|-----|------|---------|
| Path-level (whole directory) | robots.txt | Disallow: /admin/, Disallow: /api/, Disallow: /staging/ |
| Page-level (specific pages) | noindex meta / X-Robots-Tag | Login, signup, thank-you, 404, legal. See indexing for full list |
| Critical | Do NOT block in robots.txt | Pages that use noindex; crawlers must access the page to read the directive |

Paths to block in robots.txt: /admin/, /api/, /staging/, temp files. Paths to use noindex (allow crawl): /login/, /signup/, /thank-you/, etc.—see indexing.
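The conflict described above (a noindex page that is also blocked from crawling) can be checked mechanically. The following is a minimal sketch, assuming plain prefix rules only; the function name `unreachable_noindex` and the sample paths are hypothetical, and wildcards and Allow overrides are deliberately ignored.

```python
def unreachable_noindex(noindex_paths, disallow_prefixes):
    """Return noindex pages that a Disallow prefix hides from crawlers.

    Assumption: rules are simple path prefixes (no wildcards, no Allow overrides).
    A crawler that cannot fetch a page never sees its noindex directive.
    """
    return [
        path
        for path in noindex_paths
        if any(path.startswith(prefix) for prefix in disallow_prefixes if prefix)
    ]


# Hypothetical audit input: /login/ is both noindex and disallowed.
conflicts = unreachable_noindex(
    noindex_paths=["/login/", "/signup/", "/thank-you/"],
    disallow_prefixes=["/admin/", "/api/", "/login/"],
)
print(conflicts)
```

Any path this returns should be removed from the Disallow list so the noindex directive can be read.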

Location and Format

| Item | Requirement |
|------|-------------|
| Path | Site root: https://example.com/robots.txt |
| Encoding | UTF-8 plain text |
| Standard | RFC 9309 (Robots Exclusion Protocol) |
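Because the file must sit at the site root, the robots.txt URL for any page can be derived from the page URL alone. A small sketch using only the Python standard library; the helper name `robots_txt_url` is an assumption for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (same scheme, host, port)."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    # The file lives at the root of the authority; path, query, and fragment are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?utm_source=x"))
# prints https://example.com/robots.txt
```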

Core Directives

| Directive | Purpose | Example |
|-----------|---------|---------|
| User-agent: | Target crawler | User-agent: Googlebot, User-agent: * |
| Disallow: | Block path prefix | Disallow: /admin/ |
| Allow: | Allow path (can override Disallow) | Allow: /public/ |
| Sitemap: | Declare sitemap absolute URL | Sitemap: https://example.com/sitemap.xml |
| Clean-param: | Strip query params (Yandex) | See below |
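To show how these directives fit together in one file, here is a minimal sketch that assembles a single-group robots.txt. The function `build_robots_txt` and its parameters are hypothetical, and the paths are the examples from the table rather than a recommendation for any particular site.

```python
def build_robots_txt(user_agent="*", disallow=(), allow=(), sitemaps=()):
    """Assemble one user-agent group followed by optional Sitemap lines."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow]
    lines += [f"Allow: {path}" for path in allow]
    if sitemaps:
        lines.append("")  # blank line separates the group from the sitemap lines
        lines += [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines) + "\n"


print(build_robots_txt(
    disallow=["/admin/"],
    allow=["/public/"],
    sitemaps=["https://example.com/sitemap.xml"],
))
```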

Path Format Rules (Critical)

Different directives use different path formats — a common source of errors:

| Directive | Format | Correct | Wrong |
|-----------|--------|---------|-------|
| Disallow / Allow | Root-relative path only (starts with /) | Disallow: /admin/ | Disallow: https://example.com/admin/ |
| Sitemap | Absolute URL only | Sitemap: https://example.com/sitemap.xml | Sitemap: /sitemap.xml |
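The two format rules in this table are easy to lint. The sketch below checks one line at a time; `check_line` is a hypothetical helper, and it covers only the two mistakes shown in the table, not full RFC 9309 syntax.

```python
from typing import Optional
from urllib.parse import urlsplit


def check_line(line: str) -> Optional[str]:
    """Return a problem description for one robots.txt line, or None if it looks fine."""
    text = line.split("#", 1)[0].strip()  # drop trailing comments
    if ":" not in text:
        return None
    field, value = (part.strip() for part in text.split(":", 1))
    field = field.lower()
    if field in ("allow", "disallow"):
        # An empty value is valid; a non-empty value must be root-relative.
        if value and not value.startswith("/"):
            return f"{field} value should be a root-relative path: {value!r}"
    elif field == "sitemap":
        parts = urlsplit(value)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            return f"sitemap value should be an absolute URL: {value!r}"
    return None


for sample in ("Disallow: https://example.com/admin/", "Sitemap: /sitemap.xml"):
    print(check_line(sample))
```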

Wildcards: * matches any character sequence (Disallow: /tmp/*). $ marks the exact end of the URL (Allow: /news/*.html$).

Priority: The most specific (longest) matching path takes precedence, so Allow: /shop/shoes/ overrides Disallow: /shop/. Path matching is case-sensitive: Disallow: /PDF/ does not match /pdf/.
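The wildcard and priority rules can be made concrete with a small matcher. This is a simplified sketch, not a full RFC 9309 implementation: `rule_to_regex` and `is_allowed` are hypothetical names, rule length is approximated by the pattern's character count, and a tie is resolved in favor of Allow.

```python
import re


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex anchored at the path start."""
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape literal pieces, then join them with '.*' in place of each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored_end else ""))


def is_allowed(path: str, rules) -> bool:
    """rules: iterable of ("allow" | "disallow", pattern) pairs for one user-agent group."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not pattern:  # an empty pattern matches nothing
            continue
        if rule_to_regex(pattern).match(path):  # case-sensitive by default
            is_allow = kind == "allow"
            # Longest pattern wins; on equal length, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


shop = [("disallow", "/shop/"), ("allow", "/shop/shoes/")]
print(is_allowed("/shop/shoes/red", shop))   # True: the longer Allow rule wins
print(is_allowed("/shop/hats", shop))        # False: only Disallow matches
print(is_allowed("/pdf/report", [("disallow", "/PDF/")]))  # True: case differs
```

Note that Python's built-in `urllib.robotparser` applies rules in file order and does not expand wildcards, which is why this sketch uses its own matcher.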

Critical: Do Not Block

| Do not block | Reason |
|--------------|--------|
| CSS, JS, images | Google needs them to render pages; blocking breaks indexing |
| /_next/ (Next.js) | Breaks CSS/JS loading; static assets in GSC "Crawled - not indexed" is expected. See indexing |
| Pages that use noindex | Crawlers must access the page to read the noindex directive; blocking in robots.txt prevents that |

Only block: paths that don't need crawling: /admin/, /api/, /staging/, temp files.
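When auditing an existing file, a quick heuristic pass can surface Disallow rules that look like they target rendering assets. The sketch below is an assumption-heavy illustration: `risky_disallows` and the extension list are hypothetical, and it will not catch a rule such as a blocked asset directory whose name carries no file extension.

```python
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def risky_disallows(robots_txt: str) -> list:
    """Return Disallow values that appear to block CSS, JS, images, or /_next/."""
    flagged = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "disallow":
            continue
        value = value.strip()
        # Ignore trailing wildcard and end-anchor characters before checking the suffix.
        core = value.rstrip("$*").lower()
        if core.endswith(ASSET_EXTENSIONS) or core.startswith("/_next"):
            flagged.append(value)
    return flagged


sample = """User-agent: *
Disallow: /admin/
Disallow: /*.css$
Disallow: /_next/
"""
print(risky_disallows(sample))
```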

AI Crawler Strategy

Most AI crawlers listed below respect robots.txt; ChatGPT-User is the exception. Set rules per user-agent, and check each vendor's docs for current tokens.

| User-agent | Purpose | Typical | Notes |
|------------|---------|---------|-------|
| OAI-SearchBot | ChatGPT search | Allow | Respects robots.txt |
| GPTBot | OpenAI training | Disallow | Respects robots.txt; shares crawl data with OAI-SearchBot if both allowed |
| ChatGPT-User | User-initiated browsing | N/A | No longer respects robots.txt (Dec 2025); use server-side controls instead |
| Claude-SearchBot | Claude search | Allow | Respects robots.txt |
| Claude-User | Anthropic user-initiated browsing | Allow | Respects robots.txt (unlike ChatGPT-User) |
| ClaudeBot | Anthropic training | Disallow | Respects robots.txt |
| PerplexityBot | Perplexity search | Allow | Respects robots.txt |
| Google-Extended | Gemini training | Disallow | Respects robots.txt |
| CCBot | Common Crawl (LLM training) | Disallow | Respects robots.txt |
| Bytespider | ByteDance | Disallow | Respects robots.txt |
| Meta-ExternalAgent | Meta | Disallow | Respects robots.txt |
| AppleBot | Apple (Siri, Spotlight); renders JS | Allow for indexing | Respects robots.txt |

Deprecated: ~~anthropic-ai~~ was retired by Anthropic and replaced by ClaudeBot / Claude-User / Claude-SearchBot. References to anthropic-ai in robots.txt have no effect.

Allow vs Disallow: Allow search/indexing bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot); Disallow training-only bots (GPTBot, ClaudeBot, CCBot) if you don't want content used for model training.

Important — ChatGPT-User exemption: As of December 2025, ChatGPT-User no longer respects robots.txt directives. OpenAI considers it a proxy for human-initiated browsing. If you need to block it, use server-side controls (WAF rules, IP rate-limiting), not robots.txt. See site-crawlability for AI crawler optimization (SSR, URL management).
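The allow/disallow split above can be turned into per-user-agent groups. This is a minimal sketch, assuming a whole-site `Allow: /` or `Disallow: /` per bot; `ai_crawler_groups` is a hypothetical helper, and the bot tokens are the ones named in this section, so verify them against each vendor's docs before use. ChatGPT-User is left out on purpose because robots.txt does not control it.

```python
SEARCH_BOTS = ("OAI-SearchBot", "Claude-SearchBot", "PerplexityBot")
TRAINING_BOTS = ("GPTBot", "ClaudeBot", "CCBot")


def ai_crawler_groups(allow=SEARCH_BOTS, block=TRAINING_BOTS) -> str:
    """Emit one robots.txt group per AI crawler: search bots allowed, training bots blocked."""
    groups = [f"User-agent: {agent}\nAllow: /" for agent in allow]
    groups += [f"User-agent: {agent}\nDisallow: /" for agent in block]
    # Groups are separated by a blank line.
    return "\n\n".join(groups) + "\n"


print(ai_crawler_groups())
```

The output is meant to be appended to the general `User-agent: *` group, not to replace it.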

Clean-param (Yandex)

Clean-param: utm_source&utm_medium&utm_campaign&utm_term&utm_content&ref&fbclid&gclid
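The intent of this directive is that URLs differing only in the listed parameters are treated as the same page. The sketch below illustrates that effect on a single URL; it is not Yandex's implementation, and `strip_params` and the sample URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

CLEAN_PARAM = "utm_source&utm_medium&utm_campaign&utm_term&utm_content&ref&fbclid&gclid"


def strip_params(url: str, clean_param: str = CLEAN_PARAM) -> str:
    """Remove the query parameters named in a Clean-param value from url."""
    ignored = set(clean_param.split("&"))
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


print(strip_params("https://example.com/p?id=7&utm_source=news&gclid=abc"))
# prints https://example.com/p?id=7
```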

Output Format

Current state (if auditing)
Recommended robots.txt (full file)
Compliance checklist
References: Google robots.txt

Related Skills

indexing: Full noindex page-type list; when to use noindex vs robots.txt; GSC indexing diagnosis
page-metadata: Meta robots (noindex, nofollow) implementation
xml-sitemap: Sitemap URL to reference in robots.txt
site-crawlability: Broader crawl and structure guidance; AI crawler optimization
rendering-strategies: SSR, SSG, CSR; content in initial HTML for crawlers

When the user wants to configure, audit, or optimize robots.txt. Also use when the user mentions "robots.txt," "crawler rules," "block crawlers," "AI crawlers," "GPTBot," "allow/disallow," "disallow path," "crawl directives," "user-agent," "block Googlebot," "fix robots.txt," "robots.txt blocking," or "search engine crawling." For indexing, use indexing.

588 stars

development

Updated Jun 9, 2026

Install this skill globally with one command:

npx skillsauth add kostja94/marketing-skills robots-txt

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Jun 9, 2026, 8:11 AM (137.9s, 1 file scanned)

SKILL.md

name: robots-txt
description: When the user wants to configure, audit, or optimize robots.txt. Also use when the user mentions "robots.txt," "crawler rules," "block crawlers," "AI crawlers," "GPTBot," "allow/disallow," "disallow path," "crawl directives," "user-agent," "block Googlebot," "fix robots.txt," "robots.txt blocking," or "search engine crawling." For indexing, use indexing.
version: 1.2.0

Install manually

# Clone the repo
git clone https://github.com/kostja94/marketing-skills.git

# Make sure the global Claude Code skills folder exists
mkdir -p ~/.claude/skills

# Copy the skill into the Claude Code skills folder (global)
cp -r marketing-skills/skills/seo/technical/robots ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository

kostja94/marketing-skills

588 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT