skills/crawlers/SKILL.md
AI crawler access analysis for blogs and websites. Checks robots.txt, meta tags, and HTTP headers to determine which AI crawlers can access the site. Provides a complete access map and recommendations for maximizing AI visibility. Use when user says "AI crawlers", "robots.txt", "AI access", "block AI crawlers", "allow AI bots", "GPTBot", "ClaudeBot", "PerplexityBot", "AI crawler audit", or "can AI read my blog".
npx skillsauth add rainday/smart-blog-skills crawlersInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
分析部落格和網站的 AI 爬蟲可訪問性。若 AI 爬蟲被封鎖,無論內容品質多高,都無法出現在 AI 生成的回應中。
2026 年初的 Originality.ai 研究發現,前 1,000 名網站中:
許多網站透過過於激進的 robots.txt(繼承自舊版 SEO 設定)意外封鎖 AI 爬蟲。封鎖 AI 爬蟲是使網站在 AI 搜尋結果中消失最快的方法。
| 爬蟲 | User-Agent | 運營者 | 目的 | 建議 |
|------|-----------|--------|------|------|
| GPTBot | GPTBot | OpenAI | ChatGPT 網頁瀏覽、搜尋 | ALLOW — 300M+ 週活用戶 |
| OAI-SearchBot | OAI-SearchBot | OpenAI | ChatGPT 搜尋功能(不用於訓練) | ALLOW — 搜尋專用,無訓練風險 |
| ChatGPT-User | ChatGPT-User | OpenAI | 用戶主動請求 ChatGPT 訪問特定 URL | ALLOW — 阻止用戶從 ChatGPT 讀取你的內容 |
| ClaudeBot | ClaudeBot | Anthropic | Claude 網頁搜尋和分析 | ALLOW — Claude 市占率持續增長 |
| PerplexityBot | PerplexityBot | Perplexity AI | Perplexity AI 搜尋(含來源引用) | ALLOW — 最佳 AI 搜尋引薦流量來源 |
| 爬蟲 | User-Agent | 運營者 | 備註 |
|------|-----------|--------|------|
| Google-Extended | Google-Extended | Google | Gemini 訓練和 AI Overview 改善(不影響標準搜尋排名) |
| GoogleOther | GoogleOther | Google | Google 研究和 AI 相關資料收集 |
| Applebot-Extended | Applebot-Extended | Apple | Apple Intelligence 功能(2B+ 裝置) |
| Amazonbot | Amazonbot | Amazon | Alexa 和 Amazon AI |
| FacebookBot | FacebookBot | Meta | Meta AI(3B+ 合併用戶) |
| 爬蟲 | User-Agent | 建議 |
|------|-----------|------|
| CCBot | CCBot | 依情況 — 訓練數據,不影響 AI 搜尋能見度 |
| anthropic-ai | anthropic-ai | 依情況 — Claude 模型訓練(非即時功能) |
| Bytespider | Bytespider | BLOCK — 爬蟲行為激進,西方市場效益低 |
| cohere-ai | cohere-ai | 依情況 — Cohere 模型訓練 |
WebFetch [domain]/robots.txt,解析所有 User-agent 指令:
對每個 AI 爬蟲確認:
User-agent: * 萬用字元規則?Crawl-delay(可能減慢 AI 爬蟲)?Sitemap 指令(AI 爬蟲用於發現內容)?WebFetch 5-10 個關鍵頁面,查詢:
<meta name="robots" content="noindex"> — 封鎖所有機器人<meta name="robots" content="noai"> — 封鎖 AI 使用(新興標籤)<meta name="robots" content="noimageai"> — 封鎖 AI 圖片訓練<meta name="GPTBot" content="noindex">同樣頁面查詢 response headers:
X-Robots-Tag: noindex — HTTP header 版 noindexX-Robots-Tag: noai — 封鎖 AI 使用X-Robots-Tag: noimageai — 封鎖 AI 圖片訓練/llms.txt — AI 爬蟲導引(見 /smart-blog-skills:llmstxt)/.well-known/ai-plugin.json — OpenAI plugin manifest/ai.txt — 提案中的標準檢查網站是否需要 JS 渲染:
在 robots.txt 掃描 Content-Signal: 指令(IETF 草案):
ai-train, search, ai-personalization, ai-retrievalyes, noAI Crawler Access Score (0-100):
| 組成 | 權重 | 評分方式 |
|------|------|---------|
| Tier 1 爬蟲允許數 | 50% | 每允許一個 Tier 1 = 10 分(共 5 個) |
| Tier 2 爬蟲允許數 | 25% | 每允許一個 Tier 2 = 5 分(共 5 個) |
| 無全面封鎖 | 15% | 無 User-agent: * Disallow: / + 無 noai meta |
| AI 專屬檔案 | 10% | llms.txt = 5 分,sitemap 可訪問 = 5 分 |
生成 GEO-CRAWLER-ACCESS.md:
# AI Crawler Access 報告:[Domain]
**分析日期:** [Date]
**robots.txt 狀態:** Found / Not Found
## Crawler Access 總覽
| 爬蟲 | 運營者 | Tier | 狀態 | 影響 |
|------|--------|------|------|------|
| GPTBot | OpenAI | 1 | Allowed/Blocked | [說明] |
| OAI-SearchBot | OpenAI | 1 | ... | |
| ChatGPT-User | OpenAI | 1 | ... | |
| ClaudeBot | Anthropic | 1 | ... | |
| PerplexityBot | Perplexity | 1 | ... | |
| Google-Extended | Google | 2 | ... | |
| ... | | | | |
## AI 能見度分數:[X]/100
Tier 1 訪問:[X/5 爬蟲允許]
Tier 2 訪問:[X/5 爬蟲允許]
## 嚴重問題
[列出被封鎖的 Tier 1 爬蟲]
## 建議
### 立即行動
[具體的 robots.txt 修改]
### 建議 robots.txt(AI 爬蟲部分)
[完整建議的 robots.txt 內容]
### 技術發現
- Meta Robots 標籤:[發現]
- X-Robots-Tag:[發現]
- JavaScript 渲染:[評估]
- llms.txt:[存在/不存在]
- Sitemap 可訪問性:[評估]
### Content Signals
[存在/不存在,若存在顯示解析結果]
# AI 爬蟲 — 允許(AI 搜尋能見度)
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: GoogleOther
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: Amazonbot
Allow: /
User-agent: FacebookBot
Allow: /
# AI 爬蟲 — 封鎖(激進/低效益)
User-agent: Bytespider
Disallow: /
/smart-blog-skills:llmstxt/smart-blog-skills:geo/smart-blog-skills:brand-mentionsdocumentation
Smart Blog 寫文章。從零寫一篇新的部落格文章,包含模板選擇、研究、 YouTube 影片嵌入、Humanizer 反 AI 審稿、品質檢查。 內建反幻覺驗證,繁體中文優先。 Use when user says "write blog", "寫文章", "寫部落格", "new blog post", "smart-blog write", "blog write".
testing
Translate existing blog posts into one or more target languages with SEO-optimized localization. Produces native-quality translations that preserve markdown structure, frontmatter, schema JSON-LD, image and chart embeds, and citation capsules. Localizes keywords, meta tags, numbers, dates, currencies, and quote styles per locale. Flags machine-translation artifacts for review. Run BEFORE localize: this handles language conversion; localize handles cultural adaptation after translation completes. Use when user says "translate blog", "blog translate", "uebersetzen", "traduire", "traducir", "translate post", "blog auf Deutsch", "blog en espanol".
development
Extract, suggest, and sync tags and categories for blog posts across all major CMS platforms. Supports WordPress REST API, Shopify GraphQL, Ghost Content API, Strapi REST/GraphQL, and Sanity GROQ. Generates tag suggestions from content analysis (keyword frequency, heading extraction, semantic grouping), enforces minimum post-count thresholds to prevent thin tag archives, and syncs taxonomy via authenticated API calls. Use when user says "tags", "categories", "taxonomy", "tag suggestions", "sync tags", "WordPress tags", "Shopify tags".
development
Blog strategy development including topic cluster architecture with hub-and-spoke design, audience mapping, competitive landscape analysis, AI citation surface strategy across ChatGPT/Perplexity/AI Overviews, distribution channel planning (YouTube, Reddit, review platforms for GEO), content scoring targets, measurement framework, and content differentiation through original research and first-hand experience. Use when user says "blog strategy", "content strategy", "blog positioning", "what should I blog about", "blog topics", "content pillars", "blog ideation".