.claude/skills/notion-ingest/SKILL.md
# /notion-ingest — Notion Workspace Crawler > Crawls a Notion workspace to extract business terms, metrics, product docs, > and team structure. Populates the organization knowledge system. ## Trigger Invoked as `/notion-ingest` or `/notion-ingest {workspace_url}` ## Prerequisites - Notion integration token configured in `.knowledge/user/integrations.yaml` - Organization directory exists at `.knowledge/organizations/{org}/` - If no token: "Notion integration token not found. Add it to `.knowle
npx skillsauth add ai-analyst-lab/ai-analyst .claude/skills/notion-ingestInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Crawls a Notion workspace to extract business terms, metrics, product docs, and team structure. Populates the organization knowledge system.
Invoked as /notion-ingest or /notion-ingest {workspace_url}
.knowledge/user/integrations.yaml.knowledge/organizations/{org}/.knowledge/user/integrations.yaml under notion.token. See Notion Integration Guide for setup."This skill uses a breadth-first crawl strategy to systematically traverse a Notion workspace, converting pages to structured knowledge entries. It does NOT require external Python packages — all Notion API calls use inline HTTP requests.
import yaml, os
# Load integration config
integrations_path = ".knowledge/user/integrations.yaml"
with open(integrations_path) as f:
config = yaml.safe_load(f)
notion_token = config.get("notion", {}).get("token")
if not notion_token:
print("❌ No Notion token found. Add to .knowledge/user/integrations.yaml")
# HALT
Verify token works with a simple API call:
GET https://api.notion.com/v1/users/me
Authorization: Bearer {token}
Notion-Version: 2022-06-28
Ask the user for crawl scope:
Notion workspace connected. How would you like to crawl?
1. **Full workspace** — Crawl all accessible pages (may be slow for large workspaces)
2. **Specific database** — Provide a database URL to crawl
3. **Specific page tree** — Provide a root page URL to crawl its children
4. **Search by keyword** — Search for pages matching specific terms
Algorithm: Breadth-First Search (BFS)
Queue ← [root_page_id]
Visited ← {}
Results ← []
WHILE Queue is not empty:
page_id ← Queue.dequeue()
IF page_id IN Visited: CONTINUE
Visited.add(page_id)
page ← fetch_page(page_id) # GET /v1/pages/{id}
children ← fetch_children(page_id) # GET /v1/blocks/{id}/children
result ← convert_to_knowledge(page, children)
Results.append(result)
# Enqueue child pages and linked databases
FOR child IN children:
IF child.type == "child_page" OR child.type == "child_database":
Queue.enqueue(child.id)
rate_limit_pause() # See Step 4
GET https://api.notion.com/v1/pages/{page_id}
Authorization: Bearer {token}
Notion-Version: 2022-06-28
GET https://api.notion.com/v1/blocks/{block_id}/children?page_size=100
Authorization: Bearer {token}
Notion-Version: 2022-06-28
Handle pagination via has_more and next_cursor.
Notion API limits: 3 requests per second for integration tokens.
import time
class RateLimiter:
"""Simple token-bucket rate limiter for Notion API."""
def __init__(self, requests_per_second=2.5):
self.min_interval = 1.0 / requests_per_second
self.last_request = 0
def wait(self):
elapsed = time.time() - self.last_request
if elapsed < self.min_interval:
time.sleep(self.min_interval - elapsed)
self.last_request = time.time()
Backoff strategy:
Retry-After header seconds, minimum 1sConvert Notion block types to markdown:
| Notion Block Type | Markdown Output |
|-------------------|-----------------|
| paragraph | Plain text |
| heading_1 | # Title |
| heading_2 | ## Title |
| heading_3 | ### Title |
| bulleted_list_item | - Item |
| numbered_list_item | 1. Item |
| code | ```lang\ncode\n``` |
| quote | > Quote |
| callout | > ℹ️ Callout |
| table | Markdown table |
| divider | --- |
| toggle | Treat as heading + nested content |
| child_page | [Page Title](notion://page_id) |
| child_database | [Database Title](notion://db_id) |
Rich text extraction:
**text***text*`text`[text](url)@{mention_name}For each crawled page, attempt to classify and extract structured knowledge:
| Page Contains | Classification | Target File |
|---------------|---------------|-------------|
| Term definitions, glossary entries | Glossary term | business/glossary/terms.yaml |
| KPI, metric, formula | Metric definition | business/metrics/index.yaml |
| Product name, feature list | Product entry | business/products/index.yaml |
| OKR, objective, key result | Objective | business/objectives/index.yaml |
| Team name, org chart | Team entry | business/teams/index.yaml |
| SQL query, data pattern | Query archaeology | .knowledge/query-archaeology/raw/ |
All crawled pages are saved as raw markdown to:
.knowledge/query-archaeology/raw/notion_{page_id_short}.md
With YAML frontmatter:
---
source: notion
page_id: {full_page_id}
title: {page_title}
url: {page_url}
crawled_at: {timestamp}
classification: {auto_class or "unclassified"}
---
During crawl, show progress:
🔄 Crawling Notion workspace...
Pages crawled: 45/~120 (estimated)
Terms extracted: 12
Metrics found: 5
Products found: 3
Errors: 1 (skipped)
Current: "Q4 2025 OKR Tracker"
After crawl completes:
✅ Notion ingest complete!
Pages crawled: 127
Pages skipped: 3 (errors logged)
Knowledge extracted:
Glossary terms: 23 → business/glossary/terms.yaml
Metrics: 8 → business/metrics/index.yaml
Products: 5 → business/products/index.yaml
Objectives: 12 → business/objectives/index.yaml
Teams: 4 → business/teams/index.yaml
Raw pages saved: 127 → .knowledge/query-archaeology/raw/
Review extracted knowledge with `/business` to verify accuracy.
Auto-classifications may need manual correction.
For pages containing SQL queries or data patterns, create cookbook entries:
from helpers.archaeology_helpers import capture_cookbook_entry
# For each page with SQL content
capture_cookbook_entry(
title=page_title,
sql=extracted_sql,
description=f"From Notion: {page_title}",
tags=["notion-import", classification],
source=f"notion:{page_id}"
)
| Error | Response |
|-------|----------|
| Invalid token | "Notion token is invalid or expired. Update in .knowledge/user/integrations.yaml." |
| Permission denied (403) | "Cannot access page '{title}'. Check integration permissions in Notion." |
| Rate limited (429) | Auto-retry with backoff (transparent to user) |
| Network error | Retry 3x, then skip page and continue |
| Empty workspace | "No accessible pages found. Verify the integration has access to your workspace." |
| Large workspace (500+) | "Large workspace detected (~{n} pages). This may take several minutes. Continue? [Y/n]" |
For subsequent runs, support incremental mode:
/notion-ingest --incremental
.knowledge/organizations/{org}/notion_sync_state.yaml for last sync timestampfilter.timestamp.last_edited_time.after parameter/notion-ingest reset — Clears all raw Notion pages and sync state. Does NOT remove
extracted knowledge entries (those must be cleaned via /business or manually).
testing
# Skill: {{BLANK_1_SKILL_NAME}} ## Purpose {{BLANK_2_WHEN_TO_FIRE}} ## When to Use Fires automatically when the user asks Claude to do something that matches the trigger condition above. ## Instructions 1. Detect the trigger condition 2. Execute your guardrail check 3. If the check matters, print a clear, visible warning with "{{BLANK_3_SIGNATURE_PHRASE}}" as the first line 4. Continue with the analysis, incorporating the warning into the output ## Anti-Patterns - Do not fire when the condit
development
# Skill: Visualization Patterns ## Purpose Ensure every chart Claude Code produces follows high-quality design standards with named themes, consistent styling, and clear data communication. ## When to Use Apply this skill whenever generating a chart, graph, or data visualization. Always apply the active theme unless the user specifies otherwise. Default theme: `minimal`. ## Instructions ### Pre-flight: Load Learnings Before executing, check `.knowledge/learnings/index.md` for relevant entrie
development
# Skill: Triangulation / Sanity Check ## Purpose Cross-reference analytical findings against multiple data sources, external benchmarks, and common sense to catch errors before they become bad decisions. ## When to Use Apply this skill after every analysis, before presenting findings to stakeholders, and whenever a result seems surprising. If a finding would change a decision, it MUST be triangulated first. ## Instructions ### Triangulation Framework Every finding gets checked through four
data-ai
# Skill: Tracking Gap Identification ## Purpose Assess whether the data needed for an analysis actually exists, identify what's missing, and produce prioritized instrumentation requests for engineering when gaps are found. ## When to Use Apply this skill after the Data Explorer agent inventories available data, when an analysis requires data that might not exist, or when initial query results suggest incomplete tracking. Run before committing to an analysis approach. ## Instructions ### Gap