Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

ai-analyst-lab/.claude/skills/notion-ingest

Name: .claude/skills/notion-ingest
Author: ai-analyst-lab

.claude/skills/notion-ingest/SKILL.md

npx skillsauth add ai-analyst-lab/ai-analyst .claude/skills/notion-ingest

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

/notion-ingest — Notion Workspace Crawler

Crawls a Notion workspace to extract business terms, metrics, product docs, and team structure. Populates the organization knowledge system.

Trigger

Invoked as /notion-ingest or /notion-ingest {workspace_url}

Prerequisites

Notion integration token configured in .knowledge/user/integrations.yaml
Organization directory exists at .knowledge/organizations/{org}/
If no token: "Notion integration token not found. Add it to .knowledge/user/integrations.yaml under notion.token. See Notion Integration Guide for setup."

Overview

This skill uses a breadth-first crawl strategy to systematically traverse a Notion workspace, converting pages to structured knowledge entries. It does NOT require external Python packages — all Notion API calls use inline HTTP requests.

Step 1: Authentication Check

import yaml, os

# Load integration config
integrations_path = ".knowledge/user/integrations.yaml"
with open(integrations_path) as f:
    config = yaml.safe_load(f)

notion_token = config.get("notion", {}).get("token")
if not notion_token:
    print("❌ No Notion token found. Add to .knowledge/user/integrations.yaml")
    # HALT

Verify token works with a simple API call:

GET https://api.notion.com/v1/users/me
Authorization: Bearer {token}
Notion-Version: 2022-06-28

Step 2: Workspace Discovery

Ask the user for crawl scope:

Notion workspace connected. How would you like to crawl?

1. **Full workspace** — Crawl all accessible pages (may be slow for large workspaces)
2. **Specific database** — Provide a database URL to crawl
3. **Specific page tree** — Provide a root page URL to crawl its children
4. **Search by keyword** — Search for pages matching specific terms

Step 3: BFS Crawl Strategy

Algorithm: Breadth-First Search (BFS)

Queue ← [root_page_id]
Visited ← {}
Results ← []

WHILE Queue is not empty:
    page_id ← Queue.dequeue()
    IF page_id IN Visited: CONTINUE
    Visited.add(page_id)

    page ← fetch_page(page_id)        # GET /v1/pages/{id}
    children ← fetch_children(page_id) # GET /v1/blocks/{id}/children

    result ← convert_to_knowledge(page, children)
    Results.append(result)

    # Enqueue child pages and linked databases
    FOR child IN children:
        IF child.type == "child_page" OR child.type == "child_database":
            Queue.enqueue(child.id)

    rate_limit_pause()  # See Step 4

Page Fetch

GET https://api.notion.com/v1/pages/{page_id}
Authorization: Bearer {token}
Notion-Version: 2022-06-28

Block Children Fetch (paginated)

GET https://api.notion.com/v1/blocks/{block_id}/children?page_size=100
Authorization: Bearer {token}
Notion-Version: 2022-06-28

Handle pagination via has_more and next_cursor.

Step 4: Rate Limiting

Notion API limits: 3 requests per second for integration tokens.

import time

class RateLimiter:
    """Simple token-bucket rate limiter for Notion API."""

    def __init__(self, requests_per_second=2.5):
        self.min_interval = 1.0 / requests_per_second
        self.last_request = 0

    def wait(self):
        elapsed = time.time() - self.last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_request = time.time()

Backoff strategy:

On 429 (rate limited): wait Retry-After header seconds, minimum 1s
On 5xx: exponential backoff (1s, 2s, 4s), max 3 retries
On 4xx (not 429): log error, skip page, continue crawl

Step 5: Page-to-Markdown Conversion

Convert Notion block types to markdown:

| Notion Block Type | Markdown Output | |-------------------|-----------------| | paragraph | Plain text | | heading_1 | # Title | | heading_2 | ## Title | | heading_3 | ### Title | | bulleted_list_item | - Item | | numbered_list_item | 1. Item | | code | ```lang\ncode\n``` | | quote | > Quote | | callout | > ℹ️ Callout | | table | Markdown table | | divider | --- | | toggle | Treat as heading + nested content | | child_page | [Page Title](notion://page_id) | | child_database | [Database Title](notion://db_id) |

Rich text extraction:

Bold → **text**
Italic → *text*
Code → `text`
Links → [text](url)
Mentions → @{mention_name}

Step 6: Knowledge Extraction

For each crawled page, attempt to classify and extract structured knowledge:

Auto-Classification Rules

| Page Contains | Classification | Target File | |---------------|---------------|-------------| | Term definitions, glossary entries | Glossary term | business/glossary/terms.yaml | | KPI, metric, formula | Metric definition | business/metrics/index.yaml | | Product name, feature list | Product entry | business/products/index.yaml | | OKR, objective, key result | Objective | business/objectives/index.yaml | | Team name, org chart | Team entry | business/teams/index.yaml | | SQL query, data pattern | Query archaeology | .knowledge/query-archaeology/raw/ |

Classification Heuristics

Glossary: Page title contains "glossary", "definitions", "terms", OR content has definition-like patterns ("X is defined as", "X means")
Metrics: Content contains "KPI", "metric", "formula", "calculated as", OR has numeric targets/thresholds
Products: Content contains "product", "feature", "roadmap", OR is in a database with product-like properties
Objectives: Content contains "OKR", "objective", "key result", "goal", "target", OR has quarterly references
Teams: Content contains "team", "squad", "org chart", OR has role/person properties

Raw Storage

All crawled pages are saved as raw markdown to:

.knowledge/query-archaeology/raw/notion_{page_id_short}.md

With YAML frontmatter:

---
source: notion
page_id: {full_page_id}
title: {page_title}
url: {page_url}
crawled_at: {timestamp}
classification: {auto_class or "unclassified"}
---

Step 7: Progress Reporting

During crawl, show progress:

🔄 Crawling Notion workspace...

  Pages crawled:    45/~120 (estimated)
  Terms extracted:  12
  Metrics found:    5
  Products found:   3
  Errors:           1 (skipped)

  Current: "Q4 2025 OKR Tracker"

Step 8: Post-Crawl Summary

After crawl completes:

✅ Notion ingest complete!

  Pages crawled:     127
  Pages skipped:     3 (errors logged)

  Knowledge extracted:
    Glossary terms:  23 → business/glossary/terms.yaml
    Metrics:         8  → business/metrics/index.yaml
    Products:        5  → business/products/index.yaml
    Objectives:      12 → business/objectives/index.yaml
    Teams:           4  → business/teams/index.yaml

  Raw pages saved:   127 → .knowledge/query-archaeology/raw/

  Review extracted knowledge with `/business` to verify accuracy.
  Auto-classifications may need manual correction.

Step 9: Capture to Query Archaeology

For pages containing SQL queries or data patterns, create cookbook entries:

from helpers.archaeology_helpers import capture_cookbook_entry

# For each page with SQL content
capture_cookbook_entry(
    title=page_title,
    sql=extracted_sql,
    description=f"From Notion: {page_title}",
    tags=["notion-import", classification],
    source=f"notion:{page_id}"
)

Error Handling

| Error | Response | |-------|----------| | Invalid token | "Notion token is invalid or expired. Update in .knowledge/user/integrations.yaml." | | Permission denied (403) | "Cannot access page '{title}'. Check integration permissions in Notion." | | Rate limited (429) | Auto-retry with backoff (transparent to user) | | Network error | Retry 3x, then skip page and continue | | Empty workspace | "No accessible pages found. Verify the integration has access to your workspace." | | Large workspace (500+) | "Large workspace detected (~{n} pages). This may take several minutes. Continue? [Y/n]" |

Incremental Updates

For subsequent runs, support incremental mode:

/notion-ingest --incremental

Read .knowledge/organizations/{org}/notion_sync_state.yaml for last sync timestamp
Use Notion search API with filter.timestamp.last_edited_time.after parameter
Only process pages modified since last sync
Update sync state after completion

Reset

/notion-ingest reset — Clears all raw Notion pages and sync state. Does NOT remove extracted knowledge entries (those must be cleaned via /business or manually).

ai-analyst-lab/.claude/skills/notion-ingest

.claude/skills/notion-ingest/SKILL.md

# /notion-ingest — Notion Workspace Crawler > Crawls a Notion workspace to extract business terms, metrics, product docs, > and team structure. Populates the organization knowledge system. ## Trigger Invoked as `/notion-ingest` or `/notion-ingest {workspace_url}` ## Prerequisites - Notion integration token configured in `.knowledge/user/integrations.yaml` - Organization directory exists at `.knowledge/organizations/{org}/` - If no token: "Notion integration token not found. Add it to `.knowle

146 stars

documentation

Updated Apr 4, 2026

$ install --global

skillsauth

npx skillsauth add ai-analyst-lab/ai-analyst .claude/skills/notion-ingest

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 4, 2026, 3:47 AM4.0s1 file scanned

SKILL.md

/notion-ingest — Notion Workspace Crawler

Crawls a Notion workspace to extract business terms, metrics, product docs, and team structure. Populates the organization knowledge system.

Trigger

Invoked as /notion-ingest or /notion-ingest {workspace_url}

Prerequisites

Notion integration token configured in .knowledge/user/integrations.yaml
Organization directory exists at .knowledge/organizations/{org}/
If no token: "Notion integration token not found. Add it to .knowledge/user/integrations.yaml under notion.token. See Notion Integration Guide for setup."

Overview

Step 1: Authentication Check

import yaml, os

# Load integration config
integrations_path = ".knowledge/user/integrations.yaml"
with open(integrations_path) as f:
    config = yaml.safe_load(f)

notion_token = config.get("notion", {}).get("token")
if not notion_token:
    print("❌ No Notion token found. Add to .knowledge/user/integrations.yaml")
    # HALT

Verify token works with a simple API call:

GET https://api.notion.com/v1/users/me
Authorization: Bearer {token}
Notion-Version: 2022-06-28

Step 2: Workspace Discovery

Ask the user for crawl scope:

Notion workspace connected. How would you like to crawl?

1. **Full workspace** — Crawl all accessible pages (may be slow for large workspaces)
2. **Specific database** — Provide a database URL to crawl
3. **Specific page tree** — Provide a root page URL to crawl its children
4. **Search by keyword** — Search for pages matching specific terms

Step 3: BFS Crawl Strategy

Algorithm: Breadth-First Search (BFS)

Queue ← [root_page_id]
Visited ← {}
Results ← []

WHILE Queue is not empty:
    page_id ← Queue.dequeue()
    IF page_id IN Visited: CONTINUE
    Visited.add(page_id)

    page ← fetch_page(page_id)        # GET /v1/pages/{id}
    children ← fetch_children(page_id) # GET /v1/blocks/{id}/children

    result ← convert_to_knowledge(page, children)
    Results.append(result)

    # Enqueue child pages and linked databases
    FOR child IN children:
        IF child.type == "child_page" OR child.type == "child_database":
            Queue.enqueue(child.id)

    rate_limit_pause()  # See Step 4

Page Fetch

GET https://api.notion.com/v1/pages/{page_id}
Authorization: Bearer {token}
Notion-Version: 2022-06-28

Block Children Fetch (paginated)

GET https://api.notion.com/v1/blocks/{block_id}/children?page_size=100
Authorization: Bearer {token}
Notion-Version: 2022-06-28

Handle pagination via has_more and next_cursor.

Step 4: Rate Limiting

Notion API limits: 3 requests per second for integration tokens.

import time

class RateLimiter:
    """Simple token-bucket rate limiter for Notion API."""

    def __init__(self, requests_per_second=2.5):
        self.min_interval = 1.0 / requests_per_second
        self.last_request = 0

    def wait(self):
        elapsed = time.time() - self.last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_request = time.time()

Backoff strategy:

On 429 (rate limited): wait Retry-After header seconds, minimum 1s
On 5xx: exponential backoff (1s, 2s, 4s), max 3 retries
On 4xx (not 429): log error, skip page, continue crawl

Step 5: Page-to-Markdown Conversion

Convert Notion block types to markdown:

Rich text extraction:

Bold → **text**
Italic → *text*
Code → `text`
Links → [text](url)
Mentions → @{mention_name}

Step 6: Knowledge Extraction

For each crawled page, attempt to classify and extract structured knowledge:

Auto-Classification Rules

Classification Heuristics

Glossary: Page title contains "glossary", "definitions", "terms", OR content has definition-like patterns ("X is defined as", "X means")
Metrics: Content contains "KPI", "metric", "formula", "calculated as", OR has numeric targets/thresholds
Products: Content contains "product", "feature", "roadmap", OR is in a database with product-like properties
Objectives: Content contains "OKR", "objective", "key result", "goal", "target", OR has quarterly references
Teams: Content contains "team", "squad", "org chart", OR has role/person properties

Raw Storage

All crawled pages are saved as raw markdown to:

.knowledge/query-archaeology/raw/notion_{page_id_short}.md

With YAML frontmatter:

---
source: notion
page_id: {full_page_id}
title: {page_title}
url: {page_url}
crawled_at: {timestamp}
classification: {auto_class or "unclassified"}
---

Step 7: Progress Reporting

During crawl, show progress:

🔄 Crawling Notion workspace...

  Pages crawled:    45/~120 (estimated)
  Terms extracted:  12
  Metrics found:    5
  Products found:   3
  Errors:           1 (skipped)

  Current: "Q4 2025 OKR Tracker"

Step 8: Post-Crawl Summary

After crawl completes:

✅ Notion ingest complete!

  Pages crawled:     127
  Pages skipped:     3 (errors logged)

  Knowledge extracted:
    Glossary terms:  23 → business/glossary/terms.yaml
    Metrics:         8  → business/metrics/index.yaml
    Products:        5  → business/products/index.yaml
    Objectives:      12 → business/objectives/index.yaml
    Teams:           4  → business/teams/index.yaml

  Raw pages saved:   127 → .knowledge/query-archaeology/raw/

  Review extracted knowledge with `/business` to verify accuracy.
  Auto-classifications may need manual correction.

Step 9: Capture to Query Archaeology

For pages containing SQL queries or data patterns, create cookbook entries:

from helpers.archaeology_helpers import capture_cookbook_entry

# For each page with SQL content
capture_cookbook_entry(
    title=page_title,
    sql=extracted_sql,
    description=f"From Notion: {page_title}",
    tags=["notion-import", classification],
    source=f"notion:{page_id}"
)

Error Handling

Incremental Updates

For subsequent runs, support incremental mode:

/notion-ingest --incremental

Read .knowledge/organizations/{org}/notion_sync_state.yaml for last sync timestamp
Use Notion search API with filter.timestamp.last_edited_time.after parameter
Only process pages modified since last sync
Update sync state after completion

Reset

/notion-ingest reset — Clears all raw Notion pages and sync state. Does NOT remove extracted knowledge entries (those must be cleaned via /business or manually).

Related Skills

ai-analyst-lab/.claude/skills/TEMPLATE

testing

VerifiedTrustedCommunity

# Skill: {{BLANK_1_SKILL_NAME}} ## Purpose {{BLANK_2_WHEN_TO_FIRE}} ## When to Use Fires automatically when the user asks Claude to do something that matches the trigger condition above. ## Instructions 1. Detect the trigger condition 2. Execute your guardrail check 3. If the check matters, print a clear, visible warning with "{{BLANK_3_SIGNATURE_PHRASE}}" as the first line 4. Continue with the analysis, incorporating the warning into the output ## Anti-Patterns - Do not fire when the condit

175SKILL.mdUpdated Apr 17, 2026

ai-analyst-lab/.claude/skills/TEMPLATE

ai-analyst-lab/.claude/skills/visualization-patterns

development

VerifiedTrustedCommunity

# Skill: Visualization Patterns ## Purpose Ensure every chart Claude Code produces follows high-quality design standards with named themes, consistent styling, and clear data communication. ## When to Use Apply this skill whenever generating a chart, graph, or data visualization. Always apply the active theme unless the user specifies otherwise. Default theme: `minimal`. ## Instructions ### Pre-flight: Load Learnings Before executing, check `.knowledge/learnings/index.md` for relevant entrie

146SKILL.mdUpdated Apr 4, 2026

ai-analyst-lab/.claude/skills/visualization-patterns

ai-analyst-lab/.claude/skills/triangulation

development

VerifiedTrustedCommunity

# Skill: Triangulation / Sanity Check ## Purpose Cross-reference analytical findings against multiple data sources, external benchmarks, and common sense to catch errors before they become bad decisions. ## When to Use Apply this skill after every analysis, before presenting findings to stakeholders, and whenever a result seems surprising. If a finding would change a decision, it MUST be triangulated first. ## Instructions ### Triangulation Framework Every finding gets checked through four

146SKILL.mdUpdated Apr 4, 2026

ai-analyst-lab/.claude/skills/triangulation

ai-analyst-lab/.claude/skills/tracking-gaps

data-ai

VerifiedTrustedCommunity

# Skill: Tracking Gap Identification ## Purpose Assess whether the data needed for an analysis actually exists, identify what's missing, and produce prioritized instrumentation requests for engineering when gaps are found. ## When to Use Apply this skill after the Data Explorer agent inventories available data, when an analysis requires data that might not exist, or when initial query results suggest incomplete tracking. Run before committing to an analysis approach. ## Instructions ### Gap

146SKILL.mdUpdated Apr 4, 2026

ai-analyst-lab/.claude/skills/tracking-gaps

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/ai-analyst-lab/ai-analyst.git

# Copy into Claude Code skills folder (global)
cp -r ai-analyst/.claude/skills/notion-ingest ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

ai-analyst-lab/ai-analyst

146 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT