exa-notebooklm/SKILL.md
Source-grounded research pipeline: Exa web search → NotebookLM RAG → structured data. Finds real sources (no Wikipedia), feeds them into a private knowledge base, and queries it for cited facts, so generated content stays grounded in the ingested sources rather than in model recall.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add snqb/my-skills exa-notebooklm
Combine Exa (semantic web search) with NotebookLM (source-grounded RAG) to produce deeply researched, fully cited content — without LLM hallucination.
Requires: exa-search skill + notebooklm skill installed and authenticated.
1. EXA SEARCH → find real sources (academic, journalism, primary)
2. FEED INTO NB → create notebook, add best URLs as sources
3. QUERY NB → ask synthesis questions, get cited answers
4. CURATE OUTPUT → extract facts, quotes, numbers into structured data
Three approaches depending on scope:
Best for breadth. Exclude Wikipedia — the point is sources Wikipedia doesn't have.
export EXA_API_KEY=$(pass api/exa 2>/dev/null)
QUERIES=(
"topic primary sources"
"topic academic paper"
"topic documentary evidence"
"topic statistics data 2024"
)
for q in "${QUERIES[@]}"; do
curl -s -X POST "https://api.exa.ai/search" \
-H "x-api-key: $EXA_API_KEY" \
-H "content-type: application/json" \
-d "{
\"query\": \"$q\",
\"numResults\": 8,
\"excludeDomains\": [\"wikipedia.org\"],
\"contents\": { \"text\": { \"maxCharacters\": 2000 } }
}" | jq -c '.results[]' >> /tmp/exa-batch.jsonl
sleep 0.5
done
jq -s 'unique_by(.url)' /tmp/exa-batch.jsonl > /tmp/exa-deduped.json
echo "Unique: $(jq length /tmp/exa-deduped.json)"
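unique_by(.url) compares URLs as exact strings, so the same page can survive twice when one result carries a trailing slash or a #fragment. A minimal sketch of a looser dedupe over a plain URL list is shown here. dedupe_urls is a hypothetical helper (not part of Exa or jq), and its normalization rules are assumptions you may want to tighten or relax.

```shell
# dedupe_urls (hypothetical helper): read URLs on stdin, one per line, and
# print the first URL seen for each normalized form.
# ASSUMPTION: two URLs are "the same" if they match after stripping
# a "#fragment" and one trailing "/".
dedupe_urls() {
  awk '{
    key = $0
    sub(/#.*$/, "", key)   # drop the fragment
    sub(/\/$/, "", key)    # drop one trailing slash
    if (key != "" && !(key in seen)) {
      seen[key] = 1
      print                # keep the original spelling of the URL
    }
  }'
}

# Usage (assumes /tmp/exa-deduped.json from the step above):
#   jq -r '.[].url' /tmp/exa-deduped.json | dedupe_urls > /tmp/exa-urls.txt
```

Working on a plain URL list keeps the helper independent of the Exa response shape; only the jq call in the usage line touches the JSON.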
Best for depth on a specific question. Exa runs multi-query expansion internally.
curl -s -X POST "https://api.exa.ai/search" \
-H "x-api-key: $EXA_API_KEY" \
-H "content-type: application/json" \
-d '{
"query": "your research question here",
"type": "deep",
"effort": "max",
"numResults": 20,
"excludeDomains": ["wikipedia.org"],
"contents": { "text": { "maxCharacters": 3000 } }
}' | jq '.results[] | {title, url}' > /tmp/exa-deep.json
For the heaviest research. Exa plans, searches, reads, and produces a structured report.
# Create async research task
curl -s -X POST "https://api.exa.ai/research" \
-H "x-api-key: $EXA_API_KEY" \
-H "content-type: application/json" \
-d '{
"instructions": "Find all primary sources about TOPIC. Focus on academic papers, government data, and journalist accounts.",
"model": "exa-research"
}' | jq '{id: .researchId, status: .status}'
# Poll for results
curl -s "https://api.exa.ai/research/RESEARCH_ID" \
-H "x-api-key: $EXA_API_KEY" | jq '{status, output}'
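Because the research task is asynchronous, the poll call usually has to be repeated until the task finishes. The loop below is a rough sketch of that, reusing the endpoint and the .status field from the call above. wait_for_research is a hypothetical helper, and the terminal status names ("completed", "failed", "canceled") are assumptions to check against the Exa documentation.

```shell
# research_status ID: print the task's status string.
# Endpoint and .status field are taken from the polling call above.
research_status() {
  curl -s "https://api.exa.ai/research/$1" \
    -H "x-api-key: $EXA_API_KEY" | jq -r '.status'
}

# wait_for_research ID [MAX_POLLS] [DELAY_SECONDS] (hypothetical helper)
# Returns 0 when the task completes, 1 on a terminal failure, 2 on timeout.
# ASSUMPTION: terminal statuses are "completed", "failed", and "canceled".
wait_for_research() {
  local id=$1 max=${2:-60} delay=${3:-5} i=0 status
  while [ "$i" -lt "$max" ]; do
    status=$(research_status "$id")
    case $status in
      completed) return 0 ;;
      failed|canceled)
        echo "research $id ended with status: $status" >&2
        return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "research $id still not finished after $max polls" >&2
  return 2
}

# Usage (RESEARCH_ID is the id printed by the create call):
#   wait_for_research "$RESEARCH_ID" && \
#     curl -s "https://api.exa.ai/research/$RESEARCH_ID" \
#       -H "x-api-key: $EXA_API_KEY" | jq '.output'
```

Capping the number of polls keeps a stuck task from blocking the rest of the pipeline.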
| Domain type | Examples | Why |
|---|---|---|
| Academic | academia.edu, jstor.org, researchgate.net | Primary research |
| UNESCO/IGO | unesco.org, worldbank.org | Authoritative data |
| Long-form journalism | theguardian.com, rferl.org | Detailed reporting |
| Specialist sites | craft blogs, industry journals | Niche expertise |
| Government | .gov, national statistics | Official data |
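One way to apply the table when choosing what to feed into NotebookLM is to move URLs from the listed domains to the front of the candidate list. This is only a sketch: rank_urls is a hypothetical helper, and the pattern just mirrors the example domains in the table, so extend it for your own topic. Specialist sites cannot be matched by domain and still need a manual look.

```shell
# ASSUMPTION: this pattern mirrors the example domains in the table above.
PREFERRED_URL_RE='^https?://([^/]*\.)?(academia\.edu|jstor\.org|researchgate\.net|unesco\.org|worldbank\.org|theguardian\.com|rferl\.org)([/:?]|$)|^https?://[^/]*\.gov([/:?]|$)'

# rank_urls (hypothetical helper): read URLs on stdin, print preferred-domain
# URLs first and all other URLs after them, keeping input order in each group.
rank_urls() {
  local tmp
  tmp=$(mktemp)
  cat > "$tmp"
  grep -E "$PREFERRED_URL_RE" "$tmp" || true    # preferred domains first
  grep -vE "$PREFERRED_URL_RE" "$tmp" || true   # then everything else
  rm -f "$tmp"
}

# Usage (assumes /tmp/exa-deduped.json from the multi-query step):
#   jq -r '.[].url' /tmp/exa-deduped.json | rank_urls | head -20
```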
Create a notebook and add the best URLs as sources.
cd ~/.pi/agent/skills/notebooklm
# Create notebook
bash run.sh create "Research: Your Topic"
bash run.sh use <notebook_id>
# Add URLs from Exa results (pick the best ones)
bash run.sh source add "https://example.com/great-article"
bash run.sh source add "https://example.com/another-source"
# Add raw text for sources that don't have clean URLs
bash run.sh source add-text "Field Notes" "Your observations or quotes..."
# Check what's loaded
bash run.sh source list
Run source list to check progress: sources need "ready" status before querying.
Use add-research for auto-discovery. NB can find additional sources on its own:
bash run.sh source add-research "your topic" --mode deep --import-all
Ask synthesis questions. NB answers ONLY from ingested sources, with citations.
cd ~/.pi/agent/skills/notebooklm
# Simple query
bash run.sh ask "What are the key findings about X?"
# JSON output with citation mapping (for programmatic use)
bash run.sh ask "List all dates and events mentioned. Cite [N]." --json --new
# Fresh conversation per question (prevents context bleed)
bash run.sh ask "Who are the main people involved?" --json --new
bash run.sh ask "What statistics are cited?" --json --new
Lists > summaries. Ask for specific extractable data:
✅ "List all dates, names, and numbers mentioned about X. Cite [N]."
✅ "What direct quotes appear in the sources about Y?"
✅ "Create a bibliography of all sources that discuss Z."
✅ "Compare source A vs source B on topic W. 2-3 sentences. Cite [N]."
❌ "Summarize everything" (too vague, wastes the grounding)
❌ "What do you think about X?" (NB isn't meant for opinions)
Keep answers short when using --json, because long answers time out:
✅ "When was X founded? 1-2 sentences. Cite [N]."
❌ "Tell me everything about X's history in detail."
Use --new for each question to prevent context bleed between queries.
NB_A="<id_a>"
NB_B="<id_b>"
(bash run.sh ask "question" -n $NB_A --new > /tmp/a.txt &
bash run.sh ask "question" -n $NB_B --new > /tmp/b.txt &
wait)
Take NB answers and structure them. The citation chain:
NB answer → citation [N] → source_id → URL (from source list)
# Get source mapping
bash run.sh source list --json > /tmp/sources.json
# Now you can trace: answer text → [1] → source_id → original URL
Write your final output with inline citations pointing to real URLs. This is what makes the content trustworthy — every claim traces back to a specific source.
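The last hop of the citation chain can be sketched with standard tools. The JSON shapes of the ask and source list outputs are not shown here, so this example assumes you have already reduced them to plain answer text plus a two-column TSV mapping each citation number to its URL. cite_numbers and link_citations are hypothetical helpers, and the TSV layout is an assumption.

```shell
# cite_numbers (hypothetical helper): print the distinct citation numbers
# found in the answer text on stdin, in ascending order.
cite_numbers() {
  grep -oE '\[[0-9]+\]' | tr -d '[]' | sort -un
}

# link_citations MAP_TSV (hypothetical helper): rewrite each [N] on stdin as
# [N](URL). ASSUMPTION: MAP_TSV has one "N<TAB>URL" row per citation.
# Citations without a row in the map are left untouched.
link_citations() {
  awk -v map="$1" '
    BEGIN {
      while ((getline row < map) > 0) {
        split(row, col, "\t")
        url[col[1]] = col[2]
      }
    }
    {
      rest = $0; out = ""
      while (match(rest, /\[[0-9]+\]/)) {
        n = substr(rest, RSTART + 1, RLENGTH - 2)
        out = out substr(rest, 1, RSTART + RLENGTH - 1)
        if (n in url) out = out "(" url[n] ")"
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
    }'
}

# Usage (answer.txt and citations.tsv are files you prepare yourself):
#   cite_numbers < answer.txt
#   link_citations citations.tsv < answer.txt > answer-linked.md
```

Leaving unmapped citations as plain [N] makes gaps in the mapping easy to spot during review.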
# === 1. EXA: Find sources ===
export EXA_API_KEY=$(pass api/exa 2>/dev/null)
for q in \
"Central Asian textile traditions academic" \
"felt carpet making techniques UNESCO" \
"nomadic craft documentation"; do
curl -s -X POST "https://api.exa.ai/search" \
-H "accept: application/json" \
-H "content-type: application/json" \
-H "x-api-key: $EXA_API_KEY" \
-d "{
\"query\": \"$q\",
\"useAutoprompt\": true,
\"numResults\": 8,
\"excludeDomains\": [\"wikipedia.org\"],
\"contents\": { \"text\": { \"maxCharacters\": 2000 } }
}" | jq -c '.results[]' >> /tmp/research.jsonl
sleep 0.5
done
echo "Found $(wc -l < /tmp/research.jsonl) results"
# === 2. FEED: Best URLs into NotebookLM ===
cd ~/.pi/agent/skills/notebooklm
bash run.sh create "Textile Traditions Research"
bash run.sh use <id>
# Add top URLs (review /tmp/research.jsonl, pick best)
jq -r '.url' /tmp/research.jsonl | sort -u | head -40 | while read -r url; do
bash run.sh source add "$url"
sleep 1
done
# === 3. QUERY: Extract structured data ===
bash run.sh ask "List all textile techniques mentioned with regions. Cite [N]." --json --new > /tmp/q1.json
bash run.sh ask "What specific materials and tools are described? Cite [N]." --json --new > /tmp/q2.json
bash run.sh ask "Timeline: when were these crafts first documented? Cite [N]." --json --new > /tmp/q3.json
# === 4. Also generate a podcast for fun ===
bash run.sh generate audio "Focus on the craft techniques" --format deep-dive --wait
bash run.sh download audio ./textiles-podcast.mp3
| Problem | Fix |
|---|---|
| Exa returns 0 results | Check API key, check credits at exa.ai dashboard |
| NB source stuck on "processing" | Wait 1-2 min, or source refresh <id> |
| NB query times out | Use NOTEBOOKLM_TIMEOUT=120, keep answer short |
| NB auth expired | bash run.sh login |
| Too many sources | Split across multiple notebooks |
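For intermittent timeouts, a small retry wrapper with a growing delay can help. This is a generic sketch rather than a feature of either skill: retry is a hypothetical helper, and it assumes that a failed or timed-out command exits with a non-zero status, which has not been verified for run.sh.

```shell
# retry MAX DELAY CMD... (hypothetical helper): rerun CMD until it succeeds
# or MAX attempts have been used. DELAY is in whole seconds and doubles
# after each failure.
# ASSUMPTION: CMD signals failure through a non-zero exit status.
retry() {
  local max=$1 delay=$2 n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
    delay=$((delay * 2))   # double the wait before the next attempt
  done
}

# Usage:
#   export NOTEBOOKLM_TIMEOUT=120
#   retry 3 5 bash run.sh ask "When was X founded? 1-2 sentences. Cite [N]." --json --new
```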