skills/web-search-researcher/SKILL.md
Conducts comprehensive web research to find accurate, relevant information. Use when you need current information only discoverable on the web, documentation, best practices, or technical solutions. Uses curl + markdown.new, the Exa and Parallel APIs, and the Camoufox browser; never surf, WebFetch, or WebSearch.
npx skillsauth add pratos/clanker-setup web-search-researcher
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When this skill is triggered, ALWAYS display this banner first:
╭─────────────────────────────────────────────────────────────╮
│ 🌐 SKILL ACTIVATED: web-search-researcher │
├─────────────────────────────────────────────────────────────┤
│ Topic: [research question/topic] │
│ Action: Searching web for authoritative sources... │
│ Output: Synthesized findings with source links │
╰─────────────────────────────────────────────────────────────╯
This skill activates when:
NEVER use these tools for web research (they return undefined frequently):
- surf: unreliable, pages often fail to render
- WebFetch: frequently returns undefined/empty content
- WebSearch: wrapper around surf, inherits its problems

ALWAYS use these methods instead:
- bash with curl + markdown.new: reliable page fetching
- bash with Exa API: semantic web search
- bash with Parallel API: agentic web search
- bash with Google HTML scraping: free keyword search
- Camoufox browser (when CAMOFOX_URL is set): JS-heavy/anti-bot pages

When you receive a research query:
Analyze the Query: Break down the request to identify:
Execute Strategic Searches:
Fetch and Analyze Content:
Synthesize Findings:
Use curl against markdown.new for clean, readable text from any URL.
# Fetch a page as clean markdown
curl -sL "https://markdown.new/https://docs.python.org/3/library/asyncio.html" | head -500
# Same, with a 15-second timeout
curl -sL --max-time 15 "https://markdown.new/https://example.com" | head -500
# Raw file from GitHub
curl -sL "https://markdown.new/https://raw.githubusercontent.com/owner/repo/main/README.md" | head -500
# JSON APIs can be queried directly, without markdown.new
curl -sL "https://api.github.com/repos/astral-sh/uv" | head -200
curl -sL "https://pypi.org/pypi/requests/json" | head -200
curl -sL "https://registry.npmjs.org/typescript" | head -200
# Fetch several URLs in one loop
for url in "https://example1.com" "https://example2.com"; do
echo "=== $url ==="
curl -sL --max-time 15 "https://markdown.new/$url" | head -300
echo ""
done
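The fetch commands above all share one shape, so they can be wrapped in a small helper. This is a minimal sketch: `md_url` and `fetch_md` are hypothetical names introduced here for illustration, and the 15-second timeout and 500-line cap are simply the values used in the examples above.

```shell
# Build the markdown.new URL for a page (string handling only, no network).
md_url() {
  printf 'https://markdown.new/%s\n' "$1"
}

# Fetch a page as markdown.
# $1 = page URL, $2 = optional line cap (defaults to 500).
fetch_md() {
  curl -sL --max-time 15 "$(md_url "$1")" | head -n "${2:-500}"
}
```

Usage would look like `fetch_md "https://docs.python.org/3/library/asyncio.html" 300`.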
Scrape Google search results directly via curl. No API key needed.
# URL-encode the query and fetch results
QUERY="python asyncio best practices"
# Pass the query as an argument so quotes in it cannot break the Python snippet
ENCODED=$(python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))" "$QUERY")
curl -sL --max-time 15 \
-H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36" \
"https://markdown.new/https://www.google.com/search?q=$ENCODED" | head -400
# Stack Overflow
curl -sL --max-time 15 "https://markdown.new/https://www.google.com/search?q=site:stackoverflow.com+python+asyncio" | head -300
# GitHub
curl -sL --max-time 15 "https://markdown.new/https://www.google.com/search?q=site:github.com+uv+python+package+manager" | head -300
# Official docs
curl -sL --max-time 15 "https://markdown.new/https://www.google.com/search?q=site:docs.python.org+asyncio+event+loop" | head -300
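The site-restricted searches above follow one pattern, so the URL can be assembled by a helper. The sketch below is an illustration only: `google_url` is a hypothetical name, and it converts spaces to `+` but does not percent-encode other special characters, so queries containing quotes or `&` still need the encoding step shown earlier.

```shell
# Build a Google search URL routed through markdown.new.
# $1 = plain-text query, $2 = optional domain for a site: filter.
google_url() {
  _q=$(printf '%s' "$1" | tr ' ' '+')
  if [ -n "${2:-}" ]; then
    _q="site:$2+$_q"
  fi
  printf 'https://markdown.new/https://www.google.com/search?q=%s\n' "$_q"
}
```

For example, `curl -sL --max-time 15 "$(google_url "python asyncio" stackoverflow.com)" | head -300` reproduces the Stack Overflow search above.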
- `+` for spaces in the URL
- `%22` for quotes (exact match): `%22error+message+here%22`
- `site:` for domain-specific: `site:docs.python.org+asyncio`
- `-` to exclude: `python+web+framework+-django`
- `after:YYYY-MM-DD` for recency

Exa provides semantic/neural search with optional content retrieval. Best for finding authoritative sources by intent rather than keywords.
# Setup: load the Exa API key from the sops-nix secret file
EXA_API_KEY=$(cat ~/.config/sops-nix/secrets/exa/api-key 2>/dev/null)
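Because the `cat ... 2>/dev/null` idiom hides a missing file, the key variable can silently end up empty and the request then fails with an authentication error. A small guard makes that failure explicit. This is a sketch under the assumption that an empty key is never valid; `load_key` is a hypothetical helper name.

```shell
# Print the key stored in a secret file, or fail loudly if it is missing or empty.
# $1 = path to the secret file.
load_key() {
  _key=$(cat "$1" 2>/dev/null) || _key=""
  if [ -z "$_key" ]; then
    echo "missing or empty API key file: $1" >&2
    return 1
  fi
  printf '%s\n' "$_key"
}
```

Usage would look like `EXA_API_KEY=$(load_key ~/.config/sops-nix/secrets/exa/api-key) || exit 1`.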
# Semantic search returning highlights
EXA_API_KEY=$(cat ~/.config/sops-nix/secrets/exa/api-key 2>/dev/null)
curl -s "https://api.exa.ai/search" \
-H "x-api-key: $EXA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "best practices for building RAG pipelines in 2026",
"numResults": 5,
"type": "auto",
"contents": {
"highlights": true
}
}' | jq '.results[] | {title, url, highlights}'
# Search returning page text, capped at 1500 characters per result
EXA_API_KEY=$(cat ~/.config/sops-nix/secrets/exa/api-key 2>/dev/null)
curl -s "https://api.exa.ai/search" \
-H "x-api-key: $EXA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "your search query here",
"numResults": 5,
"type": "auto",
"contents": {
"text": {"maxCharacters": 1500}
}
}' | jq '.results[] | {title, url, text}'
# Domain-restricted search limited to the past year (8760 hours = 365 days)
EXA_API_KEY=$(cat ~/.config/sops-nix/secrets/exa/api-key 2>/dev/null)
curl -s "https://api.exa.ai/search" \
-H "x-api-key: $EXA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "kubernetes security best practices",
"numResults": 5,
"type": "auto",
"includeDomains": ["kubernetes.io", "github.com"],
"maxAgeHours": 8760,
"contents": {
"text": {"maxCharacters": 1000}
}
}' | jq '.results[] | {title, url, publishedDate, text}'
# People lookup using the category filter
EXA_API_KEY=$(cat ~/.config/sops-nix/secrets/exa/api-key 2>/dev/null)
curl -s "https://api.exa.ai/search" \
-H "x-api-key: $EXA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "José Valim Elixir creator",
"numResults": 5,
"type": "auto",
"category": "people",
"contents": {
"highlights": true
}
}' | jq '.results[] | {title, url, highlights}'
# Retrieve the full text of a known URL
EXA_API_KEY=$(cat ~/.config/sops-nix/secrets/exa/api-key 2>/dev/null)
curl -s "https://api.exa.ai/contents" \
-H "x-api-key: $EXA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"ids": ["https://example.com/article"],
"text": true
}' | jq '.results[] | {url, title, text}'
⚠️ Budget: ~$1/day max. Prefer curl+markdown.new and Google first. Use Exa when semantic search adds clear value (finding authoritative sources, people/company lookups, recency filtering).
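To put the budget in concrete terms, the arithmetic below converts the ~$1/day cap into a query count, using the ~$0.005-0.008 per-query estimate from the cost table in this document. Amounts are expressed in thousandths of a dollar so the shell can use integer arithmetic; `queries_per_day` is a hypothetical helper name, and the per-query price is the document's estimate, not a quoted rate.

```shell
# $1 = daily budget, $2 = cost per query (both in thousandths of a dollar).
queries_per_day() {
  echo $(( $1 / $2 ))
}

queries_per_day 1000 8   # at $0.008 per query: prints 125
queries_per_day 1000 5   # at $0.005 per query: prints 200
```

So the cap allows roughly 125 to 200 Exa queries per day under these assumptions.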
Parallel provides objective-driven search with structured excerpts and source policy controls. Best for complex research queries.
# Setup: load the Parallel API key from the sops-nix secret file
PARALLEL_API_KEY=$(cat ~/.config/sops-nix/secrets/parallel/api-key 2>/dev/null)
# Objective-driven search
PARALLEL_API_KEY=$(cat ~/.config/sops-nix/secrets/parallel/api-key 2>/dev/null)
curl -s "https://api.parallel.ai/v1/beta/search" \
-H "Authorization: Bearer $PARALLEL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"search_queries": ["your search query"],
"objective": "Find the most relevant and authoritative information about X",
"mode": "agentic",
"max_results": 5
}' | jq '.results[] | {title, url, excerpts}'
# Extract the full content of a known URL
PARALLEL_API_KEY=$(cat ~/.config/sops-nix/secrets/parallel/api-key 2>/dev/null)
curl -s "https://api.parallel.ai/v1/beta/extract" \
-H "Authorization: Bearer $PARALLEL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://example.com/article"],
"full_content": true
}' | jq '.results[] | {url, title, full_content}'
- "fast": Quick keyword-style results
- "one-shot": Single-pass smart search
- "agentic": Multi-step objective-driven (best quality, slower)

When CAMOFOX_URL is set (e.g., Hermes runs a camoufox-browser server), use it for pages that block curl or require JavaScript rendering. Camoufox is a stealth Firefox fork with C++ fingerprint spoofing.
# Check that the Camoufox server is reachable
CAMOFOX_URL="${CAMOFOX_URL:-http://localhost:9377}"
curl -s "$CAMOFOX_URL/health" | jq .
# Open a page in a tab and read its snapshot
CAMOFOX_URL="${CAMOFOX_URL:-http://localhost:9377}"
USER_ID="pi_research_$(date +%s)"
# Create a tab and navigate
TAB_ID=$(curl -s "$CAMOFOX_URL/tabs" \
-H "Content-Type: application/json" \
-d "{\"userId\": \"$USER_ID\", \"sessionKey\": \"research\", \"url\": \"https://example.com\"}" \
| jq -r '.tabId')
# Wait for page load
sleep 3
# Get accessibility snapshot (text content)
curl -s "$CAMOFOX_URL/tabs/$TAB_ID/snapshot?userId=$USER_ID" | jq -r '.snapshot' | head -300
# Clean up
curl -s -X DELETE "$CAMOFOX_URL/sessions/$USER_ID"
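The fixed `sleep 3` above may be too short for slow pages and wasteful for fast ones. One alternative is to poll until a check succeeds. The generic helper below is a sketch of that idea; `retry` is a hypothetical name and is not part of Camoufox or curl.

```shell
# Run a command until it succeeds or the attempt limit is reached.
# $1 = max attempts, $2 = delay in seconds between attempts, rest = command.
retry() {
  _retry_max=$1
  _retry_delay=$2
  shift 2
  _retry_n=1
  while ! "$@"; do
    if [ "$_retry_n" -ge "$_retry_max" ]; then
      return 1
    fi
    _retry_n=$((_retry_n + 1))
    sleep "$_retry_delay"
  done
}
```

For example, `retry 5 1 page_loaded` would try up to five times, one second apart, where `page_loaded` is a hypothetical function that succeeds once the snapshot endpoint returns non-empty text.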
START
│
├─ Know the exact URL?
│ └─ YES → curl + markdown.new (Method 1)
│
├─ Need keyword/domain-specific results?
│ └─ YES → Google via curl (Method 2)
│
├─ Need semantic/intent-based search?
│ └─ YES → Exa API (Method 3)
│
├─ Need deep/agentic research?
│ └─ YES → Parallel API (Method 4)
│
├─ Page blocks curl / needs JS?
│ └─ YES → Camoufox (Method 5) if available,
│ otherwise try Exa/Parallel extract
│
└─ Default → Start with Google (Method 2),
fetch promising URLs with curl (Method 1)
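The decision tree above can be written as a single `case` statement. This is an illustrative sketch: `pick_method` and its keyword arguments (`url`, `keyword`, `semantic`, `deep`, `blocked`) are hypothetical names chosen here to mirror the branches.

```shell
# Print the method the decision tree selects.
# $1 = need: url | keyword | semantic | deep | blocked (anything else = default)
pick_method() {
  case "$1" in
    url)      echo "curl + markdown.new (Method 1)" ;;
    keyword)  echo "Google via curl (Method 2)" ;;
    semantic) echo "Exa API (Method 3)" ;;
    deep)     echo "Parallel API (Method 4)" ;;
    blocked)
      # Camoufox is only an option when a server URL is configured.
      if [ -n "${CAMOFOX_URL:-}" ]; then
        echo "Camoufox (Method 5)"
      else
        echo "Exa/Parallel extract"
      fi
      ;;
    *)        echo "Google via curl (Method 2), then curl + markdown.new (Method 1)" ;;
  esac
}
```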
| Scenario | Method | Cost |
|----------|--------|------|
| Know the exact URL | curl + markdown.new | Free |
| GitHub/PyPI/npm API info | curl (direct JSON API) | Free |
| Keyword search with links | Google via curl | Free |
| Site-specific search | Google with site: via curl | Free |
| Semantic/intent-based search | Exa API | ~$0.005-0.008/query |
| People/company lookup | Exa with category filter | ~$0.005-0.008/query |
| Time-filtered search | Exa with maxAgeHours | ~$0.005-0.008/query |
| Deep/agentic research | Parallel API | per-query pricing |
| JS-heavy / anti-bot pages | Camoufox browser | Free (self-hosted) |
| Topic | URL Pattern |
|-------|-------------|
| Python docs | https://docs.python.org/3/library/{module}.html |
| PyPI | https://pypi.org/pypi/{package}/json |
| npm | https://registry.npmjs.org/{package} |
| GitHub API | https://api.github.com/repos/{owner}/{repo} |
| MDN Web Docs | https://developer.mozilla.org/en-US/docs/Web/{topic} |
| Can I Use | https://caniuse.com/?search={feature} |
| Rust docs | https://docs.rs/{crate}/latest/ |
| Go docs | https://pkg.go.dev/{module} |
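The URL patterns in the table can be filled in mechanically. The helper below is a sketch that substitutes one name into each pattern; `doc_url` and its short keys (`python`, `pypi`, and so on) are hypothetical labels introduced for this example.

```shell
# Expand a URL pattern from the table.
# $1 = pattern key, $2 = module, package, owner/repo, topic, or feature.
doc_url() {
  case "$1" in
    python)  printf 'https://docs.python.org/3/library/%s.html\n' "$2" ;;
    pypi)    printf 'https://pypi.org/pypi/%s/json\n' "$2" ;;
    npm)     printf 'https://registry.npmjs.org/%s\n' "$2" ;;
    github)  printf 'https://api.github.com/repos/%s\n' "$2" ;;
    mdn)     printf 'https://developer.mozilla.org/en-US/docs/Web/%s\n' "$2" ;;
    caniuse) printf 'https://caniuse.com/?search=%s\n' "$2" ;;
    rust)    printf 'https://docs.rs/%s/latest/\n' "$2" ;;
    go)      printf 'https://pkg.go.dev/%s\n' "$2" ;;
    *)       echo "unknown pattern key: $1" >&2; return 1 ;;
  esac
}
```

Usage would look like `curl -sL "$(doc_url pypi requests)" | head -200`.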
For a typical research task, combine methods:
# Step 1: Quick Google search for landscape
QUERY="nix+flakes+best+practices+2026"
curl -sL --max-time 15 \
-H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)" \
"https://markdown.new/https://www.google.com/search?q=$QUERY" | head -300
# Step 2: Exa for authoritative sources with recency
EXA_API_KEY=$(cat ~/.config/sops-nix/secrets/exa/api-key 2>/dev/null)
curl -s "https://api.exa.ai/search" \
-H "x-api-key: $EXA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "nix flakes best practices and patterns",
"numResults": 5,
"type": "auto",
"maxAgeHours": 8760,
"contents": { "highlights": true }
}' | jq '.results[] | {title, url, highlights}'
# Step 3: Fetch top results via markdown.new
curl -sL --max-time 15 "https://markdown.new/https://nix.dev/guides/best-practices" | head -500
curl -sL --max-time 15 "https://markdown.new/https://zero-to-nix.com/concepts/flakes" | head -500
Structure your findings as:
## Summary
[Brief overview of key findings]
## Detailed Findings
### [Topic/Source 1]
**Source**: [Name with link]
**Relevance**: [Why this source is authoritative/useful]
**Key Information**:
- Direct quote or finding (with link to specific section if possible)
- Another relevant point
### [Topic/Source 2]
[Continue pattern...]
## Additional Resources
- [Relevant link 1] - Brief description
- [Relevant link 2] - Brief description
## Gaps or Limitations
[Note any information that couldn't be found or requires further investigation]
maxAgeHours for recency