skills/firecrawl/SKILL.md
Use when scraping web pages, searching the web via Firecrawl CLI, crawling sites for bulk content extraction, or automating browser interactions for content behind pagination or logins. Also use when the user mentions firecrawl, web scraping to markdown, or site mapping. NEVER use for API-based data fetching (use direct HTTP), non-web content extraction, or when the user has not installed Firecrawl.
Web scraping, search, and browser automation CLI. Returns clean markdown optimized for LLM context windows.
| File | Purpose | Load When |
| ---- | ------- | --------- |
| rules/install.md | Installation and authentication setup | Firecrawl not installed or auth fails |
| rules/security.md | Output handling and data guidelines | Storing or sharing scraped content |
Check status before use:
firecrawl --status
# Shows: version, auth status, concurrency limit, remaining credits
If not ready, see rules/install.md. Run firecrawl --help or firecrawl <command> --help for full option details.
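A minimal sketch of that readiness check as a guard for scripts. The helper name `firecrawl_available` is hypothetical, and the sketch only tests that the CLI is on PATH; it does not parse the `--status` output, whose exact format is not shown here.

```shell
# Hypothetical helper: succeeds only when a `firecrawl` command can be found.
firecrawl_available() {
  command -v firecrawl >/dev/null 2>&1
}

if firecrawl_available; then
  # Prints version, auth status, concurrency limit, and remaining credits.
  firecrawl --status
else
  echo "firecrawl not found; see rules/install.md" >&2
fi
```

Whether `firecrawl --status` exits non-zero on an auth failure is not stated above, so the guard deliberately avoids relying on its exit code.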
Follow this escalation pattern -- start at the lowest step that fits:
Use map --search to find the right URL, then scrape it.

| Need | Command | When |
| ---- | ------- | ---- |
| Find pages on a topic | search | No specific URL yet |
| Get a page's content | scrape | Have a URL, page is static or JS-rendered |
| Find URLs within a site | map | Need to locate a specific subpage |
| Bulk extract a site section | crawl | Need many pages (e.g., all /docs/) |
| AI-powered data extraction | agent | Need structured data from complex sites |
| Interact with a page | browser | Content requires clicks, form fills, pagination, or login |
| Save entire site to files | download | Combines map + scrape for bulk local saves |
Example: fetching API docs from a large site
search "site:docs.example.com authentication API" -> found the docs domain
map https://docs.example.com --search "auth" -> found /docs/api/authentication
scrape https://docs.example.com/docs/api/auth... -> got the content
Example: data behind pagination
scrape https://example.com/products -> only shows first 10 items, no next-page links
browser "open https://example.com/products" -> open in browser
browser "snapshot" -> find the pagination button
browser "click @e12" -> click "Next Page"
browser "scrape" -o .firecrawl/products-p2.md -> extract page 2 content
firecrawl search "your query" -o .firecrawl/result.json --json
firecrawl search "your query" --scrape -o .firecrawl/scraped.json --json
firecrawl search "your query" --sources news --tbs qdr:d -o .firecrawl/news.json --json
Options: --limit <n>, --sources <web,images,news>, --categories <github,research,pdf>, --tbs <qdr:h|d|w|m|y>, --location, --country <code>, --scrape, --scrape-formats, -o
Multiple URLs are scraped concurrently; each result is saved to .firecrawl/.
firecrawl scrape "<url>" -o .firecrawl/page.md
firecrawl scrape "<url>" --only-main-content -o .firecrawl/page.md
firecrawl scrape "<url>" --wait-for 3000 -o .firecrawl/page.md
firecrawl scrape "<url>" --format markdown,links -o .firecrawl/page.json
firecrawl scrape https://site.com/a https://site.com/b https://site.com/c
Options: -f <markdown,html,rawHtml,links,screenshot,json>, -H, --only-main-content, --wait-for <ms>, --include-tags, --exclude-tags, -o
firecrawl map "<url>" --search "authentication" -o .firecrawl/filtered.txt
firecrawl map "<url>" --limit 500 --json -o .firecrawl/urls.json
Options: --limit <n>, --search <query>, --sitemap <include|skip|only>, --include-subdomains, --json, -o
firecrawl crawl "<url>" --include-paths /docs --limit 50 --wait -o .firecrawl/crawl.json
firecrawl crawl "<url>" --max-depth 3 --wait --progress -o .firecrawl/crawl.json
firecrawl crawl <job-id> # check status of a running crawl
Options: --wait, --progress, --limit <n>, --max-depth <n>, --include-paths, --exclude-paths, --delay <ms>, --max-concurrency <n>, --pretty, -o
AI-powered autonomous extraction. A run takes roughly 2-5 minutes.
firecrawl agent "extract all pricing tiers" --wait -o .firecrawl/pricing.json
firecrawl agent "extract products" --schema '{"type":"object","properties":{"name":{"type":"string"}}}' --wait -o .firecrawl/products.json
firecrawl agent "get feature list" --urls "<url>" --wait -o .firecrawl/features.json
Options: --urls, --model <spark-1-mini|spark-1-pro>, --schema <json>, --schema-file, --max-credits <n>, --wait, --pretty, -o
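For larger schemas, the `--schema-file` option listed above may be easier to manage than an inline string. A minimal sketch, assuming `--schema-file` takes a path to a JSON Schema file; the file name and the `price` property are hypothetical.

```shell
# Hypothetical schema location; any path should work.
schema_file=".firecrawl/product-schema.json"
mkdir -p .firecrawl

# Write the schema with a quoted heredoc so the shell leaves the JSON untouched.
cat > "$schema_file" <<'EOF'
{
  "type": "object",
  "properties": {
    "name":  { "type": "string" },
    "price": { "type": "string" }
  },
  "required": ["name"]
}
EOF

# Only call the CLI when it is installed, so the sketch is safe to paste.
if command -v firecrawl >/dev/null 2>&1; then
  firecrawl agent "extract products" --schema-file "$schema_file" --wait -o .firecrawl/products.json
fi
```

Keeping the schema in a file avoids nested quoting on the command line and lets the same schema be reused across runs.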
Cloud Chromium sessions in Firecrawl's remote sandboxed environment. Shorthand auto-launches a session.
firecrawl browser "open <url>"
firecrawl browser "snapshot" # accessibility tree with @ref IDs
firecrawl browser "click @e5"
firecrawl browser "fill @e3 'search query'"
firecrawl browser "scrape" -o .firecrawl/page.md
firecrawl browser close
| Command | Description |
| ------- | ----------- |
| open <url> | Navigate to a URL |
| snapshot | Get accessibility tree with @ref IDs |
| screenshot | Capture a PNG screenshot |
| click <@ref> | Click an element by ref |
| type <@ref> <text> | Type into an element |
| fill <@ref> <text> | Fill a form field (clears first) |
| scrape | Extract page content as markdown |
| scroll <direction> | Scroll up/down/left/right |
| wait <seconds> | Wait for a duration |
| eval <js> | Evaluate JavaScript on the page |
Session management: launch-session --ttl 600, list, close
Options: --ttl <seconds>, --ttl-inactivity <seconds>, --session <id>, -o
Combines map + scrape to save a site as local files. Always pass -y to skip confirmation.
firecrawl download https://docs.example.com -y
firecrawl download https://docs.example.com --include-paths "/features,/sdks" --only-main-content --screenshot -y
firecrawl download https://docs.example.com --exclude-paths "/zh,/ja,/fr" --limit 50 -y
Options: --limit <n>, --search <query>, --include-paths, --exclude-paths, --allow-subdomains, -y, plus all scrape options.
firecrawl credit-usage
firecrawl credit-usage --json --pretty -o .firecrawl/credits.json
Write results to .firecrawl/ with -o. Add .firecrawl/ to .gitignore. Always quote URLs -- shell interprets ? and & as special characters.
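The quoting rule can be illustrated without calling Firecrawl at all. The sketch below uses `echo` in a child shell as a stand-in for any command that receives a URL; the example URL is hypothetical.

```shell
# Unquoted: the shell treats '&' as "run in background", so the command
# only receives the text before it and `sort=asc` becomes a variable assignment.
unquoted=$(sh -c 'echo https://example.com/items?page=2&sort=asc')

# Quoted: the whole URL arrives as a single argument.
quoted=$(sh -c 'echo "https://example.com/items?page=2&sort=asc"')

echo "unquoted: $unquoted"   # unquoted: https://example.com/items?page=2
echo "quoted:   $quoted"     # quoted:   https://example.com/items?page=2&sort=asc
```

The same truncation would apply to a URL passed unquoted to `firecrawl scrape`, which is why every example here wraps URLs in double quotes.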
Naming conventions:
.firecrawl/search-{query}.json
.firecrawl/search-{query}-scraped.json
.firecrawl/{site}-{path}.md
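The conventions above do not say how `{query}` is normalized. One possible approach, shown as a sketch: lowercase the query and collapse every run of non-alphanumeric characters into a hyphen. Both helper names are hypothetical.

```shell
# Hypothetical helper: turn free text into a filename-safe slug.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//'
}

# Hypothetical helper: build the search output path for a query.
search_output_path() {
  printf '.firecrawl/search-%s.json' "$(slugify "$1")"
}

search_output_path "React Hooks API"   # .firecrawl/search-react-hooks-api.json
```

A typical use would be `firecrawl search "$query" -o "$(search_output_path "$query")" --json`, which keeps repeated searches for the same query pointing at the same file.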
Single format outputs raw content. Multiple formats (e.g., --format markdown,links) output JSON.
Never read entire output files at once:
wc -l .firecrawl/file.md && head -50 .firecrawl/file.md
grep -n "keyword" .firecrawl/file.md
jq -r '.data.web[].url' .firecrawl/search.json
Check .firecrawl/ for existing data before fetching again. search --scrape already includes full page content -- do not re-scrape those URLs.
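The check-before-fetch rule can be wrapped in a small helper. This is a sketch under the assumption that a non-empty output file means the page was already fetched; the name `scrape_once` is hypothetical.

```shell
# Hypothetical helper: scrape URL into OUT unless OUT already has content.
scrape_once() {
  url=$1
  out=$2
  if [ -s "$out" ]; then
    # File exists and is non-empty: reuse it and spend no credits.
    echo "cached: $out"
    return 0
  fi
  mkdir -p "$(dirname "$out")"
  firecrawl scrape "$url" -o "$out"
}

# Usage (not run here):
#   scrape_once "https://example.com/docs" .firecrawl/example-docs.md
```

The helper does not detect stale content; delete the file to force a fresh scrape.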
Parallelization: Check concurrency limit via firecrawl --status. Run independent scrapes in parallel:
firecrawl scrape "<url-1>" -o .firecrawl/1.md &
firecrawl scrape "<url-2>" -o .firecrawl/2.md &
wait
For browser, launch separate sessions per independent task via --session <id>.
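When there are more URLs than the concurrency limit allows, the background-job pattern above can be run in batches. A minimal sketch: `scrape_batch` and the `batch-N.md` file names are hypothetical, and the limit is passed in by hand rather than parsed from `firecrawl --status`.

```shell
# Hypothetical helper: scrape_batch MAX URL... runs at most MAX scrapes at a time.
scrape_batch() {
  max=$1
  shift
  mkdir -p .firecrawl
  i=0
  for url in "$@"; do
    i=$((i + 1))
    firecrawl scrape "$url" -o ".firecrawl/batch-$i.md" &
    # After every MAX launches, wait for the whole batch to finish.
    if [ $((i % max)) -eq 0 ]; then
      wait
    fi
  done
  # Wait for any jobs left over from a final partial batch.
  wait
}

# Usage (not run here):
#   scrape_batch 2 "https://example.com/a" "https://example.com/b" "https://example.com/c"
```

Batching is simpler than a true job pool but leaves slots idle while the slowest scrape in each batch finishes.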
- Try scrape first. It handles static pages and JS-rendered SPAs.
- Use browser only when scrape fails because content is behind interaction: pagination buttons, modals, dropdowns, multi-step navigation, or infinite scroll.
- Do not use browser for web searches -- use search instead.
- Use agent for structured data extraction from complex multi-page sites where format matters.

| Scenario | Decision | Reason |
| -------- | -------- | ------ |
| Need content from a known URL | scrape | Fastest, cheapest, handles static + JS-rendered pages |
| URL unknown, topic known | search | Finds and ranks relevant pages before committing credits |
| Need one specific subpage from large site | map --search then scrape | Map filters to relevant URLs; avoids crawling entire site |
| Need all pages in /docs section | crawl with --include-paths | Bulk extraction in a single job; more efficient than repeated scrapes |
| Page has "Load More" or paginated JS | browser | Only escalate here when scrape returns incomplete content |
| Need structured data across multiple pages | agent | AI handles navigation and schema extraction autonomously |
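The decision table can also be read as a simple lookup. The sketch below encodes it as a `case` statement; the function name and the scenario keywords are hypothetical labels, not part of the Firecrawl CLI.

```shell
# Hypothetical helper: choose_command SCENARIO prints the suggested subcommand.
choose_command() {
  case $1 in
    known-url)       echo "scrape" ;;
    topic-only)      echo "search" ;;
    subpage)         echo "map --search, then scrape" ;;
    site-section)    echo "crawl --include-paths" ;;
    interaction)     echo "browser" ;;
    structured-data) echo "agent" ;;
    *)
      echo "unknown scenario: $1" >&2
      return 1
      ;;
  esac
}

choose_command topic-only   # search
```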
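- Do not use browser without first trying scrape -- try scrape first, it handles most JS pages.
- Do not re-scrape URLs already returned by search --scrape -- check the output file first.
- Do not read entire output files -- use head, grep, or jq to extract what you need.
- Do not crawl a whole site unfiltered -- use --include-paths or map --search first.
- Do not leave URLs unquoted when they contain ? or & -- shell will misinterpret query parameters.
- Do not use agent for simple single-page extraction -- overkill; use scrape instead.
- Do not run parallel jobs without checking the concurrency limit via firecrawl --status.
- Do not write output outside .firecrawl/ -- breaks naming conventions and gitignore coverage.
- Do not use browser as a substitute for search -- browser has no search capability.

- Never skip firecrawl --status when credits or concurrency limits may be a concern.
- Never read whole output files when you can use head, grep, or jq.
- Never omit -y on download commands in automated/scripted contexts -- will hang waiting for input.
- Never commit .env and .firecrawl/ -- see rules/security.md.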