skills/dumplingai-scraping-extraction/SKILL.md
When the user wants to scrape a page, crawl a site, extract structured content, pull readable text from a URL, or turn raw web pages into usable data through DumplingAI. Also use when the user mentions scraping, crawling, extract this page, parse this site, markdown from URL, website content extraction, Firecrawl, page content, or structured extraction from a webpage. Use this whenever the task starts with one or more known URLs and the goal is to fetch or extract their contents. For discovering which pages or sources to inspect first, see dumplingai-web-research.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add dumplingai/cli dumplingai-scraping-extraction
dumplingai catalog search <prompt>
dumplingai catalog details <type> <id>
dumplingai run <type> <id> --input '<json>'
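The `--input` argument is a JSON string. Quoting it by hand breaks once a URL contains a double quote or backslash. Below is a minimal POSIX sh sketch that builds the input for a URL-based capability. It assumes the `{"url": ...}` shape used by the scrape_page example in this skill. `json_url_input` is a hypothetical helper name, not part of the DumplingAI CLI, and it escapes only `\` and `"`, not control characters.

```shell
# Hypothetical helper (not part of the DumplingAI CLI): print {"url":"<URL>"}
# as a JSON object, escaping backslashes first, then double quotes.
# Control characters are NOT escaped; ordinary URLs contain none.
json_url_input() {
  esc=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"url":"%s"}\n' "$esc"
}

# Usage (assumes the dumplingai CLI is installed and .dumplingai/ exists):
#   url='https://example.com/search?q="docs"'
#   dumplingai run capability scrape_page --input "$(json_url_input "$url")" \
#     > .dumplingai/page.json
```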
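Describe the job in a short phrase, such as scrape page, crawl website, or firecrawl scrape.
Search the catalog for it: dumplingai catalog search "<job>".
Review a candidate's details before running it: dumplingai catalog details <type> <id>.
Save outputs under .dumplingai/ and inspect them with head or rg.
Useful search phrases: scrape page, crawl website, extract page content, firecrawl scrape, website extraction.
For example (the shell redirections below need the .dumplingai/ directory to exist):
mkdir -p .dumplingai
dumplingai catalog search "scrape page" > .dumplingai/catalog-search.json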
dumplingai catalog details capability scrape_page > .dumplingai/scrape-page.json
dumplingai run capability scrape_page --input '{"url":"https://example.com"}' > .dumplingai/page.json
head -80 .dumplingai/page.json
rg '"markdown"|"content"|"text"|"links"' .dumplingai/page.json
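The commands above handle one URL. When the task starts with several known URLs, one way to keep the results separate is a loop that writes one file per URL. Below is a minimal POSIX sh sketch. It assumes scrape_page takes `{"url": ...}` as in the example and writes its JSON result to stdout. `url_to_slug` and `scrape_urls` are hypothetical helper names, and the inline JSON assumes the URLs contain no `"` or `\`.

```shell
# Hypothetical helper: turn a URL into a file-name-safe slug by dropping the
# scheme, replacing other unsafe characters with "_", and trimming trailing "_",
# e.g. https://example.com/docs?a=1 -> example.com_docs_a_1
url_to_slug() {
  printf '%s' "$1" \
    | sed -e 's#^[A-Za-z][A-Za-z0-9+.-]*://##' \
          -e 's#[^A-Za-z0-9.-]#_#g' \
          -e 's#_*$##'
}

# Hypothetical helper: run scrape_page once per URL argument and save each
# result to .dumplingai/<slug>.json. Returns non-zero if any URL failed.
# Assumes URLs contain no double quotes or backslashes (JSON is built inline).
scrape_urls() {
  mkdir -p .dumplingai || return 1
  rc=0
  for url in "$@"; do
    out=".dumplingai/$(url_to_slug "$url").json"
    if dumplingai run capability scrape_page --input "{\"url\":\"$url\"}" > "$out"; then
      printf 'saved %s\n' "$out"
    else
      printf 'failed: %s\n' "$url" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage (assumes the dumplingai CLI is installed):
#   scrape_urls https://example.com https://example.com/pricing
#   head -80 .dumplingai/example.com_pricing.json
```

A slug per URL keeps reruns idempotent: scraping the same URL again overwrites its file instead of piling up new ones.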
data-ai
When the user wants to turn a YouTube video into a blog post, article, outline, newsletter draft, or transcript-based written summary. Also use when the user mentions repurposing a video, extracting a transcript, rewriting a YouTube talk into a post, or expanding a creator video into long-form content. Use this whenever the workflow should start with transcript extraction and optional source verification through DumplingAI capabilities.
testing
When the user wants social media content, post variants, threads, captions, hooks, or channel-specific copy for X, LinkedIn, Instagram, TikTok, or YouTube Shorts. Also use when the user wants to turn a topic, campaign idea, product page, article, or transcript into social posts, or asks for platform-specific repurposing instead of one generic draft. Use this whenever research, source extraction, and writing should be routed through DumplingAI-powered capabilities.
development
When the user wants web research, search results, cited sources, or fast topic discovery through DumplingAI instead of manual browsing or a direct search API integration. Also use when the user mentions Google search, SERP results, web search, search for sources, topic research, competitor research, answer sources, Serper, Perplexity, or wants to compare what the web says about something. Use this whenever the job is to search first, then inspect or synthesize results. For scraping a specific page or site after discovery, see dumplingai-scraping-extraction.
data-ai
When the user wants a transcript, captions, spoken-text extraction, or media-to-text workflow through DumplingAI. Also use when the user mentions YouTube transcript, TikTok transcript, transcribe this video, pull captions, extract text from audio, transcript from URL, or wants to inspect the contents of a video before writing or summarizing from it. Use this whenever the job is to turn video or audio into text first. For turning a YouTube transcript into a finished article, see youtube-to-blog-post.