skills/orthogonal-extract-webpage-data/SKILL.md
Extract structured data from web pages using AI
npx skillsauth add orthogonal-sh/skills extract-webpage-dataInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Extract structured data from any web page using AI. Turn messy HTML into clean, organized data.
Uses Olostep, Scrapegraph, or Riveter APIs for AI-powered data extraction.
orth run olostep /v1/scrapes -d '{"url_to_scrape":"https://example.com/products"}'
orth run scrapegraph /v1/smartscraper -d '{"website_url":"https://example.com/team","user_prompt":"Extract all team members with their names, titles, and LinkedIn URLs"}'
orth run riveter /v1/scrape -d '{"url":"https://example.com","schema":{"name":"string","price":"number","description":"string"}}'
orth run olostep /v1/answers -d '{"task":"Find the pricing for Notion Teams plan from their website"}'
orth run olostep /v1/crawls -d '{"start_url":"https://example.com","max_pages":10}'
Returns a scrape object:
scrape_z926lxxon3)formats)status_code of the pageAsync crawls: POST /v1/crawls returns an id. Poll with GET /v1/crawls/{id} until complete.
Returns structured extraction result:
completed or pendingNote: For large pages, the POST may return status: "pending". Poll with GET /v1/smartscraper/{request_id} until status is completed.
Returns scrape result:
success or errorUser: "Get all the product names and prices from this page"
orth run scrapegraph /v1/smartscraper -d '{"website_url":"https://example.com/products","user_prompt":"Extract all products with name, price, and description"}'
User: "Scrape the team page and get everyone's info"
orth run scrapegraph /v1/smartscraper -d '{"website_url":"https://example.com/about/team","user_prompt":"Extract team members: name, role, bio, photo URL, LinkedIn"}'
User: "What are Stripe's API pricing details?"
orth run olostep /v1/answers -d '{"task":"Find Stripe API pricing breakdown from stripe.com/pricing"}'
User: "Get all blog post titles and dates from this blog"
orth run riveter /v1/scrape -d '{"url":"https://blog.example.com","schema":{"posts":[{"title":"string","date":"string","url":"string"}]}}'
url_to_scrape for Olostep, website_url + user_prompt for Scrapegraph, url for Riveter)error field in response body — check it even on 200 statusrequest_status: "error" with details in messagetesting
Download videos from YouTube, Bilibili, Twitter, and thousands of other sites using yt-dlp. Use when the user provides a video URL and wants to download it, extract audio (MP3), download subtitles, or select video quality. Triggers on phrases like "下载视频", "download video", "yt-dlp", "YouTube", "B站", "抖音", "提取音频", "extract audio".
business
Send messages and manage Slack channels. Use when asked to send Slack messages, post to channels, list channels, or fetch message history.
development
Evaluate YC batch companies for investment — scrapes the YC directory, researches each company and its founders (work history, LinkedIn, website), assesses founder-company fit, and exports to Google Sheets with priority rankings. Use when asked to evaluate YC companies, research a YC batch, screen startups, or do due diligence on YC companies.
development
Take screenshots of websites and web pages