Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

orthogonal-sh/extract-webpage-data

Name: extract-webpage-data
Author: orthogonal-sh

skills/orthogonal-extract-webpage-data/SKILL.md

npx skillsauth add orthogonal-sh/skills extract-webpage-data


Extract Webpage Data

Extract structured data from any web page using AI. Turn messy HTML into clean, organized data.

When to Use

User wants to extract specific data from a website
User asks to scrape information from a page
User needs structured data from unstructured content
User wants to pull product info, contact details, etc.
User wants to convert web content into usable data

How It Works

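The skill sends requests to the Olostep, Scrapegraph, or Riveter APIs for AI-powered data extraction. Each call goes through the orth CLI in the form orth run <api> <endpoint> -d '<JSON body>'.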

Usage

Simple Scrape with Olostep

orth run olostep /v1/scrapes -d '{"url_to_scrape":"https://example.com/products"}'

AI-Powered Extraction with Scrapegraph

orth run scrapegraph /v1/smartscraper -d '{"website_url":"https://example.com/team","user_prompt":"Extract all team members with their names, titles, and LinkedIn URLs"}'

Schema-Based Extraction with Riveter

orth run riveter /v1/scrape -d '{"url":"https://example.com","schema":{"name":"string","price":"number","description":"string"}}'

Get AI Answer from Web

orth run olostep /v1/answers -d '{"task":"Find the pricing for Notion Teams plan from their website"}'

Crawl Multiple Pages

orth run olostep /v1/crawls -d '{"start_url":"https://example.com","max_pages":10}'
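The commands above pass the request body as inline JSON inside single quotes, which becomes fragile once a prompt contains an apostrophe. The following is a minimal Python sketch that builds the same command lines with the body serialized by json.dumps and quoted by shlex. The helper names build_orth_command and as_shell_line are hypothetical; only the orth run <api> <endpoint> -d <json> shape is taken from the examples above.

```python
import json
import shlex


def build_orth_command(api: str, path: str, body: dict) -> list:
    """Return an argv list of the form: orth run <api> <path> -d <json>."""
    payload = json.dumps(body, separators=(",", ":"))
    return ["orth", "run", api, path, "-d", payload]


def as_shell_line(argv: list) -> str:
    """Render argv as a single shell-safe line (the JSON body gets quoted)."""
    return shlex.join(argv)


argv = build_orth_command(
    "scrapegraph",
    "/v1/smartscraper",
    {
        "website_url": "https://example.com/team",
        "user_prompt": "Extract each member's name and title",
    },
)
print(as_shell_line(argv))
```

Passing the argv list directly to a process launcher avoids shell quoting altogether; as_shell_line is only useful when the command has to be shown or pasted as text.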

Parameters

Olostep Scrape

url_to_scrape (required) - URL to scrape
formats - Output formats (markdown, html, text)

Scrapegraph

website_url (required) - URL to scrape
user_prompt (required) - Natural language description of what to extract

Riveter

url (required) - URL to scrape
schema - JSON object describing the data structure to extract, mapping field names to types (e.g., {"name":"string","price":"number"})

Olostep Answer

task (required) - Natural language task/question
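Since a missing required parameter comes back as a 400, it can be cheaper to check the body before sending it. The sketch below encodes only the required parameters listed above; the REQUIRED_PARAMS table and the missing_params helper are hypothetical names, and the crawl endpoint is left out because its required fields are not documented in this section.

```python
# Required keys per (api, endpoint), copied from the parameter lists above.
REQUIRED_PARAMS = {
    ("olostep", "/v1/scrapes"): ["url_to_scrape"],
    ("olostep", "/v1/answers"): ["task"],
    ("scrapegraph", "/v1/smartscraper"): ["website_url", "user_prompt"],
    ("riveter", "/v1/scrape"): ["url"],
}


def missing_params(api: str, path: str, body: dict) -> list:
    """List required keys that are absent or empty in the request body.

    Endpoints not present in REQUIRED_PARAMS are treated as having no
    known requirements, so an empty list is returned for them.
    """
    required = REQUIRED_PARAMS.get((api, path), [])
    return [key for key in required if not body.get(key)]


print(missing_params(
    "scrapegraph",
    "/v1/smartscraper",
    {"website_url": "https://example.com"},
))
```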

Response

Olostep Response

Returns a scrape object:

id (string) - Scrape ID (e.g., scrape_z926lxxon3)
result.markdown_content (string|null) - Page content as markdown
result.html_content (string|null) - Raw HTML (if requested via formats)
result.text_content (string|null) - Plain text (if requested)
result.markdown_hosted_url (string|null) - S3 URL for large content
result.links_on_page (array) - Links found on the page
result.screenshot_hosted_url (string|null) - Screenshot URL (if requested)
result.page_metadata (object) - Page metadata, including the page's status_code
credits_consumed (integer) - Credits used for this scrape

Async crawls: POST /v1/crawls returns an id. Poll with GET /v1/crawls/{id} until complete.
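Because most content fields in the scrape object may be null, a caller usually has to decide which one to read. Here is a small sketch of one way to do that, using only the field names listed above. The pick_content helper, the preference order (markdown, then HTML, then text, then the hosted URL), and the sample object are assumptions for illustration, not Olostep's documented behavior.

```python
def pick_content(scrape: dict) -> tuple:
    """Return (kind, value) for the first non-empty content field.

    kind is one of "markdown", "html", "text", "hosted_url", or "none".
    """
    result = scrape.get("result") or {}
    candidates = (
        ("markdown", "markdown_content"),
        ("html", "html_content"),
        ("text", "text_content"),
        ("hosted_url", "markdown_hosted_url"),
    )
    for kind, field in candidates:
        value = result.get(field)
        if value:
            return kind, value
    return "none", ""


# Illustrative object only; real responses carry more fields.
sample = {
    "id": "scrape_z926lxxon3",
    "result": {
        "markdown_content": None,
        "markdown_hosted_url": "https://example.com/content.md",
    },
    "credits_consumed": 1,
}
print(pick_content(sample))
```

When the kind is "hosted_url", the value is a link to fetch rather than the content itself, so the caller needs a second request.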

Scrapegraph Response

Returns structured extraction result:

request_id (string) - Unique request identifier
status (string) - completed or pending
result (object) - AI-extracted data matching your prompt (dynamic keys)
error (string) - Empty on success, error message on failure

Note: For large pages, the POST may return status: "pending". Poll with GET /v1/smartscraper/{request_id} until status is completed.
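The pending/completed flow described in the note can be sketched as a small polling loop. This version takes the fetch step as a callable so it stays independent of how the GET request is actually made; poll_until_completed, the interval, and the attempt limit are hypothetical choices, and the canned responses stand in for real GET /v1/smartscraper/{request_id} calls.

```python
import time
from typing import Callable


def poll_until_completed(
    fetch: Callable[[], dict],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the body reports status == "completed".

    Raises RuntimeError if the body carries a non-empty error field,
    and TimeoutError once max_attempts is exhausted.
    """
    for _ in range(max_attempts):
        body = fetch()
        if body.get("error"):
            raise RuntimeError(body["error"])
        if body.get("status") == "completed":
            return body
        sleep(interval_s)
    raise TimeoutError(f"still pending after {max_attempts} attempts")


# Stand-in for the GET call: pending twice, then completed.
responses = iter([
    {"request_id": "req-1", "status": "pending", "error": ""},
    {"request_id": "req-1", "status": "pending", "error": ""},
    {"request_id": "req-1", "status": "completed",
     "result": {"title": "Example"}, "error": ""},
])
final = poll_until_completed(lambda: next(responses), sleep=lambda _: None)
print(final["result"])
```

Injecting sleep keeps the loop testable without real delays; in normal use the default time.sleep applies.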

Riveter Response

Returns scrape result:

request_status (string) - success or error
message (string) - Human-readable status
text (string) - Extracted page text content
url (string) - URL that was scraped
status_code (integer) - HTTP status of the page
run_key (string) - Unique run identifier
base_url_for_links (string) - Base URL for resolving relative links
riveter_app_link (string) - Link to view run in Riveter dashboard
credit_used (integer) - Credits consumed

Examples

User: "Get all the product names and prices from this page"

orth run scrapegraph /v1/smartscraper -d '{"website_url":"https://example.com/products","user_prompt":"Extract all products with name, price, and description"}'

User: "Scrape the team page and get everyone's info"

orth run scrapegraph /v1/smartscraper -d '{"website_url":"https://example.com/about/team","user_prompt":"Extract team members: name, role, bio, photo URL, LinkedIn"}'

User: "What are Stripe's API pricing details?"

orth run olostep /v1/answers -d '{"task":"Find Stripe API pricing breakdown from stripe.com/pricing"}'

User: "Get all blog post titles and dates from this blog"

orth run riveter /v1/scrape -d '{"url":"https://blog.example.com","schema":{"posts":[{"title":"string","date":"string","url":"string"}]}}'

Error Handling

504 - Olostep timed out on a slow page; retry or try a simpler URL
400 - A required parameter is missing (url_to_scrape for Olostep, website_url and user_prompt for Scrapegraph, url for Riveter)
Scrapegraph reports failures in the error field of the response body; check it even when the HTTP status is 200
Riveter reports failures as request_status: "error", with details in message
Some sites block automated scraping; if one API fails, try another
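The rules above mix HTTP status codes with body-level error fields, so a single check that looks at both can be handy. The sketch below is one possible way to fold them together; failure_reason is a hypothetical helper, and it assumes the caller already has the HTTP status and the parsed JSON body in hand.

```python
from typing import Optional


def failure_reason(api: str, http_status: int, body: dict) -> Optional[str]:
    """Return a short failure description, or None if the call looks fine."""
    if http_status == 504:
        return "timeout (504): retry or try a simpler URL"
    if http_status == 400:
        return "bad request (400): a required parameter is missing"
    # Scrapegraph can report a failure in the body even on HTTP 200.
    if api == "scrapegraph" and body.get("error"):
        return "scrapegraph error: " + str(body["error"])
    # Riveter signals failure through request_status, details in message.
    if api == "riveter" and body.get("request_status") == "error":
        return "riveter error: " + str(body.get("message", ""))
    return None


print(failure_reason("scrapegraph", 200, {"status": "completed", "error": ""}))
print(failure_reason("riveter", 200, {"request_status": "error", "message": "blocked"}))
```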

Tips

Scrapegraph is best for natural language extraction
Riveter is best when you know the exact schema you want
Olostep is great for general scraping and AI answers
For dynamic sites (JavaScript-heavy), these tools handle rendering
Be specific in your prompts for better extraction results
Some sites may block automated access
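The first three tips amount to a simple routing rule, which can be written down as a sketch. The choose_endpoint helper and the order in which it checks its inputs are assumptions made for illustration; the endpoint paths are the ones used in the Usage section.

```python
from typing import Optional


def choose_endpoint(
    prompt: Optional[str] = None,
    schema: Optional[dict] = None,
    question_only: bool = False,
) -> tuple:
    """Map what the caller has in hand to an (api, endpoint) pair."""
    if question_only:
        # A question to answer from the web, with no specific page.
        return ("olostep", "/v1/answers")
    if schema is not None:
        # The exact output structure is known up front.
        return ("riveter", "/v1/scrape")
    if prompt:
        # Natural-language description of what to extract.
        return ("scrapegraph", "/v1/smartscraper")
    # Fall back to a general scrape of the page.
    return ("olostep", "/v1/scrapes")


print(choose_endpoint(prompt="Extract all products with name and price"))
```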


Extract structured data from web pages using AI

9 stars

development

Updated May 9, 2026

Install globally

npx skillsauth add orthogonal-sh/skills extract-webpage-data

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy - Container and dependency vulnerability scanner - Clean (95%)
Semgrep - Static code analysis for vulnerabilities - Clean (95%)
mcp-scan (Snyk) - Model Context Protocol security validation - Clean (95%)
Snyk (dep) - Open source security scanning - Skipped (50%)
Socket.dev - Supply chain security analysis - Skipped (50%)
VirusTotal - Multi-engine malware detection - Skipped (50%)
CrowdStrike - Advanced threat intelligence - Skipped (50%)
OSV-Scanner - Open Source Vulnerability database check - Skipped (50%)
OWASP Dep-Check - Skipped (50%)

Last scanned: May 9, 2026, 7:14 AM (251.4s, 1 file scanned)


Related Skills

orthogonal-sh/yt-dlp-downloader

testing

Verified, Trusted, Community

Download videos from YouTube, Bilibili, Twitter, and thousands of other sites using yt-dlp. Use when the user provides a video URL and wants to download it, extract audio (MP3), download subtitles, or select video quality. Triggers on phrases like "下载视频" (download video), "download video", "yt-dlp", "YouTube", "B站" (Bilibili), "抖音" (Douyin), "提取音频" (extract audio), "extract audio".

9 stars, SKILL.md, updated May 9, 2026

orthogonal-sh/slack

business

Verified, Trusted, Community

Send messages and manage Slack channels. Use when asked to send Slack messages, post to channels, list channels, or fetch message history.

9 stars, SKILL.md, updated May 9, 2026

orthogonal-sh/yc-batch-evaluator

development

Verified, Trusted, Community

Evaluate YC batch companies for investment: scrapes the YC directory, researches each company and its founders (work history, LinkedIn, website), assesses founder-company fit, and exports to Google Sheets with priority rankings. Use when asked to evaluate YC companies, research a YC batch, screen startups, or do due diligence on YC companies.

9 stars, SKILL.md, updated May 9, 2026

orthogonal-sh/website-screenshot

development

Verified, Trusted, Community

Take screenshots of websites and web pages

9 stars, SKILL.md, updated May 9, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.


Install manually


# Clone the repo
git clone https://github.com/orthogonal-sh/skills.git

# Copy into Claude Code skills folder (global), creating the folder if needed
mkdir -p ~/.claude/skills
cp -r skills/skills/orthogonal-extract-webpage-data ~/.claude/skills/


Repository

orthogonal-sh/skills

9 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT