athina-ai/orthogonal-extract-webpage-data

Name: orthogonal-extract-webpage-data
Author: athina-ai

skills/capabilities/orthogonal-extract-webpage-data/SKILL.md

npx skillsauth add athina-ai/goose-skills orthogonal-extract-webpage-data

Extract Webpage Data

Setup

Read your credentials from ~/.gooseworks/credentials.json:

export GOOSEWORKS_API_KEY=$(python3 -c "import json;print(json.load(open('$HOME/.gooseworks/credentials.json'))['api_key'])")
export GOOSEWORKS_API_BASE=$(python3 -c "import json;print(json.load(open('$HOME/.gooseworks/credentials.json')).get('api_base','https://api.gooseworks.ai'))")

If ~/.gooseworks/credentials.json does not exist, tell the user to run: npx gooseworks login

All endpoints use Bearer auth: -H "Authorization: Bearer $GOOSEWORKS_API_KEY"
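The setup steps above can be wrapped in one function that also handles the missing-file case. This is a minimal sketch: the function name `load_gooseworks_credentials` and its optional path argument are hypothetical, not part of the skill. It passes the file path to Python as an argument instead of interpolating it into the Python source.

```shell
# Hypothetical helper: load credentials, or tell the user how to create them.
load_gooseworks_credentials() {
  cred_file="${1:-$HOME/.gooseworks/credentials.json}"
  if [ ! -f "$cred_file" ]; then
    echo "Credentials not found at $cred_file. Run: npx gooseworks login" >&2
    return 1
  fi
  # api_key is required; api_base falls back to the default from the setup text.
  GOOSEWORKS_API_KEY=$(python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))["api_key"])' "$cred_file") || return 1
  GOOSEWORKS_API_BASE=$(python3 -c 'import json,sys; print(json.load(open(sys.argv[1])).get("api_base","https://api.gooseworks.ai"))' "$cred_file") || return 1
  export GOOSEWORKS_API_KEY GOOSEWORKS_API_BASE
}
```

Calling `load_gooseworks_credentials` with no argument reads the default path; a non-zero return means the user still needs to log in.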

Extract structured data from any web page using AI. Turn messy HTML into clean, organized data.

When to Use

User wants to extract specific data from a website
User asks to scrape information from a page
User needs structured data from unstructured content
User wants to pull product info, contact details, etc.
User needs web content converted into usable data

How It Works

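Requests go to the Gooseworks proxy endpoint ($GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run), which takes the target API (Olostep, Scrapegraph, or Riveter), the upstream path, and the request body, and uses that API for AI-powered data extraction.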
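Every request in this skill wraps the upstream call in a JSON envelope with `api`, `path`, and `body` fields. A small wrapper can build that envelope once, as sketched here. The helper names `orthogonal_payload` and `orthogonal_run` are hypothetical, and the sketch assumes `GOOSEWORKS_API_KEY` and `GOOSEWORKS_API_BASE` are already exported.

```shell
# Hypothetical helper: build the proxy envelope {"api":..., "path":..., "body":...}.
# $1 = api name, $2 = upstream path, $3 = JSON body for the upstream API
orthogonal_payload() {
  python3 -c 'import json,sys
api, path, body = sys.argv[1:4]
# json.loads validates the body before it is embedded in the envelope.
print(json.dumps({"api": api, "path": path, "body": json.loads(body)}, separators=(",", ":")))' "$1" "$2" "$3"
}

# Hypothetical helper: POST the envelope to the proxy endpoint used throughout this skill.
orthogonal_run() {
  curl -s -X POST "$GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run" \
    -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(orthogonal_payload "$1" "$2" "$3")"
}
```

For example, `orthogonal_run olostep /v1/scrapes '{"url_to_scrape":"https://example.com/products"}'` sends the same request as the Olostep scrape shown under Usage.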

Usage

Simple Scrape with Olostep

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"olostep","path":"/v1/scrapes","body":{"url_to_scrape":"https://example.com/products"}}'

AI-Powered Extraction with Scrapegraph

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"scrapegraph","path":"/v1/smartscraper","body":{"website_url":"https://example.com/team","user_prompt":"Extract all team members with their names, titles, and LinkedIn URLs"}}'

Schema-Based Extraction with Riveter

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/scrape","body":{"url":"https://example.com","schema":{"name":"string","price":"number","description":"string"}}}'

Get AI Answer from Web

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"olostep","path":"/v1/answers","body":{"task":"Find the pricing for Notion Teams plan from their website"}}'

Crawl Multiple Pages

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"olostep","path":"/v1/crawls","body":{"start_url":"https://example.com","max_pages":10}}'

Parameters

Olostep Scrape

url_to_scrape (required) - URL to scrape
formats - Output formats (markdown, html, text)

Scrapegraph

website_url (required) - URL to scrape
user_prompt (required) - Natural language description of what to extract

Riveter

url (required) - URL to scrape
schema - JSON object describing the data structure to extract; the examples here map field names to type names such as "string" and "number"

Olostep Answer

task (required) - Natural language task/question

Response

Olostep Response

Returns a scrape object:

id (string) - Scrape ID (e.g., scrape_z926lxxon3)
result.markdown_content (string|null) - Page content as markdown
result.html_content (string|null) - Raw HTML (if requested via formats)
result.text_content (string|null) - Plain text (if requested)
result.markdown_hosted_url (string|null) - S3 URL for large content
result.links_on_page (array) - Links found on the page
result.screenshot_hosted_url (string|null) - Screenshot URL (if requested)
result.page_metadata (object) - Page metadata, including the page's status_code
credits_consumed (integer) - Credits used for this scrape
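Because `result.markdown_content` can be null while `result.markdown_hosted_url` is set, a caller may want to check both. This is a sketch under two assumptions: that the proxy returns the scrape object unchanged, and that large pages expose only the hosted copy. The helper name `olostep_markdown` is hypothetical.

```shell
# Hypothetical helper: read a scrape object on stdin and print its markdown,
# or the hosted URL when the inline content is null.
olostep_markdown() {
  python3 -c 'import json,sys
result = json.load(sys.stdin).get("result") or {}
content = result.get("markdown_content")
if content is not None:
    print(content)
elif result.get("markdown_hosted_url"):
    # Assumption: the hosted URL is the fallback for content too large to inline.
    print("hosted: " + result["markdown_hosted_url"])
else:
    sys.exit("no markdown in response")
'
}
```

Pipe the output of a scrape request into `olostep_markdown`; a non-zero exit means neither field was populated.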

Async crawls: POST /v1/crawls returns an id. Poll with GET /v1/crawls/{id} until complete.

Scrapegraph Response

Returns structured extraction result:

request_id (string) - Unique request identifier
status (string) - completed or pending
result (object) - AI-extracted data matching your prompt (dynamic keys)
error (string) - Empty on success, error message on failure

Note: For large pages, the POST may return status: "pending". Poll with GET /v1/smartscraper/{request_id} until status is completed.
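The polling step can be sketched as a loop that repeats a fetch command until `status` is `completed`. How a GET request is expressed through the proxy is not stated here, so the sketch takes the fetch command as a parameter. The names `json_status` and `poll_until_completed` are hypothetical, as are the default attempt count and delay.

```shell
# Print the top-level "status" field of a JSON response read from stdin.
json_status() {
  python3 -c 'import json,sys; print(json.load(sys.stdin).get("status",""))'
}

# poll_until_completed FETCH_CMD [MAX_ATTEMPTS] [DELAY_SECONDS]
# FETCH_CMD is any command or function that prints the current response JSON.
poll_until_completed() {
  fetch_cmd=$1
  max_attempts=${2:-30}
  delay=${3:-2}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    response=$($fetch_cmd) || return 1
    if [ "$(printf '%s' "$response" | json_status)" = "completed" ]; then
      printf '%s\n' "$response"
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "Timed out waiting for completed status" >&2
  return 1
}
```

The caller supplies a function that fetches `/v1/smartscraper/{request_id}` and passes its name as `FETCH_CMD`. Capping the number of attempts keeps a stuck job from blocking forever.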

Riveter Response

Returns scrape result:

request_status (string) - success or error
message (string) - Human-readable status
text (string) - Extracted page text content
url (string) - URL that was scraped
status_code (integer) - HTTP status of the page
run_key (string) - Unique run identifier
base_url_for_links (string) - Base URL for resolving relative links
riveter_app_link (string) - Link to view run in Riveter dashboard
credit_used (integer) - Credits consumed

Examples

User: "Get all the product names and prices from this page"

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"scrapegraph","path":"/v1/smartscraper","body":{"website_url":"https://example.com/products","user_prompt":"Extract all products with name, price, and description"}}'

User: "Scrape the team page and get everyone's info"

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"scrapegraph","path":"/v1/smartscraper","body":{"website_url":"https://example.com/about/team","user_prompt":"Extract team members: name, role, bio, photo URL, LinkedIn"}}'

User: "What are Stripe's API pricing details?"

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"olostep","path":"/v1/answers","body":{"task":"Find Stripe API pricing breakdown from stripe.com/pricing"}}'

User: "Get all blog post titles and dates from this blog"

curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
  -H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"api":"riveter","path":"/v1/scrape","body":{"url":"https://blog.example.com","schema":{"posts":[{"title":"string","date":"string","url":"string"}]}}}'

Error Handling

504 - Olostep timed out on a slow page; retry, or try a simpler URL
400 - Missing required parameters (url_to_scrape for Olostep, website_url and user_prompt for Scrapegraph, url for Riveter)
Scrapegraph reports failures in the error field of the response body; check it even when the HTTP status is 200
Riveter returns request_status: "error" with details in message
Some sites block automated scraping; if one API fails, try a different one
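The body-level checks above can be sketched as a single filter that fails when either provider reports an error. This assumes the proxy passes the provider's response body through unchanged; the helper name `check_response_body` is hypothetical.

```shell
# Hypothetical helper: read a response body on stdin and exit non-zero
# (with a message on stderr) if it reports a failure.
check_response_body() {
  python3 -c 'import json,sys
data = json.load(sys.stdin)
# Scrapegraph: a non-empty "error" string signals failure, even with HTTP 200.
if data.get("error"):
    sys.exit("scrapegraph error: " + str(data["error"]))
# Riveter: request_status is "error", with details in "message".
if data.get("request_status") == "error":
    sys.exit("riveter error: " + str(data.get("message", "")))
'
}
```

A response can be saved to a file and checked with `check_response_body < response.json` before any fields are read from it.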

Tips

Scrapegraph is best for natural language extraction
Riveter is best when you know the exact schema you want
Olostep is great for general scraping and AI answers
These tools handle rendering for dynamic (JavaScript-heavy) sites
Be specific in your prompts for better extraction results
Some sites may block automated access

Extract structured data from web pages using AI

459 stars

development

Updated Apr 22, 2026

npx skillsauth add athina-ai/goose-skills orthogonal-extract-webpage-data

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners in report:

Trivy - Container and dependency vulnerability scanner - Clean (95%)
Semgrep - Static code analysis for vulnerabilities - Clean (95%)
mcp-scan (Snyk) - Model Context Protocol security validation - Clean (95%)
Snyk (dep) - Open source security scanning - Skipped (50%)
Socket.dev - Supply chain security analysis - Skipped (50%)
VirusTotal - Multi-engine malware detection - Skipped (50%)
CrowdStrike - Advanced threat intelligence - Skipped (50%)
OSV-Scanner - Open Source Vulnerability database check - Skipped (50%)
OWASP Dep-Check - Skipped (50%)

Last scanned: Apr 16, 2026, 11:13 AM (5.6s, 2 files scanned)

SKILL.md

name: orthogonal-extract-webpage-data
description: Extract structured data from web pages using AI
source: orthogonal

Install manually

# Clone the repo
git clone https://github.com/athina-ai/goose-skills.git

# Copy into the Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r goose-skills/skills/capabilities/orthogonal-extract-webpage-data ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository: athina-ai/goose-skills (459 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT