skills/capabilities/web-scraping-olostep/SKILL.md
Web scraping, crawling, and AI-powered answer extraction at scale
npx skillsauth add gooseworks-ai/goose-skills web-scraping-olostepInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Read your credentials from ~/.gooseworks/credentials.json:
export GOOSEWORKS_API_KEY=$(python3 -c "import json;print(json.load(open('$HOME/.gooseworks/credentials.json'))['api_key'])")
export GOOSEWORKS_API_BASE=$(python3 -c "import json;print(json.load(open('$HOME/.gooseworks/credentials.json')).get('api_base','https://api.gooseworks.ai'))")
If ~/.gooseworks/credentials.json does not exist, tell the user to run: npx gooseworks login
All endpoints use Bearer auth: -H "Authorization: Bearer $GOOSEWORKS_API_KEY"
Powerful web scraping, crawling, and AI-powered content extraction.
Initiate a web page scrape
Parameters:
default, none, arraypostlight, nonecurl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
-H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"api":"olostep","path":"/v1/scrapes","body":{"url_to_scrape":"https://example.com/page"}}'
The AI will perform actions like searching and browsing web pages to find the answer to the provided task. Execution time is 3-30s depending upon complexity. For longer tasks, use the agent endpoint instead.
Parameters:
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
-H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"api":"olostep","path":"/v1/answers","body":{"task":"What are the latest AI developments?"}}'
This endpoint allows users to get all the urls on a certain website. It can take up to 120 seconds for complex websites. For large websites, results are paginated using cursor-based pagination
Parameters:
true by default./blog/** to only include blog URLs. Only URLs matching these patterns will be returned./careers/**. Excluded URLs will supersede included URLs.curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
-H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"api":"olostep","path":"/v1/maps","body":{"url":"https://example.com"}}'
Starts a new crawl. You receive a id to track the progress. The operation may take 1-10 mins depending upon the site and depth and pages parameters.
Parameters:
/** which includes all URLs. Use patterns like /blog/** to crawl specific sections (e.g., only blog pages), /products/*.html for product pages, or multiple patterns for different sections. Supports standard glob features like * (any characters) and ** (recursive matching)./careers/**. Excluded URLs will supersede included URLs.false by default.v1/crawls/{crawl_id} endpoint.curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
-H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"api":"olostep","path":"/v1/crawls"}'
"start_url": "https://example.com",
"max_pages": 100
}'
Starts a new batch. You receive an id that you can use to track the progress of the batch as shown here. Note: Processing time is constant regardless of batch size
Parameters:
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
-H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"api":"olostep","path":"/v1/batches"}'
"items": [
{"url_to_scrape": "https://example.com/page1"},
{"url_to_scrape": "https://example.com/page2"}
]
}'
Retrieves the list of items processed for a batch. You can then use the retrieve_id to get the content with the Retrieve Endpoint
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
-H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"api":"olostep","path":"/v1/batches/{batch_id}/items"}'
Fetches information about a specific crawl.
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
-H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"api":"olostep","path":"/v1/crawls/{crawl_id}"}'
Fetches the list of pages for a specific crawl.
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
-H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"api":"olostep","path":"/v1/crawls/{crawl_id}/pages"}'
This endpoint retrieves a previously completed answer by its ID.
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
-H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"api":"olostep","path":"/v1/answers/{answer_id}"}'
Can be used to retrieve response for a scrape.
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
-H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"api":"olostep","path":"/v1/scrapes/{scrape_id}"}'
Retrieves the status and progress information about a batch. To retrieve the content for a batch, see here
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
-H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"api":"olostep","path":"/v1/batches/{batch_id}"}'
Retrieve page content of processed batches and crawls urls.
Parameters:
/v1/crawls/{crawl_id}/pages, /v1/scrapes/{scrape_id} or /v1/batches/{batch_id}/items endpointscurl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/run \
-H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"api":"olostep","path":"/v1/retrieve","body":{"retrieve_id":"abc123"}}'
For full endpoint details and parameters:
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/search \
-H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt":"olostep API endpoints"}' List all endpoints
curl -s -X POST $GOOSEWORKS_API_BASE/v1/proxy/orthogonal/details \
-H "Authorization: Bearer $GOOSEWORKS_API_KEY" \
-H "Content-Type: application/json" \
-d '{"api":"olostep","path":"/v1/scrapes"}' # Get endpoint details
development
End-to-end skill that turns a single reference image into a fully-installed, example-rendered style preset for the goose-graphics composite. Analyzes the image, writes the slim style spec, registers it in styles/index.json, generates all 7 format examples using the standard brief, renders PNGs via Playwright, and updates examples/manifest.json. Invoke with /goose-graphics-create-style.
development
Evaluate YC batch companies for investment — scrapes the YC directory, researches each company and its founders (work history, LinkedIn, website), assesses founder-company fit, and exports to Google Sheets with priority rankings. Use when asked to evaluate YC companies, research a YC batch, screen startups, or do due diligence on YC companies.
tools
Take screenshots of any website using Notte browser automation. Use when asked to screenshot, capture, or snap a webpage.
development
Search the web, platforms, and datasets. Use when asked to search, find, look up, research, or discover information from the web, YouTube, Amazon, eBay, news, academic sources, or any online platform.