skills/capabilities/web-archive-scraper/SKILL.md
Search the Wayback Machine for archived versions of websites. Extract cached pages, customer lists, testimonials, and partner directories from sites that have changed or gone offline. Uses the free CDX API — no API key needed.
npx skillsauth add gooseworks-ai/goose-skills web-archive-scraper
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Search the Wayback Machine (Internet Archive) for archived snapshots of websites. Fetch cached page content to find customer lists, testimonials, partner directories, and other information from sites that have changed or shut down.
Only dependency is requests. No API key needed.
# Find all snapshots of a URL
python3 skills/web-archive-scraper/scripts/search_archive.py \
--url "https://botkeeper.com/customers"
# Search with date range
python3 skills/web-archive-scraper/scripts/search_archive.py \
--url "https://botkeeper.com" --from 2025-01-01 --to 2026-02-01
# Search all pages under a domain (prefix match)
python3 skills/web-archive-scraper/scripts/search_archive.py \
--url "https://botkeeper.com" --match prefix --limit 50
# Fetch the actual archived page content
python3 skills/web-archive-scraper/scripts/search_archive.py \
--url "https://botkeeper.com/customers" --fetch
# Output formats
python3 skills/web-archive-scraper/scripts/search_archive.py --url URL --output json
python3 skills/web-archive-scraper/scripts/search_archive.py --url URL --output csv
python3 skills/web-archive-scraper/scripts/search_archive.py --url URL --output summary
The script queries web.archive.org/cdx/search/cdx for snapshots matching the URL. When fetching page content, it uses the id_ modifier to skip the Wayback toolbar.
| Flag | Default | Description |
|------|---------|-------------|
| --url | required | Target URL to search in the archive |
| --match | exact | Match type: exact, prefix, host, domain |
| --from | none | Start date (YYYY-MM-DD) |
| --to | none | End date (YYYY-MM-DD) |
| --limit | 25 | Max number of snapshots to return |
| --fetch | false | Fetch and display the content of the most recent snapshot |
| --fetch-all | false | Fetch content of ALL matched snapshots (use with small --limit) |
| --status | 200 | HTTP status filter (set to "any" to include all) |
| --output | json | Output format: json, csv, summary |
| --collapse | day | Dedup level: none, day, month, year |
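The flags in the table above map fairly naturally onto CDX query parameters. The following is a minimal sketch of that mapping, not the script's actual code: the helper name `build_cdx_query` and the exact way each flag is translated (for example, `--collapse day` becoming `collapse=timestamp:8`) are assumptions for illustration.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

# Number of leading digits of the 14-digit timestamp that must differ
# for two snapshots to count as distinct (assumed mapping for --collapse).
COLLAPSE_DIGITS = {"day": 8, "month": 6, "year": 4}


def build_cdx_query(url, match="exact", date_from=None, date_to=None,
                    limit=25, status="200", collapse="day"):
    """Build a CDX query URL from CLI-style options (hypothetical helper)."""
    params = [("url", url), ("output", "json"), ("limit", str(limit))]
    if match != "exact":
        params.append(("matchType", match))
    if date_from:
        # CDX expects compact timestamps, so YYYY-MM-DD becomes YYYYMMDD.
        params.append(("from", date_from.replace("-", "")))
    if date_to:
        params.append(("to", date_to.replace("-", "")))
    if status != "any":
        params.append(("filter", f"statuscode:{status}"))
    if collapse in COLLAPSE_DIGITS:
        params.append(("collapse", f"timestamp:{COLLAPSE_DIGITS[collapse]}"))
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


print(build_cdx_query("https://botkeeper.com", match="prefix", limit=50))
```

Building the URL separately from sending it keeps the mapping easy to inspect before any request is made.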
{
  "url": "https://botkeeper.com/customers",
  "timestamp": "20250915143022",
  "datetime": "2025-09-15T14:30:22",
  "status_code": "200",
  "mime_type": "text/html",
  "archive_url": "https://web.archive.org/web/20250915143022/https://botkeeper.com/customers",
  "raw_url": "https://web.archive.org/web/20250915143022id_/https://botkeeper.com/customers",
  "content": "..."
}
The content field is only populated when --fetch or --fetch-all is used.
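The derived fields in the record above follow from the snapshot timestamp and the original URL alone. As a rough sketch (the function name `snapshot_record` is hypothetical, and the script may build its records differently), they could be computed like this:

```python
from datetime import datetime

WAYBACK_BASE = "https://web.archive.org/web"


def snapshot_record(timestamp, original_url, status_code="200",
                    mime_type="text/html"):
    """Build one output record from a 14-digit Wayback timestamp (hypothetical helper)."""
    parsed = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return {
        "url": original_url,
        "timestamp": timestamp,
        "datetime": parsed.isoformat(),
        "status_code": status_code,
        "mime_type": mime_type,
        # Normal replay URL: page is served with the Wayback toolbar.
        "archive_url": f"{WAYBACK_BASE}/{timestamp}/{original_url}",
        # The id_ suffix on the timestamp requests the page without the toolbar.
        "raw_url": f"{WAYBACK_BASE}/{timestamp}id_/{original_url}",
    }


record = snapshot_record("20250915143022", "https://botkeeper.com/customers")
print(record["datetime"])  # 2025-09-15T14:30:22
print(record["raw_url"])
```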
Free. The Wayback Machine CDX API requires no authentication or API key. Rate limit is ~15 requests/minute.
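When fetching many snapshots (for example with --fetch-all), it may help to space requests out so they stay under the roughly 15 requests/minute noted above. The following is one possible sketch of such a throttle. The `Throttle` class is hypothetical and is not part of the script; the clock and sleep functions are injectable only so the behavior can be checked without waiting.

```python
import time


class Throttle:
    """Space calls evenly so at most `per_minute` happen per minute (hypothetical helper)."""

    def __init__(self, per_minute=15, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute  # seconds between requests
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may proceed

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        now = self.clock()
        delay = 0.0
        if self.next_allowed is not None and now < self.next_allowed:
            delay = self.next_allowed - now
            self.sleep(delay)
            now = self.next_allowed
        self.next_allowed = now + self.interval
        return delay


throttle = Throttle(per_minute=15)
print(throttle.interval)  # 4.0
print(throttle.wait())    # 0.0 (the first call never sleeps)
```

Calling `throttle.wait()` before each archive request would then insert a 4-second gap between consecutive requests.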