gooseworks-ai/web-archive-scraper

Name: web-archive-scraper
Author: gooseworks-ai

skills/capabilities/web-archive-scraper/SKILL.md

npx skillsauth add gooseworks-ai/goose-skills web-archive-scraper

Web Archive Scraper

Search the Wayback Machine (Internet Archive) for archived snapshots of websites. Fetch cached page content to find customer lists, testimonials, partner directories, and other information from sites that have changed or shut down.

Quick Start

The only dependency is the Python requests library. No API key is needed.

# Find all snapshots of a URL
python3 skills/web-archive-scraper/scripts/search_archive.py \
  --url "https://botkeeper.com/customers"

# Search with date range
python3 skills/web-archive-scraper/scripts/search_archive.py \
  --url "https://botkeeper.com" --from 2025-01-01 --to 2026-02-01

# Search all pages under a domain (prefix match)
python3 skills/web-archive-scraper/scripts/search_archive.py \
  --url "https://botkeeper.com" --match prefix --limit 50

# Fetch the actual archived page content
python3 skills/web-archive-scraper/scripts/search_archive.py \
  --url "https://botkeeper.com/customers" --fetch

# Output formats
python3 skills/web-archive-scraper/scripts/search_archive.py --url URL --output json
python3 skills/web-archive-scraper/scripts/search_archive.py --url URL --output csv
python3 skills/web-archive-scraper/scripts/search_archive.py --url URL --output summary

How It Works

1. CDX API search: queries web.archive.org/cdx/search/cdx for snapshots matching the URL.
2. Filtering: filters by date range, HTTP status code, and MIME type.
3. Dedup: collapses to one snapshot per day by default to avoid redundant results.
4. Content fetch: optionally fetches the raw archived HTML (using the id_ modifier to skip the Wayback toolbar).
5. Text extraction: strips HTML tags for readable text output when fetching content.
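
The text-extraction step can be illustrated with the standard library alone. This is a minimal sketch, not the script's actual implementation: the names `_TextExtractor` and `html_to_text` are hypothetical, and skipping `<script>` and `<style>` bodies is an assumption about what "readable text" should exclude.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores <script> and <style> bodies.

    Hypothetical helper for illustration; the real script may differ.
    """

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML document, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Using `html.parser` rather than a regular expression keeps entity decoding (for example `&amp;`) and nested tags from needing special handling.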

CLI Reference

| Flag | Default | Description |
|------|---------|-------------|
| --url | required | Target URL to search in the archive |
| --match | exact | Match type: exact, prefix, host, domain |
| --from | none | Start date (YYYY-MM-DD) |
| --to | none | End date (YYYY-MM-DD) |
| --limit | 25 | Max number of snapshots to return |
| --fetch | false | Fetch and display the content of the most recent snapshot |
| --fetch-all | false | Fetch content of ALL matched snapshots (use with small --limit) |
| --status | 200 | HTTP status filter (set to "any" to include all) |
| --output | json | Output format: json, csv, summary |
| --collapse | day | Dedup level: none, day, month, year |
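
As a rough illustration of how these flags might translate into a CDX query, the sketch below builds the query-string parameters. The function name `build_cdx_params` and the exact mapping are assumptions, not taken from the script. The parameter names themselves (`matchType`, `from`, `to`, `limit`, `filter`, `collapse`, `output`) are those of the Wayback CDX server.

```python
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

# Leading timestamp digits (YYYYMMDDhhmmss) to collapse on per dedup level.
COLLAPSE_DIGITS = {"year": 4, "month": 6, "day": 8}


def build_cdx_params(url, match="exact", date_from=None, date_to=None,
                     limit=25, status="200", collapse="day"):
    """Map CLI-style options onto CDX query parameters (hypothetical helper)."""
    params = {
        "url": url,
        "matchType": match,
        "limit": str(limit),
        "output": "json",
    }
    # The CDX API takes compact timestamps, so 2025-01-01 becomes 20250101.
    if date_from:
        params["from"] = date_from.replace("-", "")
    if date_to:
        params["to"] = date_to.replace("-", "")
    if status != "any":
        params["filter"] = f"statuscode:{status}"
    if collapse != "none":
        params["collapse"] = f"timestamp:{COLLAPSE_DIGITS[collapse]}"
    return params
```

The resulting dictionary could then be passed to `requests.get(CDX_ENDPOINT, params=params, timeout=30)`.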

Output Schema

{
  "url": "https://botkeeper.com/customers",
  "timestamp": "20250915143022",
  "datetime": "2025-09-15T14:30:22",
  "status_code": "200",
  "mime_type": "text/html",
  "archive_url": "https://web.archive.org/web/20250915143022/https://botkeeper.com/customers",
  "raw_url": "https://web.archive.org/web/20250915143022id_/https://botkeeper.com/customers",
  "content": "..."
}

The content field is only populated when --fetch or --fetch-all is used.
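
The record above can be derived from a CDX result row in a few lines. The sketch below is illustrative only: `snapshot_record` and `records_from_cdx_json` are hypothetical names. It assumes the CDX `output=json` layout, in which the first row is a header containing the field names `original`, `timestamp`, `statuscode`, and `mimetype`.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_record(original_url, timestamp, status_code, mime_type):
    """Build one output record from the fields of a CDX row."""
    captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return {
        "url": original_url,
        "timestamp": timestamp,
        "datetime": captured.isoformat(),
        "status_code": status_code,
        "mime_type": mime_type,
        "archive_url": f"{WAYBACK_PREFIX}/{timestamp}/{original_url}",
        # The id_ suffix requests the raw capture without the Wayback toolbar.
        "raw_url": f"{WAYBACK_PREFIX}/{timestamp}id_/{original_url}",
    }


def records_from_cdx_json(rows):
    """Convert a parsed output=json response (header row, then data rows)."""
    if not rows:
        return []
    header, *data = rows
    records = []
    for row in data:
        fields = dict(zip(header, row))
        records.append(snapshot_record(fields["original"], fields["timestamp"],
                                       fields["statuscode"], fields["mimetype"]))
    return records
```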

Cost

Free. The Wayback Machine CDX API requires no authentication or API key. Rate limit is ~15 requests/minute.
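
One way to stay under the stated limit when fetching many snapshots (for example with --fetch-all) is to space requests evenly. The `Throttle` class below is a hypothetical sketch, not part of the skill. It takes the figure of roughly 15 requests per minute from the text above, which works out to one request every 4 seconds.

```python
import time


class Throttle:
    """Enforces a minimum interval between calls (hypothetical helper)."""

    def __init__(self, max_per_minute=15, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each HTTP request keeps the pace steady. The injectable `clock` and `sleep` arguments exist only so the behavior can be checked without real delays.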

Common Use Cases

Find customer lists from shut-down companies (e.g., botkeeper.com)
Recover testimonials/case studies before a site redesign
Track how a competitor's messaging changed over time
Find partner directories that have been removed

gooseworks-ai/web-archive-scraper

Search the Wayback Machine for archived versions of websites. Extract cached pages, customer lists, testimonials, and partner directories from sites that have changed or gone offline. Uses the free CDX API; no API key needed.

455 stars, development, updated Apr 21, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add gooseworks-ai/goose-skills web-archive-scraper

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 3, 2026, 10:04 AM (10.7s, 3 files scanned)

Related Skills

gooseworks-ai/goose-graphics-create-style (development)

End-to-end skill that turns a single reference image into a fully installed, example-rendered style preset for the goose-graphics composite. Analyzes the image, writes the slim style spec, registers it in styles/index.json, generates all 7 format examples using the standard brief, renders PNGs via Playwright, and updates examples/manifest.json. Invoke with /goose-graphics-create-style.

600 | SKILL.md | Updated Apr 28, 2026

gooseworks-ai/yc-batch-evaluator (development)

Evaluate YC batch companies for investment: scrapes the YC directory, researches each company and its founders (work history, LinkedIn, website), assesses founder-company fit, and exports to Google Sheets with priority rankings. Use when asked to evaluate YC companies, research a YC batch, screen startups, or do due diligence on YC companies.

600 | SKILL.md | Updated Apr 28, 2026

gooseworks-ai/website-screenshot-notte (tools)

Take screenshots of any website using Notte browser automation. Use when asked to screenshot, capture, or snap a webpage.

600 | SKILL.md | Updated Apr 28, 2026

gooseworks-ai/web-search (development)

Search the web, platforms, and datasets. Use when asked to search, find, look up, research, or discover information from the web, YouTube, Amazon, eBay, news, academic sources, or any online platform.

600 | SKILL.md | Updated Apr 28, 2026

Install manually

# Clone the repo
git clone https://github.com/gooseworks-ai/goose-skills.git

# Copy into Claude Code skills folder (global)
cp -r goose-skills/skills/capabilities/web-archive-scraper ~/.claude/skills/

For the skills folder location, see the official Claude Code Skills documentation.

Repository

gooseworks-ai/goose-skills

455 stars

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT