21pounder/web-scrape

Name: web-scrape
Author: 21pounder

deepresearch/.claude/skills/web-scrape/SKILL.md

npx skillsauth add 21pounder/terminalagent web-scrape

Web Scraping Skill v3.0

Usage

/web-scrape <url> [options]

Options:

--format=markdown|json|text - Output format (default: markdown)
--full - Include full page content (skip smart extraction)
--screenshot - Also save a screenshot
--scroll - Scroll to load dynamic content (infinite scroll pages)

Examples:

/web-scrape https://example.com/article
/web-scrape https://news.site.com/story --format=json
/web-scrape https://spa-app.com/page --scroll --screenshot
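The invocation syntax above can be sketched as a small parser. This is an illustrative sketch only: `ScrapeOptions` and `parse_command` are hypothetical names, and the skill itself does not define how the command line is parsed.

```python
import shlex
from dataclasses import dataclass

VALID_FORMATS = {"markdown", "json", "text"}


@dataclass
class ScrapeOptions:
    """Hypothetical container for the documented options."""
    url: str
    format: str = "markdown"  # documented default
    full: bool = False
    screenshot: bool = False
    scroll: bool = False


def parse_command(command: str) -> ScrapeOptions:
    """Parse '/web-scrape <url> [options]' into a ScrapeOptions value."""
    tokens = shlex.split(command)
    if len(tokens) < 2 or tokens[0] != "/web-scrape":
        raise ValueError("expected: /web-scrape <url> [options]")
    opts = ScrapeOptions(url=tokens[1])
    for token in tokens[2:]:
        if token.startswith("--format="):
            value = token.split("=", 1)[1]
            if value not in VALID_FORMATS:
                raise ValueError(f"unsupported format: {value}")
            opts.format = value
        elif token in ("--full", "--screenshot", "--scroll"):
            # Flag names map directly onto the dataclass fields.
            setattr(opts, token[2:], True)
        else:
            raise ValueError(f"unknown option: {token}")
    return opts
```

Rejecting unknown options is a design choice made here for the sketch; silently ignoring them would also be consistent with the documented syntax.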

Execution Flow

Phase 1: Navigate and Load

1. mcp__playwright__browser_navigate
   url: "<target URL>"

2. mcp__playwright__browser_wait_for
   time: 2  (allow initial render)

If --scroll is set, run the scroll sequence to trigger lazy loading:

3. mcp__playwright__browser_evaluate
   function: "async () => {
     for (let i = 0; i < 3; i++) {
       window.scrollTo(0, document.body.scrollHeight);
       await new Promise(r => setTimeout(r, 1000));
     }
     window.scrollTo(0, 0);
   }"

Phase 2: Capture Content

4. mcp__playwright__browser_snapshot
   → Returns full accessibility tree with all text content

If --screenshot is set:

5. mcp__playwright__browser_take_screenshot
   filename: "scraped_<domain>_<timestamp>.png"
   fullPage: true

Phase 3: Close Browser

6. mcp__playwright__browser_close
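The three phases above amount to an ordered list of tool calls that depends on the options. A minimal sketch of that ordering follows. `build_plan` and `screenshot_filename` are hypothetical helpers, and the timestamp layout is an assumption, since the skill only specifies `scraped_<domain>_<timestamp>.png`.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse

# Scroll function copied from Phase 1 of the skill.
SCROLL_FUNCTION = """async () => {
  for (let i = 0; i < 3; i++) {
    window.scrollTo(0, document.body.scrollHeight);
    await new Promise(r => setTimeout(r, 1000));
  }
  window.scrollTo(0, 0);
}"""


def screenshot_filename(url, now=None):
    """Build 'scraped_<domain>_<timestamp>.png'; `now` is assumed to be UTC."""
    now = now or datetime.now(timezone.utc)
    domain = urlparse(url).hostname or "unknown"
    return f"scraped_{domain}_{now.strftime('%Y%m%dT%H%M%SZ')}.png"


def build_plan(url, scroll=False, screenshot=False, now=None):
    """Return the ordered (tool name, arguments) pairs for one scrape."""
    plan = [
        ("mcp__playwright__browser_navigate", {"url": url}),
        ("mcp__playwright__browser_wait_for", {"time": 2}),
    ]
    if scroll:
        plan.append(("mcp__playwright__browser_evaluate",
                     {"function": SCROLL_FUNCTION}))
    plan.append(("mcp__playwright__browser_snapshot", {}))
    if screenshot:
        plan.append(("mcp__playwright__browser_take_screenshot",
                     {"filename": screenshot_filename(url, now),
                      "fullPage": True}))
    # The browser is always closed last, whatever options were set.
    plan.append(("mcp__playwright__browser_close", {}))
    return plan
```

The plan only describes the calls; actually dispatching them to the Playwright MCP server is left to the agent runtime.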

Smart Content Extraction

After getting the snapshot, apply intelligent extraction:

Step 1: Identify Content Type

| Page Type | Indicators | Extraction Strategy |
|-----------|------------|---------------------|
| Article/Blog | <article>, long paragraphs, date/author | Extract main article body |
| Product Page | Price, "Add to Cart", specs | Extract title, price, description, specs |
| Documentation | Code blocks, headings hierarchy | Preserve structure and code |
| List/Search | Repeated item patterns | Extract as structured list |
| Landing Page | Hero section, CTAs | Extract key messaging |
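One way to turn the indicators in this table into a decision is a simple rule cascade. The sketch below is an assumption-heavy illustration: `PageSignals` is a hypothetical structure, how the signals are gathered from the snapshot is left out, and the rule order and thresholds are invented for the example rather than taken from the skill.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    """Hypothetical signals gathered from the snapshot; not part of the skill."""
    has_article_tag: bool = False
    long_paragraphs: int = 0
    has_price: bool = False
    has_add_to_cart: bool = False
    code_blocks: int = 0
    repeated_items: int = 0
    has_hero_or_cta: bool = False


# Strategy text taken from the table; the short keys are illustrative labels.
STRATEGIES = {
    "article": "Extract main article body",
    "product": "Extract title, price, description, specs",
    "docs": "Preserve structure and code",
    "list": "Extract as structured list",
    "landing": "Extract key messaging",
}


def classify(signals: PageSignals) -> str:
    """Pick a page type; order and thresholds are illustrative assumptions."""
    if signals.has_add_to_cart or signals.has_price:
        return "product"
    if signals.code_blocks >= 2:
        return "docs"
    if signals.repeated_items >= 5:
        return "list"
    if signals.has_article_tag or signals.long_paragraphs >= 3:
        return "article"
    # Fall back to article extraction when nothing distinctive is found.
    return "landing" if signals.has_hero_or_cta else "article"
```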

Step 2: Filter Noise

Unless --full is set, ALWAYS REMOVE these elements from output:

Navigation menus and breadcrumbs
Footer content (copyright, links)
Sidebars (ads, related articles, social links)
Cookie banners and popups
Comments section (unless specifically requested)
Share buttons and social widgets
Login/signup prompts
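The filtering step can be pictured as pruning subtrees from the snapshot. The sketch below assumes the snapshot has already been parsed into nested dicts with `role`, `name`, and `children` keys, which is a hypothetical shape rather than the actual Playwright MCP output format. The role and name lists are likewise illustrative guesses at how the categories above might be recognised.

```python
# ARIA landmark roles assumed to correspond to menus, footers, and sidebars.
NOISE_ROLES = {"navigation", "contentinfo", "complementary"}
# Name fragments assumed to mark banners, widgets, and prompts.
NOISE_NAME_HINTS = ("cookie", "share", "comments", "sign up", "log in", "breadcrumb")
# Name hints are only applied to container-like roles, so ordinary
# paragraphs that happen to contain a hint word are kept.
HINTED_ROLES = {"region", "dialog", "group", "button"}


def strip_noise(node, keep_comments=False):
    """Return a copy of `node` without noise subtrees, or None if it is noise."""
    hints = [h for h in NOISE_NAME_HINTS
             if not (keep_comments and h == "comments")]
    role = node.get("role", "")
    name = node.get("name", "").lower()
    if role in NOISE_ROLES:
        return None
    if role in HINTED_ROLES and any(h in name for h in hints):
        return None
    children = [strip_noise(c, keep_comments) for c in node.get("children", [])]
    return {**node, "children": [c for c in children if c is not None]}
```

The `keep_comments` flag mirrors the "unless specifically requested" exception for the comments section.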

Step 3: Structure the Content

For Articles:

# [Title]

**Source:** [URL]
**Date:** [if available]
**Author:** [if available]

---

[Main content in clean markdown]

For Product Pages:

# [Product Name]

**Price:** [price]
**Availability:** [in stock/out of stock]

## Description
[product description]

## Specifications
| Spec | Value |
|------|-------|
| ... | ... |
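As a sketch of how the article template might be filled in, the helper below emits the Date and Author lines only when a value is available, matching the "[if available]" notes. `render_article` is a hypothetical name, not something the skill defines.

```python
def render_article(title, url, body, date=None, author=None):
    """Fill the article template; optional fields are omitted when missing."""
    lines = [f"# {title}", "", f"**Source:** {url}"]
    if date:
        lines.append(f"**Date:** {date}")
    if author:
        lines.append(f"**Author:** {author}")
    lines += ["", "---", "", body.strip()]
    return "\n".join(lines) + "\n"
```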

Output Formats

Markdown (default)

Clean, readable markdown with proper headings, lists, and formatting.

JSON

{
  "url": "https://...",
  "title": "Page Title",
  "type": "article|product|docs|list",
  "content": {
    "main": "...",
    "metadata": {}
  },
  "extracted_at": "ISO timestamp"
}
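A minimal sketch of producing this JSON shape is shown below. `to_json` is a hypothetical helper; the skill says only "ISO timestamp", so the use of a UTC ISO 8601 string is an assumption.

```python
import json
from datetime import datetime, timezone

# Type values listed in the JSON example above.
ALLOWED_TYPES = {"article", "product", "docs", "list"}


def to_json(url, title, page_type, main, metadata=None, now=None):
    """Serialize one scrape result in the documented JSON shape."""
    if page_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {page_type}")
    now = now or datetime.now(timezone.utc)
    payload = {
        "url": url,
        "title": title,
        "type": page_type,
        "content": {"main": main, "metadata": metadata or {}},
        "extracted_at": now.isoformat(),
    }
    # ensure_ascii=False keeps non-ASCII page text readable in the output.
    return json.dumps(payload, ensure_ascii=False, indent=2)
```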

Text

Plain text with minimal formatting, suitable for further processing.

Error Handling

Navigation Errors

| Error | Detection | Action |
|-------|-----------|--------|
| Timeout | Page doesn't load in 30s | Report error, suggest retry |
| 404 Not Found | "404" in title/content | Report "Page not found" |
| 403 Forbidden | "403", "Access Denied" | Report access restriction |
| CAPTCHA | "captcha", "verify you're human" | Report CAPTCHA detected, cannot proceed |
| Paywall | "subscribe", "premium content" | Extract visible content, note paywall |
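The text-based rows of this table can be sketched as an ordered list of patterns checked against the page title and content. `detect_block` and the rule labels are hypothetical; timeouts are not covered because they are detected by the navigation call rather than by page text.

```python
import re

# (label, pattern, action) rows mirroring the table; matched case-insensitively.
BLOCK_RULES = [
    ("not_found", r"\b404\b", 'Report "Page not found"'),
    ("forbidden", r"\b403\b|access denied", "Report access restriction"),
    ("captcha", r"captcha|verify you're human",
     "Report CAPTCHA detected, cannot proceed"),
    ("paywall", r"subscribe|premium content",
     "Extract visible content, note paywall"),
]


def detect_block(title, content):
    """Return (label, action) for the first matching rule, or None."""
    haystack = f"{title}\n{content}".lower()
    for label, pattern, action in BLOCK_RULES:
        if re.search(pattern, haystack):
            return label, action
    return None
```

Keyword matching of this kind is prone to false positives (an article that merely mentions "subscribe", for instance), so a match is best treated as a hint to inspect the page rather than a verdict.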

Recovery Actions

If page load fails:
1. Report the specific error to user
2. Suggest: "Try again?" or "Different URL?"
3. Close browser cleanly

If content is blocked:
1. Report what was detected (CAPTCHA/paywall/geo-block)
2. Extract any available preview content
3. Suggest alternatives if applicable

Advanced Scenarios

Single Page Applications (SPA)

1. Navigate to URL
2. Wait longer (3-5 seconds) for JS hydration
3. Use browser_wait_for with specific text if known
4. Then snapshot

Infinite Scroll Pages

1. Navigate
2. Execute scroll loop (see Phase 1)
3. Snapshot after scrolling completes

Pages with Click-to-Reveal Content

1. Snapshot first to identify clickable elements
2. Use browser_click on "Read more" / "Show all" buttons
3. Wait briefly
4. Snapshot again for full content

Multi-page Articles

1. Scrape first page
2. Identify "Next" or pagination links
3. Ask user: "Article has X pages. Scrape all?"
4. If yes, iterate through pages and combine
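The iterate-and-combine step could look roughly like the loop below. It is a sketch under stated assumptions: `fetch_page` is a hypothetical callback that scrapes one page and returns its body plus the next-page URL (or None), and the `max_pages` cap and visited-set guard are additions made here to keep the loop from running away.

```python
def scrape_all_pages(first_url, fetch_page, max_pages=10):
    """Follow next-page links and join page bodies with blank lines.

    fetch_page(url) -> (markdown_body, next_url_or_None) is a hypothetical
    callback standing in for a full single-page scrape.
    """
    parts, seen, url = [], set(), first_url
    while url and url not in seen and len(parts) < max_pages:
        seen.add(url)  # guard against pagination loops
        body, url = fetch_page(url)
        parts.append(body.strip())
    return "\n\n".join(parts)
```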

Performance Guidelines

| Metric | Target | How |
|--------|--------|-----|
| Speed | < 15 seconds | Minimal waits, parallel where possible |
| Token Usage | < 5000 tokens | Smart extraction, not full DOM |
| Reliability | > 95% success | Proper error handling |
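To make the token target concrete, the sketch below trims output to a budget at paragraph boundaries. The four-characters-per-token estimate is a rough rule of thumb assumed for illustration, not a real tokenizer, and both function names are hypothetical.

```python
def estimate_tokens(text):
    """Rough estimate: about four characters per token (an assumption)."""
    return (len(text) + 3) // 4


def fit_to_budget(markdown, max_tokens=5000):
    """Keep whole paragraphs, in order, until the estimated budget is used up."""
    kept, used = [], 0
    for paragraph in markdown.split("\n\n"):
        cost = estimate_tokens(paragraph)
        if used + cost > max_tokens:
            break
        kept.append(paragraph)
        used += cost
    return "\n\n".join(kept)
```

Cutting at paragraph boundaries avoids ending the output mid-sentence, at the cost of sometimes leaving part of the budget unused.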

Security Notes

Never execute arbitrary JavaScript from the page
Don't follow redirects to suspicious domains
Don't submit forms or click login buttons
Don't scrape pages that require authentication (unless the user provides a credentials flow)
Respect robots.txt when the user mentions it
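Two of these notes lend themselves to simple checks, sketched below with the Python standard library. `same_site` is a hypothetical and deliberately crude test for whether a redirect stayed on the requested site (it ignores public-suffix rules), and `allowed_by_robots` assumes the robots.txt text has already been fetched by some other means.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def same_site(requested_url, final_url):
    """Crude check that a redirect stayed on the same host or a subdomain."""
    a = urlparse(requested_url).hostname or ""
    b = urlparse(final_url).hostname or ""
    if not a or not b:
        return False
    return a == b or a.endswith("." + b) or b.endswith("." + a)


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Check a URL against already-downloaded robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

What counts as a "suspicious" domain is not defined by the skill, so treating any cross-site redirect as worth flagging is only one possible interpretation.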

Quick Reference

Minimum viable scrape (4 tool calls):

1. browser_navigate → 2. browser_wait_for → 3. browser_snapshot → 4. browser_close

Full-featured scrape (with scroll + screenshot):

1. browser_navigate
2. browser_wait_for
3. browser_evaluate (scroll)
4. browser_snapshot
5. browser_take_screenshot
6. browser_close

Remember: The goal is to deliver clean, useful content to the user, not raw HTML/DOM dumps.

Intelligent web scraper with content extraction, multiple output formats, and error handling

129 stars

development

Updated Mar 25, 2026

npx skillsauth add 21pounder/terminalagent web-scrape

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 70% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Mar 25, 2026, 3:09 AM (181.2s, 2 files scanned)

SKILL.md

name: web-scrape
description: Intelligent web scraper with content extraction, multiple output formats, and error handling
version: 3.0.0

Install manually

# Clone the repo
git clone https://github.com/21pounder/terminalagent.git

# Copy into the Claude Code skills folder (global), creating it if needed
mkdir -p ~/.claude/skills
cp -r terminalagent/deepresearch/.claude/skills/web-scrape ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository: 21pounder/terminalagent (129 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT