openclaw/3511815125/web-fetch-vx

Name: 3511815125/web-fetch-vx
Author: openclaw

3511815125/web-fetch-vx/SKILL.md

npx skillsauth add openclaw/skills 3511815125/web-fetch-vx

Web Content Extractor

Version: 2.0
Author: OpenClaw Team
Updated: 2026-03-15
License: MIT

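📦 Skill Metadata

name: web-content-extractor
version: 2.0.0
description: Extracts clean content from WeChat articles, blogs, and news pages, removing ads and sidebars
category: Content processing
tags: [web extraction, content cleaning, WeChat articles, Markdown]
author: OpenClaw Team
license: MIT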

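🎯 Overview

A web content extraction tool built on three engines (Readability, Firecrawl, and Defuddle) and optimized for Chinese-language content. It supports WeChat articles, news sites, blogs, and other sources, automatically strips ads, navigation, and sidebars, and outputs clean Markdown.

Core capabilities:

✅ WeChat article extraction (mp.weixin.qq.com)
✅ News page cleanup
✅ Blog post parsing
✅ Metadata extraction (title, author, date)
✅ Multiple output formats (Markdown, JSON, plain text)
✅ Batch processing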

🚀 Quick Start

Basic call

# OpenClaw tool call
result = web_fetch(
    url="https://mp.weixin.qq.com/s/xxx",
    extractMode="markdown",
    maxChars=8000
)

Full parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| url | str | ✅ | - | Page URL |
| extractMode | str | ❌ | "markdown" | Output format (markdown/text/json) |
| maxChars | int | ❌ | 8000 | Maximum number of characters |
| includeMetadata | bool | ❌ | true | Whether to include metadata |
| timeout | int | ❌ | 30 | Timeout (seconds) |
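The defaults in the parameter table can be applied and checked before a call is made. The sketch below is a hypothetical helper, not part of the skill: the name `build_request` and the validation rules are assumptions, and only the parameter names and default values come from the table.

```python
from urllib.parse import urlparse

# Defaults copied from the parameter table; the helper itself is hypothetical.
DEFAULTS = {
    "extractMode": "markdown",
    "maxChars": 8000,
    "includeMetadata": True,
    "timeout": 30,
}
ALLOWED_MODES = {"markdown", "text", "json"}


def build_request(url, **overrides):
    """Return a web_fetch argument dict with the table defaults applied."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URL: {url!r}")

    # Reject names that the parameter table does not list.
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    request = {"url": url, **DEFAULTS, **overrides}
    if request["extractMode"] not in ALLOWED_MODES:
        raise ValueError(f"extractMode must be one of {sorted(ALLOWED_MODES)}")
    if request["maxChars"] <= 0 or request["timeout"] <= 0:
        raise ValueError("maxChars and timeout must be positive")
    return request
```

The resulting dict could then be passed as `web_fetch(**request)`. Options that appear only under Advanced Configuration (`userAgent`, `proxy`, `cache`, `ttl`) are rejected by this sketch and would need to be added to `DEFAULTS` to be accepted.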

📤 Input and Output

Input example

{
  "url": "https://mp.weixin.qq.com/s/abcdefg",
  "extractMode": "markdown",
  "maxChars": 8000,
  "includeMetadata": true
}

Output example

{
  "success": true,
  "url": "https://mp.weixin.qq.com/s/abcdefg",
  "title": "Article title",
  "author": "Author name",
  "publishDate": "2026-03-15",
  "content": "Body text in Markdown format...",
  "wordCount": 2500,
  "readTime": "10 minutes",
  "images": ["https://..."],
  "extractTime": 0.8
}
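A typed wrapper makes the output fields easier to consume. This is a minimal sketch, assuming the tool returns JSON shaped like the example above; the `ExtractResult` class and its `is_usable` check are hypothetical, and treating the metadata fields as optional is an assumption about what happens when `includeMetadata` is false.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExtractResult:
    # Field names mirror the keys in the output example.
    success: bool
    url: str
    content: str = ""
    title: Optional[str] = None
    author: Optional[str] = None
    publishDate: Optional[str] = None
    wordCount: int = 0
    images: List[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, raw: str) -> "ExtractResult":
        """Parse a JSON payload, ignoring keys this class does not model."""
        data = json.loads(raw)
        known = set(cls.__dataclass_fields__)
        return cls(**{k: v for k, v in data.items() if k in known})

    def is_usable(self) -> bool:
        """True when the call succeeded and returned non-blank content."""
        return self.success and bool(self.content.strip())
```

Keys such as `readTime` and `extractTime` are dropped here only to keep the sketch short; they could be added as further fields.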

🔧 Architecture

Three-engine design

                        User request
                              ↓
                   ┌─────────────────────┐
                   │    Routing layer    │
                   └─────────────────────┘
                              ↓
           ┌──────────────────┼──────────────────┐
           ↓                  ↓                  ↓
   ┌───────────────┐  ┌───────────────┐  ┌───────────────┐
   │   web_fetch   │  │   defuddle    │  │    browser    │
   │    (fast)     │  │ (specialized) │  │  (fallback)   │
   └───────────────┘  └───────────────┘  └───────────────┘
           ↓                  ↓                  ↓
                   ┌─────────────────────┐
                   │ Result aggregation  │
                   └─────────────────────┘
                              ↓
                       Return to user

Engine comparison

| Engine | Speed | Success rate | Best for |
|--------|-------|--------------|----------|
| web_fetch | <1s | 70% | WeChat articles / general web pages |
| defuddle | <1s | 75% | Blogs / news sites |
| browser | 5-10s | 90% | Complex SPAs / dynamic pages |
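The source does not say how the routing layer picks an engine. One plausible reading, given that browser is labelled the fallback, is to try the faster engines first and move on when one fails or returns nothing. The sketch below illustrates that idea only; `extract_with_fallback` and the engine callables are hypothetical stand-ins, not the skill's actual router.

```python
from typing import Callable, Dict, List, Tuple

# An engine is any callable that takes a URL and returns a result dict.
Engine = Callable[[str], Dict]


def extract_with_fallback(url: str, engines: List[Tuple[str, Engine]]) -> Dict:
    """Try each (name, engine) pair in order; return the first usable result."""
    errors = []
    for name, engine in engines:
        try:
            result = engine(url)
        except Exception as exc:  # an engine failure should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if result.get("success") and result.get("content", "").strip():
            # Record which engine produced the content.
            return {**result, "engine": name}
        errors.append(f"{name}: empty or unsuccessful result")
    return {"success": False, "url": url, "errors": errors}
```

Ordering the list as web_fetch, defuddle, browser would keep the 5-10s browser engine for pages the two sub-second engines cannot handle.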

📋 Use Cases

Scenario 1: WeChat article extraction

result = web_fetch(
    url="https://mp.weixin.qq.com/s/xxx",
    extractMode="markdown"
)
print(result["content"])

Scenario 2: Batch processing

urls = ["url1", "url2", "url3"]
results = [web_fetch(url=u) for u in urls]
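In the list comprehension above, a single failing URL aborts the whole batch if the call raises. A slightly more defensive variant collects successes and failures separately. This is a sketch under assumptions: `fetch_all` is a hypothetical helper, `fetch` stands in for `web_fetch`, and the `"error"` key is a guess at how a failed result might report its reason.

```python
def fetch_all(urls, fetch):
    """Call fetch(url=...) for each URL; return (succeeded, failed) dicts keyed by URL."""
    succeeded, failed = {}, {}
    for url in urls:
        try:
            result = fetch(url=url)
        except Exception as exc:
            failed[url] = str(exc)
            continue
        if result.get("success"):
            succeeded[url] = result
        else:
            # "error" is a hypothetical key; fall back to a generic message.
            failed[url] = result.get("error", "unsuccessful result")
    return succeeded, failed
```

Usage would look like `ok, bad = fetch_all(urls, web_fetch)`, after which `bad` can be retried with another engine or logged.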

Scenario 3: Extraction with metadata

result = web_fetch(
    url="https://example.com/article",
    includeMetadata=True
)
print(f"Title: {result['title']}")
print(f"Author: {result['author']}")
print(f"Word count: {result['wordCount']}")

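⚠️ Limitations and Notes

Unsupported scenarios

❌ Pages that require login
❌ Paywalled content
❌ CAPTCHA-protected pages
❌ SPAs rendered purely in JavaScript (use the browser engine instead)

Rate limits

| Domain type | Request interval | Concurrency limit |
|-------------|------------------|-------------------|
| WeChat articles | 2 s | 1 |
| News sites | 1 s | 3 |
| Blogs | 1 s | 5 |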
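The request intervals in the rate-limit table can be enforced on the client side with a small throttle. The sketch below is hypothetical: the `Throttle` class and the category keys (`wechat`, `news`, `blog`) are assumptions, and the source does not say whether the skill enforces these limits itself. Only the interval is enforced here; the concurrency column is carried along but unused.

```python
import time

# (minimum seconds between requests, concurrency limit), from the rate-limit table.
RATE_LIMITS = {"wechat": (2.0, 1), "news": (1.0, 3), "blog": (1.0, 5)}


class Throttle:
    """Space out requests per category; clock and sleep are injectable for testing."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}

    def wait(self, category):
        """Block until a request in this category is allowed; return seconds slept."""
        interval, _concurrency = RATE_LIMITS[category]
        now = self._clock()
        last = self._last_request.get(category)
        delay = 0.0
        if last is not None:
            delay = max(0.0, interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # The request is considered to start once the delay has elapsed.
        self._last_request[category] = now + delay
        return delay
```

Calling `throttle.wait("wechat")` before each WeChat request keeps consecutive calls at least two seconds apart.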

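Compliance requirements

Extract only publicly accessible content
Respect robots.txt
Do not use for commercial purposes (unless authorized)
Preserve the original author's attribution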

🎛️ Advanced Configuration

Custom User-Agent

result = web_fetch(
    url="https://example.com",
    userAgent="Mozilla/5.0 ..."
)

Proxy configuration

result = web_fetch(
    url="https://example.com",
    proxy="http://proxy:port"
)

Cache control

# Enable caching (1 hour)
result = web_fetch(url="https://example.com", cache=True, ttl=3600)

# Force a refresh
result = web_fetch(url="https://example.com", cache=False)
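The `cache` and `ttl` options suggest a time-based cache keyed by URL. The source does not describe the implementation, so the sketch below only illustrates the general idea: `TTLCache` and `cached_fetch` are hypothetical names, and having `cache=False` bypass the store entirely (neither reading nor writing) is an assumption.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after a per-entry TTL in seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry
            return None
        return value

    def set(self, key, value, ttl):
        self._entries[key] = (self._clock() + ttl, value)


def cached_fetch(url, fetch, store, cache=True, ttl=3600):
    """Return a cached result when allowed, otherwise call fetch(url)."""
    if cache:
        hit = store.get(url)
        if hit is not None:
            return hit
    result = fetch(url)
    if cache:
        store.set(url, result, ttl)
    return result
```

With `ttl=3600`, repeated calls for the same URL within an hour reuse the first result, which is consistent with the one-hour example above.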

📊 Performance Metrics

| Metric | Value |
|--------|-------|
| Average response time | 0.8 s |
| P95 response time | 2.5 s |
| Success rate | 85% |
| Cache hit rate | 60% |

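🔍 Troubleshooting

Issue 1: Extracted content is empty

Cause: The page requires JavaScript rendering
Fix: Switch to the browser engine

Issue 2: WeChat article extraction fails

Cause: The link has expired or anti-scraping measures are in place
Fix:

Check that the link is still valid
Try the browser engine
Copy the content manually

Issue 3: Extracted content is incomplete

Cause: The maxChars limit
Fix: Increase maxChars or process the content in pages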
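The source does not say how paged processing is exposed. One way to read the fix for Issue 3 is to fetch with a larger `maxChars` and then split the content locally into pieces a downstream step can handle. The helper below is a hypothetical sketch of that splitting step; `split_content` is not part of the skill, and splitting on blank lines assumes Markdown-style paragraphs.

```python
def split_content(text, max_chars=8000):
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # A single paragraph longer than the limit is cut at fixed offsets.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]

        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph

    if current:
        chunks.append(current)
    return chunks
```

Each chunk stays within the limit, and paragraphs are kept whole unless one alone exceeds it.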

📚 Dependencies

{
  "readability": "^0.4.4",
  "firecrawl": "^1.0.0",
  "defuddle": "^3.0.0"
}

🤝 Contributing

Fork this repository
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📄 License

MIT License - see LICENSE

📞 Support

Docs: https://docs.openclaw.ai/skills/web-content-extractor
Issues: https://github.com/openclaw/openclaw/issues
Community: https://discord.com/invite/clawd

Last updated: 2026-03-15
Maintenance status: ✅ Actively maintained

3,729 stars

development

Updated Apr 2, 2026

npx skillsauth add openclaw/skills 3511815125/web-fetch-vx

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | - | Skipped | 50% |

Last scanned: Apr 3, 2026, 6:45 AM (141.6s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/openclaw/skills.git

# Copy into Claude Code skills folder (global)
cp -r skills/3511815125/web-fetch-vx ~/.claude/skills/

Repository: openclaw/skills (3,729 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT