skills/clean-content-fetch/SKILL.md
获取干净、可读的网页正文内容,适合现代网页、博客、新闻、公告和微信公众号文章抓取;支持网页正文提取、内容清洗、去噪、Markdown 输出,适用于普通 fetch 效果不佳、页面噪音较多或动态渲染干扰的场景。Clean content fetch for modern web pages, article extraction, WeChat article capture, content cleanup, noise reduction, and markdown output when ordinary fetch is not clean enough.
npx skillsauth add leoyeai/openclaw-master-skills clean-content-fetchInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
当用户要获取网页内容、正文提取、把网页转成 markdown/text、抓取文章主体时,优先使用此技能。
python3 scripts/scrapling_fetch.py <url> <max_chars>articlemain.post-content[class*="body"]html2text 转 Markdownbodymax_chars 截断输出python3 /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/scrapling_fetch.py <url> 30000
优先检查:
scraplinghtml2textcurl_cffiplaywrightbrowserforge推荐使用独立虚拟环境,避免系统 Python 的 PEP 668 限制:
python3 -m venv /Users/zzd/.openclaw/workspace/.venvs/clean-content-fetch
/Users/zzd/.openclaw/workspace/.venvs/clean-content-fetch/bin/pip install scrapling html2text curl_cffi playwright browserforge
/Users/zzd/.openclaw/workspace/.venvs/clean-content-fetch/bin/python -m playwright install chromium
如直接运行脚本,优先使用该虚拟环境中的 Python:
/Users/zzd/.openclaw/workspace/.venvs/clean-content-fetch/bin/python /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/scrapling_fetch.py <url> 30000
脚本默认输出 Markdown 正文内容。
如需结构化输出,可追加 --json。
如需调试提取命中了哪个 selector,可查看 stderr 输出。
/Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/references/usage.md/Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/references/selectors.md/Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/fetch-web-content对于 xhslink.com 短链或小红书笔记页,推荐直接使用虚拟环境中的脚本运行:
/Users/zzd/.openclaw/workspace/.venvs/clean-content-fetch/bin/python /Users/zzd/.openclaw/workspace/skills/scrapling-web-fetch/scripts/scrapling_fetch.py 'http://xhslink.com/o/9745hugimlD' 30000
说明:
testing
AI-powered diary generation for agents - creates rich, reflective journal entries (400-600 words) with Quote Hall of Fame, Curiosity Backlog, Decision Archaeology, Relationship Evolution, mood analytics, weekly digests, "On This Day" resurfacing, and cron auto-generation. Works best with Claude models (Haiku, Sonnet, Opus).
development
Multi-agent UX for OpenClaw Control UI — agent selector, per-agent sessions, session history viewer with search, agent-filtered Sessions tab with friendly names, Create Agent wizard, emoji picker, and backend agent CRUD.
tools
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
tools
Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.