0xcjl/web-reader-pro/SKILL.md
Advanced web content extraction skill for OpenClaw using multi-tier fallback strategy (Jina → Scrapling → WebFetch) with intelligent routing, caching, quality scoring, and domain learning. Use when: reading article content, extracting web page text, scraping dynamic JS-heavy pages, or fetching WeChat official account articles.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

```bash
npx skillsauth add openclaw/skills web-reader-pro
```
Web Reader Pro is an advanced web content extraction skill for OpenClaw that uses a multi-tier fallback strategy with intelligent routing, caching, and quality assessment.
```bash
# Install dependencies
pip install -r requirements.txt

# Install Scrapling (requires Node.js)
./scripts/install_scrapling.sh

# Or install Scrapling manually
npm install -g @scrapinghub/scrapling
```
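Basic usage:

```python
from scripts.web_reader_pro import WebReaderPro

# Default settings: automatic tier selection, caching and domain learning enabled
reader = WebReaderPro()
result = reader.fetch("https://example.com")
print(result['title'])
print(result['content'])
```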
Configuration options:

```python
reader = WebReaderPro(
    jina_api_key="your-jina-key",             # Optional: can also be set via env JINA_API_KEY
    cache_ttl=3600,                           # Cache TTL in seconds (default: 3600)
    quality_threshold=200,                    # Min word count for quality (default: 200)
    max_retries=3,                            # Max retries per tier (default: 3)
    enable_learning=True,                     # Enable domain learning (default: True)
    scrapling_path="/usr/local/bin/scrapling" # Path to the scrapling binary
)
```
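The `cache_ttl` option, together with the `force_refresh` parameter in the API reference, implies a time-based cache. The sketch below is a minimal in-memory illustration of those semantics under that assumption. `TTLCache`, `cached_fetch` and the injectable clock are hypothetical names, not the skill's code. The skill configures a cache *directory* (`WEB_READER_CACHE_DIR`), so its real cache presumably persists to disk.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory cache illustrating cache_ttl semantics."""

    def __init__(self, ttl: float = 3600, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, url: str) -> Optional[Any]:
        entry = self._store.get(url)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self.ttl:
            # Entry is older than the TTL: evict it and report a miss.
            del self._store[url]
            return None
        return value

    def set(self, url: str, value: Any) -> None:
        self._store[url] = (self._clock(), value)

    def clear(self, url: Optional[str] = None) -> None:
        # Mirrors the documented clear-cache behavior: one URL, or everything when None.
        if url is None:
            self._store.clear()
        else:
            self._store.pop(url, None)


def cached_fetch(cache: TTLCache, url: str, fetch: Callable[[str], dict],
                 force_refresh: bool = False) -> dict:
    """Return a cached result unless force_refresh is set or the entry has expired."""
    if not force_refresh:
        hit = cache.get(url)
        if hit is not None:
            return {**hit, "cached": True}
    result = fetch(url)
    cache.set(url, result)
    return {**result, "cached": False}
```

The `cached` flag is added on the way out, so it matches the `cached` key in the result format while the stored entry stays unchanged.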
Result format:

```python
{
    "title": "Page Title",
    "content": "Extracted content in markdown...",
    "url": "https://example.com",
    "tier_used": "jina|scrapling|webfetch",  # one of the three tiers
    "quality_score": 85,
    "cached": False,
    "domain_learned_tier": "jina",
    "extracted_at": "2024-01-01T00:00:00Z"
}
```
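Callers that want static type checking can describe the result dictionary with a `TypedDict`. `FetchResult` and `parse_extracted_at` below are hypothetical helpers, not part of the skill. The field types are inferred from the example values above and may be narrower or wider in practice.

```python
from datetime import datetime
from typing import Literal, TypedDict

Tier = Literal["jina", "scrapling", "webfetch"]


class FetchResult(TypedDict):
    """Hypothetical type mirroring the documented result keys (types inferred from the example)."""
    title: str
    content: str              # markdown text
    url: str
    tier_used: Tier           # which tier produced the content
    quality_score: int
    cached: bool
    domain_learned_tier: str
    extracted_at: str         # ISO 8601 timestamp, e.g. "2024-01-01T00:00:00Z"


def parse_extracted_at(result: FetchResult) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(result["extracted_at"].replace("Z", "+00:00"))
```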
Environment variables:

| Variable | Description | Default |
|----------|-------------|---------|
| JINA_API_KEY | Jina Reader API key | None (required for Tier 1) |
| WEB_READER_CACHE_DIR | Cache directory path | ~/.openclaw/cache/web-reader-pro/ |
| WEB_READER_LEARNING_DB | Learning database path | ~/.openclaw/data/web-reader-pro/routes.json |
| WEB_READER_JINA_QUOTA | Jina quota limit (API calls) | 100000 |
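The sketch below shows one way to resolve these variables against the defaults in the environment-variable table, with `~` expanded to the home directory. `resolve_settings` is a hypothetical helper, and the assumption that an unset key means Tier 1 is skipped comes only from the table's "required for Tier 1" note.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Defaults copied from the environment-variable table.
DEFAULT_CACHE_DIR = "~/.openclaw/cache/web-reader-pro/"
DEFAULT_LEARNING_DB = "~/.openclaw/data/web-reader-pro/routes.json"
DEFAULT_JINA_QUOTA = 100000


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: read the documented variables, falling back to the defaults."""
    env = os.environ if env is None else env
    return {
        # No default; the table lists this key as required for Tier 1 (Jina).
        "jina_api_key": env.get("JINA_API_KEY"),
        "cache_dir": Path(env.get("WEB_READER_CACHE_DIR", DEFAULT_CACHE_DIR)).expanduser(),
        "learning_db": Path(env.get("WEB_READER_LEARNING_DB", DEFAULT_LEARNING_DB)).expanduser(),
        # Environment values are strings, so convert the quota to an int.
        "jina_quota": int(env.get("WEB_READER_JINA_QUOTA", DEFAULT_JINA_QUOTA)),
    }
```

Passing an explicit mapping instead of `os.environ` keeps the helper easy to test.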
Fetch and extract content from a URL.

Parameters:
- url (str): Target URL
- force_refresh (bool): Bypass the cache if True

Returns: Dict with title, content, and metadata

Fetch using a specific tier (bypassing automatic selection).

Parameters:
- url (str): Target URL
- preferred_tier (str): "jina", "scrapling", or "webfetch"

Get current Jina API quota usage.

Returns: Dict with count, limit, percentage, and warnings

Clear the cache for a specific URL or for all URLs.

Parameters:
- url (str, optional): Specific URL to clear, or None for all

Get learned domain-to-tier mappings.

Returns: Dict mapping domain -> preferred tier
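The quota-usage call above returns `count`, `limit`, `percentage` and `warnings`. The sketch below shows one plausible way such a report could be computed. `JinaQuotaTracker` is a hypothetical class, and the 80% warning threshold and warning wording are assumptions rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JinaQuotaTracker:
    """Hypothetical counter producing the documented quota-usage fields."""
    limit: int = 100000     # matches the WEB_READER_JINA_QUOTA default
    warn_at: float = 80.0   # assumed warning threshold, in percent
    count: int = 0

    def record_call(self) -> None:
        self.count += 1

    def usage(self) -> Dict[str, object]:
        # Treat a zero limit as fully used rather than dividing by zero.
        percentage = 100.0 * self.count / self.limit if self.limit else 100.0
        warnings: List[str] = []
        if percentage >= 100.0:
            warnings.append("Jina quota exhausted")
        elif percentage >= self.warn_at:
            warnings.append(f"Jina quota at {percentage:.1f}% of {self.limit}")
        return {"count": self.count, "limit": self.limit,
                "percentage": round(percentage, 2), "warnings": warnings}
```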
Tier comparison:

| Tier | Speed | JS Rendering | Best For | Cost |
|------|-------|--------------|----------|------|
| Jina | Fast | No | Static pages, articles | API calls |
| Scrapling | Medium | Yes | SPAs, dynamic content | CPU |
| WebFetch | Fastest | No | Simple pages, fallbacks | Free |
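The sketch below shows how the Jina → Scrapling → WebFetch fallback, per-tier retries, the word-count quality threshold and domain learning could fit together. It is an illustration under stated assumptions, not the skill's actual routing logic:
- Each tier is modeled as a hypothetical callable that returns text or raises.
- Quality means "word count ≥ `quality_threshold`".
- A learned tier for the domain is tried first.
- When no tier clears the threshold, the longest result is returned.

`fetch_with_fallback` and `learned_routes` are illustrative names.

```python
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urlparse

TIER_ORDER = ["jina", "scrapling", "webfetch"]  # default order from the tier table


def word_count(text: str) -> int:
    return len(text.split())


def fetch_with_fallback(
    url: str,
    tiers: Dict[str, Callable[[str], str]],
    learned_routes: Dict[str, str],
    quality_threshold: int = 200,
    max_retries: int = 3,
) -> Tuple[str, str]:
    """Hypothetical sketch: return (tier_name, content) from the first tier clearing the threshold."""
    domain = urlparse(url).netloc
    order = list(TIER_ORDER)
    preferred = learned_routes.get(domain)
    if preferred in order:
        # Domain learning: try the tier that succeeded for this domain before.
        order.remove(preferred)
        order.insert(0, preferred)

    best: Optional[Tuple[str, str]] = None
    for name in order:
        fetcher = tiers.get(name)
        if fetcher is None:
            continue  # tier not available (e.g. no API key or binary)
        for _ in range(max_retries):
            try:
                content = fetcher(url)
            except Exception:
                continue  # treat as transient failure and retry this tier
            if word_count(content) >= quality_threshold:
                learned_routes[domain] = name  # remember the winning tier
                return name, content
            if best is None or word_count(content) > word_count(best[1]):
                best = (name, content)
            break  # tier answered but content was too thin: move to the next tier
    if best is not None:
        return best  # nothing cleared the threshold; fall back to the longest result
    raise RuntimeError(f"all tiers failed for {url}")
```

Moving on after a thin response rather than retrying reflects an assumption that the same tier would likely return the same content again. Retries are spent only on errors.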
License: MIT