plugins/grabber-development/skills/grabber-development/SKILL.md
Comprehensive Python web scraping knowledge base covering stealth browser automation (Patchright, Camoufox, Nodriver), TLS/HTTP fingerprint impersonation (curl_cffi, primp), anti-bot bypass (Cloudflare, DataDome, PerimeterX), CAPTCHA solving, proxy architecture, AI-assisted extraction (Crawl4AI, Firecrawl, ScrapeGraphAI), framework selection (Scrapy, Crawlee), rate limiting, and production observability. TRIGGER WHEN: building, implementing, writing, coding, creating, optimizing, or debugging Python web scrapers. DO NOT TRIGGER WHEN: the task is outside the specific scope of this component.
Knowledge base for building production-grade Python web scraping systems. Covers the full stack from target assessment through production observability.
This section overrides everything else in this skill if there is any conflict. Read it first, act on it first.
When this skill activates on a scraping task, your next non-question tool call MUST launch a visible browser with the capture surface attached. Not Write pyproject.toml. Not Write models.py. Not "let me sketch the architecture first". Browser first, then code.
The default path is user-driven navigation with live capture, not Claude-clicks. The user knows their data and their portal better than you do, and authenticated SaaS sites need them anyway. Steps:
- Launch the browser via playwright-skill (preferred) or write an inline Patchright script via Bash. The script must run with headless=False, attach every handler in the Capture Surface below, and park on input() waiting for the user.

The Claude-drives variant is fine only when there is no login, no 2FA, and no UI-knowledge gap. Same launch, same capture handlers; you call page.goto / page.click yourself instead of parking on input().
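A minimal sketch of the user-driven launch described above, assuming Patchright's async API (`patchright.async_api`, which mirrors Playwright's). `Capture` and `run_discovery` are hypothetical names, and only the request/response handlers are attached here; the rest of the capture surface would be wired the same way. The async API is used so that `input()` can wait in a worker thread while the event loop keeps dispatching network events.

```python
import asyncio
from typing import Any

CAPTURED_TYPES = ("xhr", "fetch")


class Capture:
    """In-memory log of XHR/fetch traffic observed during the session."""

    def __init__(self) -> None:
        self.events: list[dict[str, Any]] = []

    def on_request(self, request: Any) -> None:
        # Skip documents, images, fonts, etc.; keep only XHR and fetch.
        if request.resource_type not in CAPTURED_TYPES:
            return
        self.events.append({
            "kind": "request",
            "method": request.method,
            "url": request.url,
            "headers": dict(request.headers),
            "body": request.post_data,
        })

    async def on_response(self, response: Any) -> None:
        if response.request.resource_type not in CAPTURED_TYPES:
            return
        content_type = response.headers.get("content-type", "")
        body = None
        if "json" in content_type or content_type.startswith("text/"):
            try:
                body = await response.text()
            except Exception:
                body = None  # body may be gone (redirect, page closed)
        self.events.append({
            "kind": "response",
            "status": response.status,
            "url": response.url,
            "headers": dict(response.headers),
            "body": body,
        })


async def run_discovery(start_url: str) -> Capture:
    # Imported lazily so the Capture class can be exercised without a browser.
    from patchright.async_api import async_playwright

    capture = Capture()
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()
        page.on("request", capture.on_request)
        page.on("response", capture.on_response)
        await page.goto(start_url)
        # input() blocks a worker thread, not the event loop, so capture keeps streaming.
        await asyncio.to_thread(input, "Navigate in the browser, then press Enter when done... ")
        await browser.close()
    for event in capture.events:
        if event["kind"] == "request":
            print(event["method"], event["url"])
    return capture
```

Under these assumptions, `asyncio.run(run_discovery("https://example.com"))` would open a visible window, record traffic while the user navigates, and print the observed requests once Enter is pressed. The capture stays in memory; nothing is written to disk in this sketch.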
Writing project files (pyproject.toml, src/<pkg>/..., models.py) before the capture is in your hands is the failure mode this section exists to prevent. If you catch yourself drafting field-name alias tuples from "common patterns" (Italian + English, REST conventions, framework defaults), stop and launch the browser instead.
The full capture surface, output checklist, and anti-patterns are in the Discovery Gate section below. Read those too. But the imperative is here: browser before code, every time, user navigates by default.
Phase 1 (Target Assessment) and Phase 2 (Data Discovery) are blocking gates, not optional steps. You MUST execute them yourself and have their concrete outputs in hand before scaffolding any project file (pyproject.toml, modules, models, CLI). No exceptions.
You always control the browser session and the capture. The deliverable of discovery is not a script you hand over; it is a live capture you watched. Always launch the browser yourself (via playwright-skill or inline Patchright) with headless=False and the full capture surface attached, and keep the session open inside your turn.
Who clicks depends on the task. The capture is yours either way:
- User navigates (default): park on an input() checkpoint, let the user navigate while the network capture streams live, then dump the capture when they signal "done".
- Never hand off discovery by writing scripts/discover.py and telling the user "run this and paste the output back". That breaks the loop: by the time the user runs it, you have no eyes on the session and no chance to ask "wait, click that filter again, I lost the payload".

Capture surface (attach all of these from page launch):
- page.on("request") / page.on("response") for XHR + fetch (URL, method, status, headers, cookies, request body, response body when JSON or text)
- page.on("websocket") then ws.on("framesent") / ws.on("framereceived") for WebSocket traffic in both directions
- Streamed responses: text/event-stream (SSE) and chunked transfer
- page.on("worker") for service-worker- and dedicated-worker-initiated requests
- GraphQL: URL path /graphql, request body has operationName / variables / extensions.persistedQuery.sha256Hash
- context.cookies() after login, plus any anti-bot cookies (cf_clearance, __cf_bm, datadome, _px3, ak_bmsc, incap_ses)
- page.on("framenavigated") filtered to the main frame, to record every landing URL after redirects

Redact Authorization, Cookie, and password fields in anything saved to disk. Keep them in the in-memory capture you reason from.
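The redaction rule above can be sketched as a pure function that produces a disk-safe copy of a captured event while leaving the in-memory original untouched. The event shape (`headers` dict, `body` string or parsed JSON) and the names `redact_event`, `redact_headers`, and `redact_fields` are assumptions for illustration. Only JSON bodies are handled; form-encoded bodies would need their own pass.

```python
import json
from typing import Any

REDACTED = "[REDACTED]"
SENSITIVE_HEADERS = frozenset({"authorization", "cookie"})
SENSITIVE_FIELD_HINT = "password"


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Mask Authorization and Cookie header values (case-insensitive names)."""
    return {
        name: (REDACTED if name.lower() in SENSITIVE_HEADERS else value)
        for name, value in headers.items()
    }


def redact_fields(value: Any) -> Any:
    """Recursively mask any dict key whose name contains 'password'."""
    if isinstance(value, dict):
        return {
            key: (REDACTED if SENSITIVE_FIELD_HINT in str(key).lower() else redact_fields(item))
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_fields(item) for item in value]
    return value


def redact_event(event: dict[str, Any]) -> dict[str, Any]:
    """Return a redacted copy for saving to disk; the input event is not modified."""
    safe = dict(event)
    if isinstance(safe.get("headers"), dict):
        safe["headers"] = redact_headers(safe["headers"])
    body = safe.get("body")
    if isinstance(body, str):
        try:
            parsed = json.loads(body)
        except ValueError:
            parsed = None  # not JSON: left as-is in this sketch
        if isinstance(parsed, (dict, list)):
            safe["body"] = json.dumps(redact_fields(parsed))
    elif isinstance(body, (dict, list)):
        safe["body"] = redact_fields(body)
    return safe
```

Keeping redaction as a copy-on-write step at save time means the reasoning path still sees the real tokens, while anything serialized goes through `redact_event` first.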
Discovery outputs you MUST collect before scaffolding (treat as a checklist; if any item is still a guess, you have not finished discovery):
- Real URLs and routes as observed in the capture (no /#/... guesses)
- GraphQL: operationName and variables, persisted-query SHA if present
- Anti-bot cookies: cf_clearance, __cf_bm, datadome, _px3, ak_bmsc, incap_ses present or absent

If any of those is still a guess, you have not finished discovery; do not proceed to scaffolding.
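Two of the checklist items above lend themselves to a mechanical check once the capture exists. The sketch below assumes cookies in the list-of-dicts shape that Playwright-style `context.cookies()` returns, and a request body available as a string; `anti_bot_cookie_report` and `graphql_details` are hypothetical helper names. Prefix matching on cookie names is an assumption, chosen because some vendors append a site-specific suffix.

```python
import json
from typing import Any, Optional
from urllib.parse import urlsplit

ANTI_BOT_COOKIES = ("cf_clearance", "__cf_bm", "datadome", "_px3", "ak_bmsc", "incap_ses")


def anti_bot_cookie_report(cookies: list[dict[str, Any]]) -> dict[str, bool]:
    """Map each known anti-bot cookie marker to present (True) or absent (False)."""
    names = [str(cookie.get("name", "")) for cookie in cookies]
    return {
        marker: any(name.startswith(marker) for name in names)
        for marker in ANTI_BOT_COOKIES
    }


def graphql_details(url: str, post_data: Optional[str]) -> Optional[dict[str, Any]]:
    """Return operationName, variables, and persisted-query hash, or None if not GraphQL."""
    if not post_data:
        return None
    try:
        payload = json.loads(post_data)
    except ValueError:
        return None
    if not isinstance(payload, dict):
        return None  # batched (list) payloads are not handled in this sketch
    path = urlsplit(url).path.rstrip("/")
    if not (path.endswith("/graphql") or "operationName" in payload):
        return None
    extensions = payload.get("extensions")
    persisted = extensions.get("persistedQuery") if isinstance(extensions, dict) else None
    sha = persisted.get("sha256Hash") if isinstance(persisted, dict) else None
    return {
        "operationName": payload.get("operationName"),
        "variables": payload.get("variables"),
        "sha256Hash": sha,
    }
```

Running both helpers over the captured events gives concrete present/absent answers instead of guesses, which is what the checklist asks for.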
Anti-patterns:

- Writing pyproject.toml and a module skeleton before observing one real network request from the target
- Guessing URLs (/api/invoices, /#/fatture-ricevute) without observation
- Using a guessed regex ((fatture|invoice|received|ricevute|passive)) as a substitute for the real endpoint name
- Drafting Field(alias=...) tuples of "Italian + English likely names" instead of the names actually returned by the API
- Writing a discover.py script as the first discovery step when you could open the browser yourself

For every scraping task, follow this sequence (the Discovery Gate above governs steps 1 and 2):
1. Launch a visible browser (playwright-skill or inline Patchright)
2. Capture network traffic with page.on("request") and page.on("response")
3. Check for embedded <script> JSON before parsing DOM
4. If curl_cffi with impersonate="chrome" is enough -- done

| Target Profile | HTTP Client | Browser | Framework |
|---------------|-------------|---------|-----------|
| No JS, no protection | curl_cffi | none | Scrapy / httpx |
| JS-rendered, no protection | none | Playwright | Crawlee |
| Basic Cloudflare | curl_cffi + cf_clearance | Patchright (for cookie) | Scrapy |
| Heavy Cloudflare | none | Patchright persistent | Crawlee |
| DataDome | none | Camoufox + ghost-cursor | custom |
| PerimeterX | none | Nodriver / Patchright | custom |
| AI extraction needed | none | Crawl4AI / Firecrawl | standalone |
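The sequence above mentions looking at `<script>` JSON before parsing the DOM. One way to do that with only the standard library is sketched below; it assumes the page embeds its state in `<script type="application/json">` or `application/ld+json` tags, and `extract_embedded_json` is a hypothetical helper name. Inline JavaScript assignments (for example `window.STATE = {...}`) are out of scope here.

```python
import json
from html.parser import HTMLParser
from typing import Any, Optional


class ScriptJSONExtractor(HTMLParser):
    """Collects parsed contents of <script> tags whose type marks them as JSON."""

    JSON_TYPES = ("application/json", "application/ld+json")

    def __init__(self) -> None:
        super().__init__()
        self._in_json_script = False
        self._current_id: Optional[str] = None
        self._buffer: list[str] = []
        self.found: list[tuple[Optional[str], Any]] = []  # (script id, parsed JSON)

    def handle_starttag(self, tag: str, attrs: list) -> None:
        if tag != "script":
            return
        attributes = dict(attrs)
        if (attributes.get("type") or "").lower() in self.JSON_TYPES:
            self._in_json_script = True
            self._current_id = attributes.get("id")
            self._buffer = []

    def handle_data(self, data: str) -> None:
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_script:
            self._buffer.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag == "script" and self._in_json_script:
            self._in_json_script = False
            try:
                self.found.append((self._current_id, json.loads("".join(self._buffer))))
            except ValueError:
                pass  # declared as JSON but not parseable; skip it


def extract_embedded_json(html: str) -> list[tuple[Optional[str], Any]]:
    """Return (script id, parsed JSON) pairs for every JSON script tag in the page."""
    parser = ScriptJSONExtractor()
    parser.feed(html)
    parser.close()
    return parser.found
```

If this returns the records you need, DOM selectors may be unnecessary for that page, since the data is already structured.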
| Tier | Type | Price Range | Use When |
|------|------|-------------|----------|
| 0 | No proxy | free | Unprotected targets, development |
| 1 | Datacenter | $0.10-0.50/GB | Light protection, high volume |
| 2 | ISP (static residential) | $0.53-1.47/IP | Account management, login flows |
| 3 | Residential | $0.49-8.00/GB | Anti-bot bypass, geo-targeting |
| 4 | Mobile | $4-13/GB | Highest trust, last resort |
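The price ranges in the table can be turned into a rough budget estimate. The sketch below copies the listed figures as-is and multiplies them by a quantity in the tier's own billing unit (GB for tiers 1, 3, and 4; IPs for tier 2). `ProxyTier` and `estimate_cost` are hypothetical names, and real provider pricing will vary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyTier:
    tier: int
    kind: str
    low: float   # lower bound of the listed price, USD per unit
    high: float  # upper bound of the listed price, USD per unit
    unit: str    # "GB", "IP", or "" when no proxy is used


TIERS = (
    ProxyTier(0, "No proxy", 0.0, 0.0, ""),
    ProxyTier(1, "Datacenter", 0.10, 0.50, "GB"),
    ProxyTier(2, "ISP (static residential)", 0.53, 1.47, "IP"),
    ProxyTier(3, "Residential", 0.49, 8.00, "GB"),
    ProxyTier(4, "Mobile", 4.00, 13.00, "GB"),
)


def estimate_cost(tier: int, quantity: float) -> tuple[float, float]:
    """Return the (low, high) USD estimate for `quantity` units of the tier's billing unit."""
    if not 0 <= tier < len(TIERS):
        raise ValueError(f"unknown proxy tier: {tier}")
    selected = TIERS[tier]
    return (round(selected.low * quantity, 2), round(selected.high * quantity, 2))


print(estimate_cost(3, 50))   # 50 GB residential -> (24.5, 400.0)
print(estimate_cost(1, 200))  # 200 GB datacenter -> (20.0, 100.0)
```

The wide spread on residential traffic (roughly 16x between the bounds) is a reason to measure actual bandwidth during discovery before committing to a tier.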
- field-guide.md -- full 2025-2026 Python web scraping field guide covering browser stealth, TLS fingerprinting, behavioral biometrics, anti-bot bypass, CAPTCHA solving, proxy landscape, frameworks, AI-assisted scraping, GraphQL reverse engineering, rate limiting, and observability