skills/page-collect/SKILL.md
Extract structured resources (icons, metadata, text, forms, videos, social links) from any webpage using Playwright. Supports individual collectors via subcommands (icons, metadata, text, forms, videos, socials) or all at once. The icon collector classifies SVGs as icon/logo/image based on size and DOM context, optimizes them for EDS, and outputs to /icons/ for use with decorateIcons(). Use when migrating pages, auditing sites, or extracting assets.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add catalan-adobe/skills page-collect
Extract structured resources from any webpage via Playwright.
| Subcommand | Purpose | Output |
|------------|---------|--------|
| all | Run all collectors | collection.json + assets |
| icons | SVGs, icon fonts, CSS icons → classified SVGs | icons/ + icons.json |
| metadata | Meta tags, OG, structured data | metadata.json |
| text | Body text, headings, word count | text.json |
| forms | Form structures, fields, actions | forms.json |
| videos | Video embeds, sources | videos.json |
| socials | Social media links | socials.json |
If CLAUDE_SKILL_DIR is set:
SCRIPT="${CLAUDE_SKILL_DIR}/scripts/page-collect.js"
Otherwise, find it:
SCRIPT="$(find ~/.claude -path "*/page-collect/scripts/page-collect.js" -type f 2>/dev/null | head -1)"
node "$SCRIPT" <subcommand> <url> [--output <dir>]
Default output: ./page-collect-output/
Playwright must be installed: npx playwright install chromium
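The invocation above can also be assembled programmatically. The following is a minimal sketch, not part of the skill itself: `buildCollectArgs` is a hypothetical helper, and the only facts it relies on are the subcommand names from the table and the `<subcommand> <url> [--output <dir>]` argument order.

```javascript
// Subcommands taken from the table above.
const SUBCOMMANDS = ["all", "icons", "metadata", "text", "forms", "videos", "socials"];

// Hypothetical helper: builds the argument list for `node "$SCRIPT" ...`.
function buildCollectArgs(scriptPath, subcommand, url, outputDir) {
  if (!SUBCOMMANDS.includes(subcommand)) {
    throw new Error(`Unknown subcommand: ${subcommand}`);
  }
  // new URL() throws on malformed input, so bad URLs fail before the script runs.
  new URL(url);
  const args = [scriptPath, subcommand, url];
  if (outputDir) {
    // Omitting --output leaves the script on its default output directory.
    args.push("--output", outputDir);
  }
  return args;
}

console.log(buildCollectArgs("/path/to/page-collect.js", "icons", "https://example.com", "./out"));
```

The returned array is suitable for passing to something like `child_process.execFileSync("node", args)`, which avoids shell quoting issues with URLs.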
The icon collector extracts SVGs from multiple sources:
- <svg> elements
- <img> tags with .svg src or data:image/svg+xml URIs
- background-image SVG data URIs
- <use> sprite references (resolved to standalone SVGs)

| Class | Criteria | Output |
|-------|----------|--------|
| icon | ≤ 48px, inside button/link/nav | /icons/{name}.svg |
| logo | Brand area, "logo" in class/alt/src | /icons/logo.svg |
| image | > 48px, standalone | Excluded |
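The classification rules in the table could be approximated as follows. This is a sketch under stated assumptions rather than the collector's actual logic: the input shape, the precedence of the "logo" check, comparing the larger dimension against 48px, and the fallback for small non-interactive SVGs are all guesses made for illustration.

```javascript
const ICON_MAX_PX = 48; // size threshold from the table above
const INTERACTIVE_TAGS = new Set(["button", "a", "nav"]);

// Hypothetical input shape: { width, height, ancestorTags, hints }
// - ancestorTags: lowercase tag names of the element's ancestors
// - hints: strings such as class, alt, and src values
function classifySvg({ width, height, ancestorTags = [], hints = [] }) {
  // Assumption: a "logo" hint wins over the size-based rules.
  if (hints.some((h) => /logo/i.test(h))) return "logo";

  const size = Math.max(width, height); // assumption: compare the larger side
  const interactive = ancestorTags.some((t) => INTERACTIVE_TAGS.has(t));
  if (size <= ICON_MAX_PX && interactive) return "icon";

  // Assumption: anything else is treated as an image and excluded.
  return "image";
}

console.log(classifySvg({ width: 24, height: 24, ancestorTags: ["header", "button"] }));
```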
Icons are named from DOM context (aria-label, class, ID). When no
meaningful name can be derived, they get icon-{n} with
nameConfidence: "low" in the manifest — review these and rename.
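The naming fallback described above might look like this. It is an illustrative sketch: `slugify` and `deriveIconName` are hypothetical names, and the candidate order (aria-label, then class, then ID) is assumed from the order in which the text lists them.

```javascript
// Turns a free-form label into a lowercase, hyphenated name.
function slugify(value) {
  return String(value || "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Hypothetical input: { ariaLabel, className, id }; `index` feeds the fallback name.
function deriveIconName({ ariaLabel, className, id }, index) {
  // Assumption: candidates are tried in the order the text lists them.
  for (const candidate of [ariaLabel, className, id]) {
    const slug = slugify(candidate);
    if (slug) return { name: slug, nameConfidence: "high" };
  }
  // No usable context: fall back to icon-{n} and flag it for review.
  return { name: `icon-${index}`, nameConfidence: "low" };
}

console.log(deriveIconName({ ariaLabel: "Search" }, 1));
console.log(deriveIconName({}, 3));
```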
Each icon SVG is cleaned:
- Colors converted to currentColor (icons only, not logos)

For more details, read the collectors reference.
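As a rough illustration of the currentColor step, the sketch below rewrites explicit fill and stroke attribute values on icons and leaves logos untouched. It assumes the cleaning works on attribute values; the real collector may handle inline styles, gradients, and other cases that this regex-based version does not.

```javascript
// Hypothetical helper: svgClass is "icon" or "logo", as in the classification table.
function applyCurrentColor(svg, svgClass) {
  if (svgClass !== "icon") return svg; // logos keep their original colors

  // Rewrite fill="..." and stroke="..." values, leaving "none" untouched.
  // Attributes such as stroke-width are skipped because `="` must follow the name.
  return svg.replace(/\b(fill|stroke)="(?!none")[^"]*"/g, '$1="currentColor"');
}

const input = '<svg><path fill="#000" stroke="none" stroke-width="2"/></svg>';
console.log(applyCurrentColor(input, "icon"));
```

Using currentColor lets the icon inherit its color from the surrounding CSS `color` property.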
{
  "url": "https://example.com",
  "icons": [
    {
      "name": "search",
      "class": "icon",
      "source": "inline-svg",
      "file": "icons/search.svg",
      "nameConfidence": "high",
      "context": "header button Search"
    }
  ]
}
1. Review icons.json and rename any nameConfidence: "low" icons
2. Copy icons/*.svg to the EDS project's /icons/ directory
3. Reference icons with :iconname: notation
4. decorateIcons() in aem.js handles rendering

For all results: review collection.json for a full resource inventory of the page.
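The manifest review can be partly automated. The sketch below assumes the manifest shape shown in the example JSON; `findLowConfidenceIcons` and `toIconToken` are hypothetical helpers, and the second entry in `sampleManifest` is invented to show the low-confidence case.

```javascript
// Returns manifest entries that still need a human-chosen name.
function findLowConfidenceIcons(manifest) {
  return (manifest.icons || []).filter((icon) => icon.nameConfidence === "low");
}

// Formats a name in the :iconname: notation.
function toIconToken(name) {
  return `:${name}:`;
}

// Sample data: the first entry mirrors the example manifest,
// the second is a made-up low-confidence entry.
const sampleManifest = {
  url: "https://example.com",
  icons: [
    { name: "search", class: "icon", file: "icons/search.svg", nameConfidence: "high" },
    { name: "icon-1", class: "icon", file: "icons/icon-1.svg", nameConfidence: "low" },
  ],
};

for (const icon of findLowConfidenceIcons(sampleManifest)) {
  console.log(`Rename before use: ${icon.file}`);
}
console.log(toIconToken("search")); // prints ":search:"
```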
When used as part of a header migration:
1. Run node "$SCRIPT" icons <source-url> --output <extraction-dir>
2. icons.json is read and SVGs are copied to /icons/
3. nav.plain.html uses :iconname: for tools/utility icons
4. program.md notes available icons