
mulatta/crwl-cli

Name: crwl-cli
Author: mulatta

skills/crwl-cli/SKILL.md

npx skillsauth add mulatta/skillz crwl-cli


Workflow Selection

Choose an approach before crawling:

| Situation | Approach |
| --------- | -------- |
| Single page (article, docs, blog post) | crwl-cli fetch URL |
| Multiple pages linked from one page (product listings, search results, index pages) | JSON links pipeline (see Multi-step Crawling) |

NEVER manually copy URLs from markdown output. Use --format json and extract .links with jq instead. Markdown text may contain malformed or incomplete URLs, while .links provides structured, reliable hrefs.

Basic Usage

# Single URL — markdown output (default)
crwl-cli fetch https://docs.python.org/3/library/asyncio.html

# CSS selector to limit scope
crwl-cli fetch https://docs.python.org/3/ --css "#content"

# JSON output (for pipelines)
crwl-cli fetch https://example.com --format json

# Raw markdown (no content filtering)
crwl-cli fetch https://example.com --format raw

# Fast mode — disable images
crwl-cli fetch https://example.com --text-mode

# Wait for dynamic content
crwl-cli fetch https://example.com --wait-for ".loaded"

# Batch crawl from file
crwl-cli fetch --urls-file urls.txt --format json
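The batch example reads URLs from urls.txt, but the file format is not spelled out here. A minimal sketch, assuming one absolute http(s) URL per line: the hypothetical helper `clean_urls_file` (not part of crwl-cli) trims whitespace, drops blank or non-http(s) lines, and removes duplicates before a batch crawl.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of crwl-cli): normalize a URL list before
# passing it to `crwl-cli fetch --urls-file`.
# Assumption: the file format is one absolute http(s) URL per line.
clean_urls_file() {
  local in="$1" out="$2"
  # Trim leading/trailing whitespace (including a trailing CR from Windows
  # line endings), keep only http(s) URLs, and drop duplicates while
  # preserving the original order.
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$in" \
    | grep -E '^https?://' \
    | awk '!seen[$0]++' > "$out"
}

# Usage sketch:
#   clean_urls_file raw-urls.txt urls.txt
#   crwl-cli fetch --urls-file urls.txt --format json
```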

Multi-step Crawling

When to use: the target page links to multiple detail pages you need data from. Detect this when the page is a product listing, search results, category index, or configurator.

Steps

# 1. Crawl listing page → JSON (always use --format json for listings)
crwl-cli fetch https://shop.example.com/products --format json > listing.json

# 2. Extract detail page URLs via .links (NOT from .markdown)
jq -r '.links.internal[] | select(.href | test("/products/")) | .href' listing.json > urls.txt

# 3. Batch crawl all detail pages
crwl-cli fetch --urls-file urls.txt --format json
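Listing pages often link the same item more than once (for example, from both a thumbnail and a title), so step 2 can emit repeated hrefs and cause duplicate crawls. A sketch of step 2 as a reusable function, using the same jq filter plus `sort -u`; the function name `extract_detail_urls` is hypothetical, and the output is sorted rather than kept in page order.

```shell
# Hypothetical wrapper around step 2: print the unique internal hrefs whose
# URL matches a regex (jq's test() regex syntax), one per line, sorted.
extract_detail_urls() {
  local pattern="$1" listing="$2"
  jq -r --arg re "$pattern" \
    '.links.internal[]? | .href | select(type == "string") | select(test($re))' \
    "$listing" | sort -u
}

# Usage sketch:
#   extract_detail_urls '/products/' listing.json > urls.txt
#   crwl-cli fetch --urls-file urls.txt --format json
```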

`links` structure (`--format json` only)

{
  "internal": [{ "href": "...", "text": "...", "title": "..." }],
  "external": [{ "href": "...", "text": "...", "title": "..." }]
}

Agent decision logic

1. Crawl the target URL with --format json.
2. Check: does .links.internal contain multiple URLs matching a detail-page pattern?
   - Yes → filter with jq, write the URLs to a file, and batch crawl with --urls-file.
   - No → use .markdown directly.
3. Extract the needed information from each result's markdown.
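The decision logic above can be sketched as a shell function over a `--format json` result. This assumes the `links.internal[].href` structure shown earlier; the name `choose_approach` and reading "multiple" as two or more unique matching hrefs are illustrative assumptions.

```shell
# Hypothetical helper: given a `crwl-cli fetch ... --format json` result file
# and a detail-page regex, print "batch" if 2+ unique internal hrefs match
# (filter + --urls-file), otherwise "single" (use .markdown directly).
choose_approach() {
  local result="$1" pattern="$2" count
  count=$(jq --arg re "$pattern" \
    '[.links.internal[]? | .href | select(type == "string") | select(test($re))]
     | unique | length' "$result")
  if [ "${count:-0}" -ge 2 ]; then
    echo batch
  else
    echo single
  fi
}

# Usage sketch:
#   crwl-cli fetch https://shop.example.com/products --format json > listing.json
#   choose_approach listing.json '/products/'
```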

Anti-patterns

- Reading .markdown to find URLs: unreliable and manual, and it misses links hidden in JS-rendered elements. Always use .links.
- Manually constructing detail-page URLs: fragile if the URL scheme changes. Let .links provide the canonical hrefs.
- Crawling detail pages one by one: use --urls-file to batch crawl instead of making sequential fetch calls.

Authentication Workflow

When crawl output contains login prompts ("sign in", "log in") or the request returns HTTP 401/403, follow these steps:

1. Create a profile. This opens Chromium for manual login, so it requires a GUI display and does not work over SSH or in headless environments:
```
crwl-cli profile create github
```
2. Log in to the site in the browser window, then press q in the terminal to save the profile.
3. Verify that the profile works:
```
crwl-cli profile check github https://github.com/settings/profile
```
   Check that the preview shows authenticated content.

4. Crawl with the profile:

crwl-cli fetch https://github.com/settings/profile --profile github

Auth Detection Heuristics

Re-crawl with a profile when the result shows any of these signs:

- Keywords: "sign in", "log in", "password", "authentication required"
- HTTP status: 401, 403
- Markdown that is unexpectedly short (<100 chars) for a page known to be content-rich
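These heuristics can be checked mechanically against a `--format json` result. A minimal sketch, assuming the `status_code` and `markdown` fields listed under Output Formats; the helper name `needs_auth` is hypothetical, the keyword match is case-insensitive, and the length check takes a threshold argument so it can be disabled (pass 0) for pages that may legitimately be short.

```shell
# Hypothetical helper: exit 0 (true) if a --format json result looks like an
# auth wall per the keyword / status / length heuristics, else exit 1.
# $1 = result file; $2 = minimum expected markdown length (default 100;
#      pass 0 to skip the length check).
needs_auth() {
  local result="$1" min_chars="${2:-100}"
  jq -e --argjson min "$min_chars" '
    (.markdown // "") as $md
    | (.status_code == 401 or .status_code == 403)
      or ($md | test("sign in|log in|password|authentication required"; "i"))
      or ($min > 0 and ($md | length) < $min)
  ' "$result" > /dev/null
}

# Usage sketch:
#   crwl-cli fetch "$url" --format json > page.json
#   if needs_auth page.json; then
#     crwl-cli fetch "$url" --format json --profile github > page.json
#   fi
```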

Profile Management

crwl-cli profile list                              # List all profiles
crwl-cli profile create <name>                     # Create (opens browser)
crwl-cli profile check <name> <url>                # Test profile session
crwl-cli profile delete <name>                     # Delete profile

Profiles are stored at: ~/.local/share/crwl-cli/profiles/<name>/

profile create opens a Chromium window and requires a GUI display.
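Because profiles live under a fixed directory, a script can test for one before deciding whether to pass --profile, without opening a browser. A minimal sketch, assuming the documented path ~/.local/share/crwl-cli/profiles/<name>/; the helper name `profile_exists` is hypothetical, and it only checks that the directory exists, not that the saved session is still valid (that is what `crwl-cli profile check` is for).

```shell
# Hypothetical helper: succeed if a crwl-cli profile directory exists.
# Assumes the documented location ~/.local/share/crwl-cli/profiles/<name>/.
# This does not verify the login session; use `crwl-cli profile check`.
profile_exists() {
  [ -n "$1" ] && [ -d "$HOME/.local/share/crwl-cli/profiles/$1" ]
}

# Usage sketch: add --profile only when the profile has been created.
#   args=(fetch https://github.com/settings/profile)
#   profile_exists github && args+=(--profile github)
#   crwl-cli "${args[@]}"
```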

Cache Management

Caching is off by default; enable it with --cache.

crwl-cli fetch https://example.com --cache         # Store result
crwl-cli cache list                                # List cached entries
crwl-cli cache clear                               # Clear all
crwl-cli cache clear --older-than 7                # Clear entries >7 days old

Cache entries are stored at: ~/.local/share/crwl-cli/cache/
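To estimate what `cache clear --older-than 7` would affect before running it, a read-only sketch with standard `find` can inspect the cache directory. This assumes cache entries are regular files whose modification time reflects when they were stored; crwl-cli's own notion of entry age may differ, so treat the output as an approximation. The helper name `stale_cache_files` is hypothetical.

```shell
# Hypothetical, read-only helper: list files in the crwl-cli cache directory
# last modified more than N days ago (default 7). Nothing is deleted.
# Assumption: entries are regular files and mtime approximates caching time.
stale_cache_files() {
  local days="${1:-7}" dir="$HOME/.local/share/crwl-cli/cache"
  [ -d "$dir" ] || return 0
  # -mmin (GNU and BSD find) gives minute precision, unlike -mtime, which
  # counts whole 24-hour periods.
  find "$dir" -type f -mmin +"$((days * 1440))"
}
```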

Output Formats

| Format | Flag | Content | Use Case |
| ------ | ---- | ------- | -------- |
| md | --format md (default) | Filtered markdown (PruningContentFilter) | LLM consumption |
| raw | --format raw | Full markdown, no filtering | Debugging, complete extraction |
| json | --format json | {url, success, status_code, markdown, links, error} | Pipelines, batch processing |
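In pipelines, the `success`, `status_code`, and `error` fields identify pages that need a second pass. A sketch, assuming the JSON output is either a single array of result objects or a stream of objects (the batch shape is not specified here, so the filter accepts both); the helper name `failed_urls` is hypothetical.

```shell
# Hypothetical helper: print "<url><TAB><status_code><TAB><error>" for every
# result whose "success" is not true, from a --format json output file.
# Accepts either one JSON array of results or a stream of result objects.
failed_urls() {
  jq -r '
    if type == "array" then .[] else . end
    | select(.success != true)
    | [(.url // ""), (.status_code // "" | tostring), (.error // "" | tostring)]
    | @tsv
  ' "$1"
}

# Usage sketch: collect failed URLs for a retry.
#   failed_urls results.json | cut -f1 > retry.txt
#   crwl-cli fetch --urls-file retry.txt --format json
```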

Troubleshooting

| Problem | Solution |
| ------- | -------- |
| Empty markdown | Add --wait-for <selector> for JS-rendered content |
| Timeout | Increase --timeout 60000 |
| Too much noise | Use --css <selector> to scope extraction |
| Images slow things down | Use --text-mode |
| Auth wall | Create a profile: crwl-cli profile create <name> |
| Stale session | Re-check: crwl-cli profile check <name> <url> |
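The first troubleshooting row can be automated: crawl once, and retry with --wait-for only if the markdown comes back empty. A minimal sketch using the documented flags; the function name `fetch_with_wait`, the `CRWL` override variable (so the logic can be tried with a stub command), and treating whitespace-only markdown as empty are assumptions.

```shell
# Hypothetical helper: fetch a URL as JSON; if .markdown is empty or only
# whitespace (typical of JS-rendered pages), retry once with
# --wait-for <selector>. Set CRWL to override the command (default crwl-cli).
fetch_with_wait() {
  local url="$1" selector="$2" out
  out=$("${CRWL:-crwl-cli}" fetch "$url" --format json) || return
  # jq -e exits 0 only when the filter yields true (no non-space characters).
  if printf '%s' "$out" | jq -e '(.markdown // "") | test("\\S") | not' > /dev/null; then
    out=$("${CRWL:-crwl-cli}" fetch "$url" --format json --wait-for "$selector") || return
  fi
  printf '%s\n' "$out"
}

# Usage sketch:
#   fetch_with_wait https://example.com ".loaded" > page.json
```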

Crawl web pages and extract markdown. Handles auth via browser profiles.

Category: tools. Updated Apr 29, 2026.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status; review each row below.

| Scanner | Checks | Status |
| ------- | ------ | ------ |
| Trivy | Container and dependency vulnerability scanner | Clean |
| Semgrep | Static code analysis for vulnerabilities | Clean |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean |
| Snyk (dep) | Open source security scanning | Skipped |
| Socket.dev | Supply chain security analysis | Skipped |
| VirusTotal | Multi-engine malware detection | Skipped |
| CrowdStrike | Advanced threat intelligence | Skipped |
| OSV-Scanner | Open Source Vulnerability database check | Skipped |
| OWASP Dep-Check | | Skipped |

Last scanned: Apr 29, 2026, 10:31 AM (16.3s, 1 file scanned).


Related Skills

mulatta/biorefs-cli (tools, updated May 22, 2026)

Biomedical literature, reference, and entity research helper. Use whenever the user asks for PubMed/PMC/NCBI/Entrez paper search, PMID/PMCID/DOI conversion, biomedical citation/BibTeX/RIS export, legal OA full-text lookup, gene/protein/RNA/transcript evidence, OpenAlex citation/OA enrichment, Semantic Scholar enrichment, PubChem compound/assay/bioactivity lookup, or bio/medical literature review evidence collection.

mulatta/kmap-cli (tools, updated May 19, 2026)

Use kmap-cli whenever the user asks for Korea-focused place search/POI lookup, nearby search, finding restaurant candidates, public transit directions, transit routing with waypoints, address geocoding, reverse geocoding, saved home/work aliases, or NAVER/Kakao/TMAP map app handoff. Default to TMAP API for machine-readable place/transit data; use NAVER/Kakao only as URL handoff helpers without NAVER/Kakao API keys. Do not use ODsay.

mulatta/linkwarden-cli (tools, updated May 18, 2026)

Manage Linkwarden bookmarks, collections, tags, highlights, RSS subscriptions, archives, and API tokens through a restricted CLI. Use when the user asks to save, search, organize, archive, or delete Linkwarden links.

mulatta/vikunja-cli (tools, updated May 13, 2026)

Manage Vikunja projects, tasks, relations, templates, attachments, labels, comments, due/reminder notifications, views, and kanban buckets through a restricted CLI. Use whenever the user asks to inspect or update Vikunja tasks/projects, create structured tasks from sources, attach evidence, link blockers/subtasks/order with task relations, move tasks between projects or kanban buckets, manage workflow labels/comments, or check Vikunja reminders/overdue items. Prefer this skill over raw Vikunja API calls.


Install manually


# Clone the repo
git clone https://github.com/mulatta/skillz.git

# Copy into Claude Code skills folder (global)
cp -r skillz/skills/crwl-cli ~/.claude/skills/


Repository: mulatta/skillz

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT
