skills/ghcrawl/SKILL.md
Use the local ghcrawl CLI to inspect duplicate clusters and issue/PR summaries from the existing ghcrawl dataset, and refresh one repo only when the user explicitly asks. Use when a user wants to triage related issues or PRs, inspect semantic clusters, or run ghcrawl's staged refresh pipeline.
npx skillsauth add pwrdrvr/ghcrawl ghcrawlInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Use ghcrawl as the machine-facing interface for local GitHub duplicate-cluster analysis.
Never read the ghcrawl SQLite database directly with sqlite3 or any other database client. If the supported CLI cannot return the needed information, report that CLI problem to the user instead of bypassing the interface.
Do not scrape the TUI. Prefer JSON CLI output.
The skill has two modes:
ghcrawl doctor --json proves GitHub and OpenAI auth are configured and healthy.In default mode, do not treat missing credentials as a problem unless the user explicitly asked for an API-backed operation or a supported read-only CLI command failed and doctor shows local setup is broken.
Even in API-enabled mode, never run sync, embed, cluster, or refresh unless the user explicitly asks for that work. Those commands can take a long time, consume paid API usage, and trigger rate limiting if used too often.
Current pipeline defaults to keep in mind:
vectorlite sidecar indexgpt-5-minititle_original, so refresh does not summarize unless the user explicitly switches to title_summaryghcrawl configure makes the next refresh rebuild vectors and clusterstitle_summary can materially improve clustering quality, but it adds OpenAI cost; on openclaw/openclaw it improved non-solo cluster membership by about 50%Also never run close-thread or close-cluster unless the user explicitly asks you to mark a local thread or cluster closed.
Prefer the installed ghcrawl bin.
If ghcrawl is not on PATH, use:
npx ghcrawl cli ...
Do not start by running ghcrawl --help or <subcommand> --help. The documented command surface in this skill and references/protocol.md is the default source of truth. If you are actively maintaining ghcrawl itself or syntax is genuinely in doubt, command help is available through ghcrawl help <command> and ghcrawl <command> --help.
Do not run doctor on skill startup by default.
Start with local read-only commands:
Without explicit user direction to refresh data, prefer these local-only commands:
ghcrawl threads owner/repo --numbers 12345 --json
ghcrawl clusters owner/repo --min-size 10 --limit 20 --sort recent --json
ghcrawl cluster-detail owner/repo --id 123 --member-limit 20 --body-chars 280 --json
ghcrawl threads owner/repo --numbers 42,43,44 --json
ghcrawl author owner/repo --login lqquan --json
ghcrawl search owner/repo --query "download stalls" --mode hybrid --json
ghcrawl neighbors owner/repo --number 42 --limit 10 --json
ghcrawl configure --json
These operate on the existing local SQLite dataset.
Treat that stored dataset as the default source of truth for read-only analysis. Do not probe credentials, inspect env vars, or explain missing auth unless an API-backed task was requested or the supported CLI path is failing.
By default:
threads and author hide locally closed issues/PRsclusters and cluster-detail hide locally closed clustersIf the user explicitly wants to inspect those records, add --include-closed.
Use threads --numbers 12345 when you need to find the cluster for one specific issue/PR number. The returned thread record includes clusterId. If it is non-null, follow with cluster-detail --id <clusterId>.
Use configure --json when you need to confirm the currently selected summary model or embedding basis before suggesting an expensive refresh.
Use threads --numbers ... when you need a batch of specific issue/PR records. Do not pay the CLI startup cost 10 times for 10 separate single-thread lookups.
Use author --login ... when you need one author's open threads and their strongest stored same-author similarity matches in one call.
If the user explicitly asks to mark a local issue/PR or cluster closed, use:
ghcrawl close-thread owner/repo --number 42 --json
ghcrawl close-cluster owner/repo --id 123 --json
If close-thread closes the last open item in a cluster, ghcrawl will automatically mark that cluster closed too.
Run:
ghcrawl doctor --json
If the bin is unavailable, fall back to:
pnpm --filter ghcrawl cli doctor --json
Only do this when:
refresh, sync, embed, or clusterInterpret the result like this:
If doctor is unhealthy but the user asked only for read-only inspection, say that API-backed refresh is unavailable and continue with read-only CLI commands when possible.
Use one supported fallback path before giving up:
pnpm --filter ghcrawl cli ...
If a documented ghcrawl command still fails, hangs, or returns unusable output through the supported CLI path, stop and report that to the user. Do not inspect tables, schema, or rows with sqlite3, pragma, or ad hoc SQL.
Only if the user explicitly asks to refresh or rebuild data, and doctor says auth is healthy, use:
ghcrawl refresh owner/repo
This runs, in fixed order:
You may skip steps only when the user explicitly wants that or the freshness state makes it unnecessary:
ghcrawl refresh owner/repo --no-sync
ghcrawl refresh owner/repo --no-cluster
Do not decide on your own to run cluster just because it is local-only. It is still long-running and should be treated as an explicit user-directed operation.
Use:
ghcrawl clusters owner/repo --min-size 10 --limit 20 --sort recent --json
This returns:
Use:
ghcrawl cluster-detail owner/repo --id 123 --member-limit 20 --body-chars 280 --json
This returns:
Use search or neighbors as needed:
ghcrawl search owner/repo --query "download stalls" --mode hybrid --json
ghcrawl neighbors owner/repo --number 42 --limit 10 --json
Cluster <clusterId> (#<representativeNumber> representative <issue|pr>)Cluster 23945 (#42035 representative issue)For the exact JSON-oriented command surface and examples, read:
documentation
Plan and publish a GitHub Release in a tag-driven repository. Use when a user asks to cut, prepare, or publish a software release, propose the next vX.Y.Z tag, support prerelease tags like vX.Y.Z-beta.N, draft better release notes from PRs and direct commits since the last release, update CHANGELOG.md, create the tag pinned to an exact commit, and watch the publish workflow.
testing
Manage GitHub issues and the GitHub Project board for the current repository, while keeping the local tracker in sync. Use when the user wants to capture freeform requirements as issues, flesh out issue descriptions from repo or upstream research, triage Priority/Size/Workflow/Status, add issues or PRs to the repo's configured project board, or reconcile GitHub state with `.local/work-items.yaml`.
tools
Use when work should span one or more detached tasks but still behave like one job with a single owner context. TaskFlow is the durable flow substrate under authoring layers like Lobster, ACPX, plugins, or plain code. Keep conditional logic in the caller; use TaskFlow for flow identity, child-task linkage, waiting state, revision-checked mutations, and user-facing emergence.
tools
# Lobster Lobster executes multi-step workflows with approval checkpoints. Use it when: - User wants a repeatable automation (triage, monitor, sync) - Actions need human approval before executing (send, post, delete) - Multiple tool calls should run as one deterministic operation ## When to use Lobster | User intent | Use Lobster? | | ------------------------------------------------------ | --------------------------