web-ai/SKILL.md
Structured browser web-ai workflow for ChatGPT, Gemini, and Grok in cli-jaw.
Use this skill when the task is to ask an AI website through browser control instead of calling a model API directly.
- When the user asks for agbrowse or standalone agbrowse, run
  agbrowse --help first, and run agbrowse web-ai --help before choosing
  web-ai flags. Treat the current help output as command truth and adapt this
  skill to that surface instead of assuming cli-jaw wrapper parity.
- Use --inline-only only when the user explicitly wants pasted inline context.
  Source context should normally be packaged with --context-from-files /
  --context-file; upload transport creates one .zip archive attachment
  containing CONTEXT_PACKAGE.md plus the selected source files.
- Do not use --file unless explicitly requested. For source
  context, use the context packaging flags first.
- Do not run evaluate through web-ai.
- Use vision-click only as an explicit fallback when DOM/snapshot cannot see a
  visible UI target.

| Surface | Label | Notes |
| --- | --- | --- |
| prompt render/context dry-run | ready | browser-free and deterministic |
| ChatGPT/Gemini/Grok live send/poll/query | beta | depends on provider UI/account state |
| ChatGPT semantic resolver and answer artifacts | ready in cli-jaw mirror | mirrors agbrowse Phase 16/17 contracts |
| source audit flags | ready in cli-jaw mirror | --require-source-audit fails closed on missing inline sources |
| hosted/cloud/external-CDP operation | deferred | do not claim hosted browser support |
Build a structured question envelope:
[SYSTEM]
...
[USER]
## Project
...
## Goal
...
## Context
...
## Question
...
## Output
...
## Constraints
...
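A minimal sketch of assembling the envelope above from shell arguments. The function name build_envelope and its argument order are hypothetical; only the section headings come from this skill, and how the resulting text is handed to the CLI is not specified here.

```shell
# Hypothetical helper: print a question envelope with the section
# headings used in this skill. Arguments, in order:
#   system project goal context question output constraints
build_envelope() {
  cat <<EOF
[SYSTEM]
$1
[USER]
## Project
$2
## Goal
$3
## Context
$4
## Question
$5
## Output
$6
## Constraints
$7
EOF
}

# Example: print an envelope so it can be reviewed before use.
build_envelope \
  "You are a careful reviewer." \
  "cli-jaw" \
  "Review the retry logic." \
  "See the attached context package." \
  "Is the backoff bounded?" \
  "A short bullet list." \
  "No speculation."
```

Keeping the headings in one helper avoids drift between prompts written by hand.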
cli-jaw browser web-ai render --vendor chatgpt --prompt "..."
cli-jaw browser web-ai context-dry-run --vendor chatgpt --prompt "..." --context-from-files "src/**/*.ts" --files-report
cli-jaw browser web-ai context-render --vendor chatgpt --prompt "..." --context-from-files "src/**/*.ts"
cli-jaw browser web-ai code --vendor chatgpt --model thinking --effort heavy --prompt "Build a Flask API MVP" --output-zip ./result.zip
cli-jaw browser web-ai code-extract --vendor chatgpt --conversation "https://chatgpt.com/c/<conversation-id>" --output-zip ./result.zip
cli-jaw browser web-ai status --vendor chatgpt
cli-jaw browser web-ai query --vendor chatgpt --prompt "..." --context-from-files "src/foo.ts"
cli-jaw browser web-ai query --vendor chatgpt --inline-only --allow-copy-markdown-fallback --prompt "..."
cli-jaw browser web-ai query --vendor grok --inline-only --require-source-audit --source-audit-scope "sources checked" --source-audit-date "2026-05-05" --prompt "..."
cli-jaw browser web-ai poll --vendor chatgpt --timeout 1200
cli-jaw browser web-ai capabilities --vendor chatgpt
cli-jaw browser web-ai notifications --vendor chatgpt
cli-jaw browser web-ai stop --vendor chatgpt
Use only when explicitly needed:
cli-jaw browser web-ai query \
--vendor chatgpt \
--inline-only \
--allow-copy-markdown-fallback \
--prompt "Return a markdown table."
The runtime intercepts the page's navigator.clipboard.writeText/write during
the provider Copy button click. It does not read the OS clipboard. The flag is
the explicit policy opt-in for CLI use; do not add --unsafe-allow.
web-ai poll, web-ai query, and web-ai watch accept --timeout <seconds>.
When omitted, the runtime uses these defaults so heavy reasoning models
(ChatGPT Pro/Heavy, Gemini Deep Think) have room to finish:
| Vendor | Default --timeout | Roughly |
| --- | ---: | --- |
| ChatGPT | 1200 | 20 minutes |
| Gemini | 1200 | 20 minutes |
| Grok | 600 | 10 minutes |
Pass --timeout 1800 (30 min) or higher for unusually long Pro/Deep Think
runs. The provider tab and the cli-jaw browser Chrome process stay open
across a poll timeout — only the polling loop gives up.
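The default table above can be mirrored in a small lookup when a script wants to pass --timeout explicitly. This is a sketch only: default_timeout is a hypothetical helper, not part of cli-jaw, and the values are copied from the table.

```shell
# Hypothetical helper mirroring the default --timeout table above.
default_timeout() {
  case "$1" in
    chatgpt|gemini) echo 1200 ;;
    grok)           echo 600 ;;
    *) echo "unknown vendor: $1" >&2; return 1 ;;
  esac
}

# Build a poll command with the vendor default made explicit.
VENDOR=chatgpt
TIMEOUT=$(default_timeout "$VENDOR") || exit 1
echo "cli-jaw browser web-ai poll --vendor $VENDOR --timeout $TIMEOUT"
# prints: cli-jaw browser web-ai poll --vendor chatgpt --timeout 1200
```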
For responses that may take many minutes (ChatGPT Pro/Heavy, Gemini Deep
Think, Deep Research), do NOT block the boss turn on web-ai query. Register
a server-owned background task instead:
SID=$(cli-jaw browser web-ai send --vendor chatgpt --model pro --inline-only \
--prompt "..." --json | jq -r .sessionId)
cli-jaw bgtask add --preset web-ai --session "$SID" \
--prompt "web-ai result: {{result}} — summarize and deliver to the user"
# → end the turn. The jaw server owns the work (native session probe,
# restart-durable) and re-invokes the boss with a [bgtask:*] prompt
# when the session reaches complete/timeout/error.
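One detail worth guarding in the snippet above: jq -r prints the literal string null when .sessionId is missing, so SID can look non-empty even though send failed. A minimal guard, assuming a hypothetical helper name require_session_id:

```shell
# Hypothetical guard: reject an empty or "null" session id before
# registering a background task.
require_session_id() {
  case "$1" in
    ""|null) echo "no sessionId returned by web-ai send" >&2; return 1 ;;
    *) return 0 ;;
  esac
}

SID="example-session-id"   # placeholder; normally captured from send --json
if require_session_id "$SID"; then
  echo "session ok: $SID"
fi
# prints: session ok: example-session-id
```

Calling the guard before cli-jaw bgtask add keeps a failed send from registering a task against a bogus session.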
- Inspect tasks with cli-jaw bgtask list / cli-jaw bgtask show <taskId>.
- Keep using query for short/fast lookups where waiting is cheaper.

Completed provider tabs are kept warm in a per-vendor pool so the next
send reuses a tab instead of creating a new one. Defaults (overridable
via env on the agbrowse side):
| Setting | Default | Env Var |
| --- | --- | --- |
| TTL per pooled tab | 15 min | AGBROWSE_PROVIDER_POOL_TTL |
| Warm tabs per (owner,vendor,sessionType,origin,profile) | 3 | AGBROWSE_PROVIDER_POOL_MAX_PER_KEY |
| Global cap on warm provider tabs | 8 | AGBROWSE_PROVIDER_POOL_GLOBAL_MAX |
If a second Pro / Deep Think query fails with Target page... has been closed
while an earlier query is still polling, the cause is lease
contention on the per-key cap. Pass --new-tab (or its alias
--parallel) on the second call to bypass pool reuse and allocate a
fresh provider tab.
cli-jaw browser web-ai status --vendor <v> --json now embeds a
capabilities[] array sourced from src/browser/web-ai/capability-registry.ts.
Each row carries { providerId, capabilityId, family, status, frontendStatus, mutationAllowed, activationPath, activeStateSignals, failureStage }.
Scope to a single capability with --probe <capabilityId>.
cli-jaw browser web-ai capabilities continues to expose the registry
directly (with --family / --frontend-status filters). agbrowse mirrors
the same hyphenated capability ID convention via its much smaller probe
runtime in web-ai/capability.mjs.
Completed poll, query, and watch results may include:
- answerArtifact: normalized capture metadata (capturedBy,
  exactnessScore, text/markdown lengths, warnings).
- sourceAudit: inline source coverage report when
  --require-source-audit is enabled.

Use --require-source-audit for research tasks where bottom-only provider
source drawers are not enough. Pair absence/no-official-response claims with
--source-audit-scope and --source-audit-date.
Failures from cli-jaw browser web-ai * carry a typed JSON envelope with
errorCode, stage, retryHint, vendor, mutationAllowed,
selectorsTried, and optional evidence. HTTP responses
(/api/browser/web-ai/* 5xx bodies) and CLI --json output share the
same shape via WebAiError.toJSON(). Initial code list (full catalog in
agbrowse devlog/03_phase2_errors.md):
- cdp.unreachable, cdp.target-mismatch
- provider.composer-not-visible, provider.model-mismatch,
  provider.attachment-preflight, provider.attachment-evidence-missing,
  provider.commit-not-verified, provider.poll-timeout,
  provider.runtime-disabled
- capability.unsupported
- context.over-budget, context.symlink-rejected
- grok.context-pack-not-allowed
- internal.unhandled

Existing cli-jaw error classes map to typed codes via
fromCliJawStructuredError:
- WrongTargetError → cdp.target-mismatch (preserves
  expectedTargetId / actualTargetId in evidence).
- BrowserCapabilityError → capability.unsupported (preserves
  capabilityId / ownerPrd).
- ProviderRuntimeDisabledError → provider.runtime-disabled (preserves
  vendor / stage).

cli-jaw browser web-ai send/query --vendor grok with --context-from-files
/ --context-file / --context-transport upload throws with
stage: 'grok-context-pack-not-allowed'. Pass --allow-grok-context-pack
to override deliberately. When the override is used, the runtime emits a
grok-context-pack-not-recommended warning. Grok prefers inline prompts
plus an optional single --file upload; ChatGPT or Gemini handle context
packages more reliably.
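The typed error codes listed earlier share a family prefix before the first dot (cdp, provider, capability, context, grok, internal), which makes coarse triage easy in a wrapper script. The sketch below is illustrative: error_family and triage_error are hypothetical helpers, and the suggested actions are assumptions, not runtime policy. The envelope's own retryHint remains the authority.

```shell
# Hypothetical helper: the family is the part of errorCode before the first dot.
error_family() {
  printf '%s\n' "${1%%.*}"
}

# Hypothetical triage; the suggested actions are illustrative only.
triage_error() {
  case "$(error_family "$1")" in
    cdp)        echo "check the browser connection and active tab" ;;
    provider)   echo "inspect provider UI state, then follow retryHint" ;;
    capability) echo "capability unsupported; do not retry" ;;
    context)    echo "shrink or fix the context package" ;;
    grok)       echo "use an inline prompt or --allow-grok-context-pack" ;;
    *)          echo "unhandled; report with evidence" ;;
  esac
}

triage_error "provider.poll-timeout"
# prints: inspect provider UI state, then follow retryHint
```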
When the user asks to drive a single Chrome instance (for example to
keep their own logged-in profile open and not run two CDP sessions), the
same web-ai workflow is available through the standalone agbrowse CLI
(npm install -g agbrowse). The flags and prompt envelope shape are
identical; only the binary prefix changes.
| cli-jaw browser form | agbrowse form |
| --- | --- |
| cli-jaw browser start | agbrowse start |
| cli-jaw browser status | agbrowse status |
| cli-jaw browser snapshot --interactive | agbrowse snapshot --interactive |
| cli-jaw browser web-ai render ... | agbrowse web-ai render ... |
| cli-jaw browser web-ai query --vendor chatgpt ... | agbrowse web-ai query --vendor chatgpt ... |
| cli-jaw browser web-ai poll --vendor chatgpt --timeout 1200 | agbrowse web-ai poll --vendor chatgpt --timeout 1200 |
| cli-jaw browser web-ai code --vendor chatgpt ... | agbrowse web-ai code --vendor chatgpt ... |
| cli-jaw browser web-ai code-extract --vendor chatgpt ... | agbrowse web-ai code-extract --vendor chatgpt ... |
Only switch when the user explicitly asks for the standalone path. The
two runtimes share defaults (ChatGPT/Gemini 1200s, Grok 600s) and the
same [INSTRUCTIONS] prompt block, so behavior stays consistent. Do not
run both against the same --port at the same time.
When standalone agbrowse is explicitly requested, first inspect:
agbrowse --help
agbrowse web-ai --help
Then select flags from the observed help text. The standalone binary can move faster than the cli-jaw wrapper, so do not invent wrapper-only flags or assume older aliases when the current help output differs.
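The prefix swap shown in the table above can be sketched as a string rewrite. to_agbrowse is a hypothetical helper; it changes only the binary prefix and passes flags through untouched, so the result still needs checking against the current agbrowse --help output.

```shell
# Hypothetical helper: rewrite a "cli-jaw browser ..." command line into
# its standalone "agbrowse ..." form. Only the prefix changes.
to_agbrowse() {
  case "$1" in
    "cli-jaw browser "*) printf 'agbrowse %s\n' "${1#cli-jaw browser }" ;;
    *) echo "not a cli-jaw browser command: $1" >&2; return 1 ;;
  esac
}

to_agbrowse "cli-jaw browser web-ai poll --vendor chatgpt --timeout 1200"
# prints: agbrowse web-ai poll --vendor chatgpt --timeout 1200
```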
cli-jaw browser web-ai code is the native cli-jaw mirror of agbrowse code
mode. It is ChatGPT-only. The runtime automatically uploads
gpt-dev-agent-context.zip as attachment 1 before any user-provided --file
attachments, then sends a strict code-generation prompt. New artifacts must
contain PLAN.md or 00_plan.md; the retrieval step fails closed when that
plan file is absent.
The context zip is attached only on the FIRST turn of a conversation.
Continuation turns (--url, --conversation, or --session targeting an
existing conversation) skip it: the container /mnt persists across turns and
the contract already lives in the conversation history. Pass --context-refresh
to force a re-upload (e.g. after the context module changed, or when a long-idle
conversation may have recycled its sandbox).
The prompt asks ChatGPT to use a visible plan/todo tool only when the tool is
actually available. If no such tool exists in the ChatGPT environment, the
generated project must put the checklist and verification record in
PLAN.md or 00_plan.md instead.
Generate and recover a single artifact:
cli-jaw browser web-ai code \
--vendor chatgpt \
--model thinking \
--effort heavy \
--prompt "Build a Flask API MVP" \
--output-zip ./result.zip
Generate multiple named artifacts:
cli-jaw browser web-ai code \
--vendor chatgpt \
--model pro \
--effort extended \
--prompt "Build backend.zip and frontend.zip" \
--multi-zip \
--output-dir ./artifacts
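The --model and --effort values used in the commands above are translated by the runtime into ChatGPT Intelligence menu labels, as described in the Current notes of this skill. A minimal sketch of that lookup follows; intelligence_label is a hypothetical name, and the pairs are copied from those notes rather than from the runtime source.

```shell
# Hypothetical lookup of the ChatGPT Intelligence menu label for a
# --model / --effort pair, following the mapping stated in this skill.
intelligence_label() {
  case "$1:$2" in
    instant:*|thinking:light)  echo "Instant" ;;
    thinking:standard)         echo "Medium" ;;
    thinking:extended)         echo "High" ;;
    thinking:heavy)            echo "Extra High" ;;
    pro:standard|pro:extended) echo "Pro Extended" ;;
    *) echo "no mapping for model=$1 effort=$2" >&2; return 1 ;;
  esac
}

intelligence_label thinking heavy   # prints: Extra High
intelligence_label pro extended     # prints: Pro Extended
```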
When an old ChatGPT conversation still contains assistant text such as
MACHINE: /mnt/data/result.zip or /mnt/data/result.zip, recover it later
without sending a new prompt:
cli-jaw browser web-ai code-extract \
--vendor chatgpt \
--conversation "https://chatgpt.com/c/<conversation-id>" \
--output-zip ./result.zip
Then verify locally:
unzip -t ./result.zip
unzip -l ./result.zip
The original conversation URL/session/current ChatGPT tab and logged-in
browser profile are still required; a copied /mnt/data/result.zip line alone
is not enough.
Stale-snapshot guard: when one conversation rebuilds the same sandbox path
(e.g. /mnt/data/result.zip) across several code runs, the download API serves
the snapshot tied to the message id used to mint the URL. The extractor mints
candidate message ids NEWEST-first (agbrowse 02f03cc), so the first successful
mint is the latest sandbox state, and the result reports mintedMessageId for
auditing. Even so, ALWAYS verify retrieved zip contents against drop-specific
symbols (grep a file or identifier unique to the expected delivery) before
applying — on mismatch, retry with --multi-zip to recover every archive and
identify the right one.
For new agbrowse web-ai code runs, the prompt contract asks ChatGPT to create
PLAN.md or 00_plan.md in every generated code zip, and to use a visible
todo/checklist tool such as turn_plan.update_turn_plan only when that tool is
actually available while the response is streaming. Keep visible/top-level
checklists to 8 items or fewer; for complex work, put extra detailed stage
instructions in the plan markdown instead of creating more visible todo items.
That visible todo UI may disappear after the answer finishes; do not fail a
completed run because the UI is no longer visible. The durable validation
target is the zip-root PLAN.md or 00_plan.md checklist. Completed items in
that plan file should be marked [x] before final packaging.
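The zip-root plan requirement can be checked locally after retrieval. The sketch below uses two hypothetical helpers that read from stdin, so they work with any listing source (for example unzip -Z1 ./result.zip for entry names). The "- [ ]" task-list syntax for unchecked items is an assumption about how the plan file is written.

```shell
# Hypothetical check: read archive entry names on stdin, one per line,
# and succeed only if PLAN.md or 00_plan.md sits at the zip root.
has_root_plan() {
  grep -qE '^(PLAN\.md|00_plan\.md)$'
}

# Hypothetical check: count unchecked "- [ ]" items in plan text on stdin.
count_unchecked() {
  grep -cE '^[[:space:]]*[-*] \[ \]'
}

# Example with inline data instead of a real archive.
printf '%s\n' 'PLAN.md' 'src/app.py' | has_root_plan && echo "plan file found"
printf '%s\n' '- [x] scaffold' '- [ ] tests' | count_unchecked
```

A plan nested in a subdirectory (for example docs/PLAN.md) does not satisfy has_root_plan, matching the zip-root wording above.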
Use this when the user asks for max context / current context packaging before browser submission.
Rules:
- --file still means live browser upload. Do not use it for source context.
- --context-from-files may be repeated and accepts files, directories, and globs.
- --context-exclude may be repeated and accepts glob excludes.
- --context-file accepts a newline-separated or JSON list of include/exclude patterns.
- Upload transport builds one .zip archive context
  package and attaches it in the ChatGPT/Gemini composer. Do not create a
  temporary .txt/.md file yourself for source context.
- --inline-only or --context-transport inline forces the old pasted
  composer path.
- --max-input sets the model input-token preflight budget.
- --max-file-size defaults to 1 MB per file.
- context-dry-run --json omits composerText unless --full is passed.
- context-render prints the CONTEXT_PACKAGE.md body that will be placed
  inside the .zip archive by live upload transport.
- send/query with context packaging must fail before browser mutation if the token
  budget is exceeded, or if inline transport exceeds the inline character budget.

Example:
cli-jaw browser web-ai context-dry-run \
--vendor chatgpt \
--model pro \
--prompt "review current context" \
--context-from-files "src/browser/web-ai/**/*.ts" \
--context-exclude "**/*.test.ts" \
--files-report \
--json
Live web-ai execution policy:
headed Chrome required
headless forbidden
Codex Cloud out of scope
observed frontend capability -> schema row -> verified mutation
not observed -> fail closed
Use the 30_browser-derived loop:
active-tab -> snapshot -> act -> snapshot -> verify
Refs are latest-snapshot scoped. Re-run snapshot after navigation, reload, or any action that can replace the DOM before using an existing ref.
Before sending a prompt, verify the active tab is ChatGPT. If active tab is not verified, stop and ask the operator to run:
cli-jaw browser tabs --json
cli-jaw browser tab-switch <target>
web-ai send captures a baseline before insertion.
Raw prompt text must not be persisted. Polling only accepts answers that appear after the saved baseline.
Current:
- ChatGPT model picker: button.__composer-pill[aria-haspopup="menu"] labeled Instant/Thinking/Pro
  or a plain Heavy pill, while the older top model-switcher-dropdown-button
  can be absent. Treat visible Heavy as active ChatGPT Pro/Heavy. For direct
  DOM fallback, open the model pill and select
  [data-testid="model-switcher-gpt-5-5-pro-thinking-effort"]; do not click
  generic "Pro" by role/name because the profile menu can also match.
- ChatGPT effort selection: Intelligence
  menu instead of the older model row plus effort submenu. The runtime maps
  instant and thinking --effort light to Instant,
  thinking --effort standard to Medium, thinking --effort extended to
  High, thinking --effort heavy to Extra High,
  pro --effort standard to Pro Extended, and pro --effort extended to
  Pro Extended.
- Gemini model labels: 3.1 Flash-Lite, 3 Flash, and 3.1 Pro, but
  workflow commands must use stable aliases (flash-lite, flash, pro).
  The runtime normalizes future 3.n labels generically and keeps legacy
  fast as flash-lite, while thinking maps to pro. Deep Think remains
  a separate tool/mode request, not the plain --model alias.
- ChatGPT code mode (cli-jaw browser web-ai code) with automatic
  gpt-dev-agent-context.zip attachment, PLAN.md/00_plan.md
  enforcement for new artifacts, later code-extract, multi-zip retrieval,
  and repeatable mixed --file uploads

Future: