skills/local/agent-browser/SKILL.md
Browser automation for AI agents via the native agent_browser tool. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
This skill uses the native agent_browser tool (from pi-agent-browser-native), not bash.
All browser operations use structured JSON arguments instead of shell commands.
The upstream CLI agent-browser (Rust) is still the engine. Templates in templates/ are shell scripts for CI/terminal use. For LLM-driven automation, always use the native tool.
The agent_browser tool accepts exactly one of these mutually exclusive modes per call:
| Mode | When to use | Example |
|------|-------------|---------|
| args | Direct CLI args — any upstream command | { "args": ["open", "https://example.com"] } |
| semanticAction | Find shorthand — click/fill/select by locator | { "semanticAction": { "action": "click", "locator": "text", "value": "Submit" } } |
| job | Constrained multi-step flow (schema-safe) | { "job": { "steps": [...] } } |
| qa | Lightweight smoke test preset | { "qa": { "url": "...", "expectedText": "..." } } |
| batch | Arbitrary multi-step flow (raw JSON stdin) | { "args": ["batch"], "stdin": "..." } |
All modes support sessionMode: "auto" (default, reuses implicit session) or "fresh" (rotates to new session).
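Because the modes are mutually exclusive, a caller that composes tool calls programmatically may want to check a call before sending it. The following is a minimal sketch of such a check; `detectMode` is a hypothetical helper written for illustration, not part of agent_browser, and it assumes that `batch` always appears as the first entry of `args`.

```javascript
// Hypothetical pre-flight check; the real tool performs its own validation.
const MODE_KEYS = ["args", "semanticAction", "job", "qa"];

function detectMode(call) {
  // Collect every mode key that is actually set on the call object.
  const present = MODE_KEYS.filter((key) => call[key] !== undefined);
  if (present.length !== 1) {
    throw new Error(
      `expected exactly one mode, found ${present.length}: [${present.join(", ")}]`
    );
  }
  const mode = present[0];
  // batch is expressed as args plus a stdin string (assumes "batch" comes first).
  if (mode === "args" && Array.isArray(call.args) && call.args[0] === "batch") {
    return "batch";
  }
  return mode;
}
```

A call that sets both `args` and `qa`, for example, is rejected by this sketch before it reaches the tool.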
1. Navigate → agent_browser({args: ["open", <url>]})
2. Snapshot → agent_browser({args: ["snapshot", "-i"]}) (get @e1, @e2 refs)
3. Interact → agent_browser({args: ["click", "@e1"]}) or semanticAction
4. Re-snapshot → after navigation/DOM changes
// Open and inspect
{ "args": ["open", "https://example.com/form"] }
{ "args": ["snapshot", "-i"] }
// Interact
{ "args": ["fill", "@e1", "user@example.com"] }
{ "args": ["fill", "@e2", "password123"] }
{ "args": ["click", "@e3"] }
{ "args": ["wait", "--load", "networkidle"] }
{ "args": ["snapshot", "-i"] }
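The workflow above splits naturally into two phases, because the refs used in the second phase only exist once the first snapshot has been read. As a rough sketch, the two phases could be composed as plain call objects; `inspectCalls` and `interactCalls` are hypothetical helper names, and the refs and values passed in are placeholders.

```javascript
// Phase 1: open the page and list its interactive elements.
function inspectCalls(url) {
  return [{ args: ["open", url] }, { args: ["snapshot", "-i"] }];
}

// Phase 2: fill fields, submit, wait for the page to settle, then re-snapshot.
// `fields` maps refs taken from the phase 1 snapshot to the values to enter.
function interactCalls(fields, submitRef) {
  const fills = Object.entries(fields).map(([ref, value]) => ({
    args: ["fill", ref, value],
  }));
  return [
    ...fills,
    { args: ["click", submitRef] },
    { args: ["wait", "--load", "networkidle"] },
    { args: ["snapshot", "-i"] },
  ];
}

// Example: the same sequence as the form flow shown above.
const phase1 = inspectCalls("https://example.com/form");
const phase2 = interactCalls(
  { "@e1": "user@example.com", "@e2": "password123" },
  "@e3"
);
```

Keeping the phases separate makes it harder to use refs before a snapshot has produced them.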
stdin instead of a JSON string
The stdin parameter must always be a string containing JSON, never a raw array or object. This is the most common error when using batch.
Wrong (will fail with Validation failed):
{ "args": ["batch"], "stdin": [["open", "https://example.com"], ["snapshot", "-i"]] }
Right (string-escaped JSON):
{ "args": ["batch"], "stdin": "[[\"open\",\"https://example.com\"],[\"snapshot\",\"-i\"]]" }
Safest — use JSON.stringify() (when composing tool calls programmatically):
const steps = [["open", "https://example.com"], ["snapshot", "-i"]];
const toolCall = { args: ["batch"], stdin: JSON.stringify(steps) };
This also applies to eval --stdin: the stdin value must be a string.
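One way to avoid the raw-array mistake is to route every stdin-carrying call through a single helper. The sketch below assumes nothing about the tool beyond the rule stated above; `withStdin` is a hypothetical name.

```javascript
// Build a call whose stdin is always a string.
// Strings pass through unchanged (e.g. a JavaScript expression for eval --stdin);
// arrays and objects are serialized with JSON.stringify.
function withStdin(args, payload) {
  const stdin = typeof payload === "string" ? payload : JSON.stringify(payload);
  return { args, stdin };
}

const batchCall = withStdin(
  ["batch"],
  [["open", "https://example.com"], ["snapshot", "-i"]]
);
const evalCall = withStdin(["eval", "--stdin"], "document.title");
```

Both results carry a string in `stdin`, so neither should trip the validation error described above.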
Refs (@e1, @e2, etc.) point to elements on the current page. After any click, form submit, or scroll that changes the DOM, all refs are invalidated. See the Ref Lifecycle section for details.
Wrong:
{ "args": ["click", "@e3"] }
{ "args": ["click", "@e1"] } // @e1 is stale — page changed
Right:
{ "args": ["click", "@e3"] }
{ "args": ["snapshot", "-i"] }
{ "args": ["click", "@e1"] } // @e1 from fresh snapshot
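The rule can also be checked mechanically before a sequence of calls is sent. The following sketch flags any ref used after a page-changing command without a snapshot in between. The `MUTATING` set is an assumption chosen to be conservative (every click counts as page-changing), and `findStaleRefUses` is a hypothetical helper, not something the tool provides.

```javascript
// Commands treated as page-changing in this sketch (a conservative assumption).
const MUTATING = new Set(["open", "click", "press", "back", "forward", "reload"]);
const REF_PATTERN = /^@e\d+$/;

// Returns the indexes of calls that use a ref while refs may be stale.
// `startFresh` says whether a snapshot of the current page was already taken.
function findStaleRefUses(calls, startFresh = true) {
  let fresh = startFresh;
  const staleIndexes = [];
  calls.forEach((call, index) => {
    const [command, ...rest] = call.args;
    if (command === "snapshot") {
      fresh = true; // a new snapshot yields valid refs again
      return;
    }
    if (!fresh && rest.some((arg) => REF_PATTERN.test(arg))) {
      staleIndexes.push(index);
    }
    if (MUTATING.has(command)) fresh = false;
  });
  return staleIndexes;
}

const wrong = [{ args: ["click", "@e3"] }, { args: ["click", "@e1"] }];
const right = [
  { args: ["click", "@e3"] },
  { args: ["snapshot", "-i"] },
  { args: ["click", "@e1"] },
];
```

Applied to the two sequences above, the check flags the second call of `wrong` and nothing in `right`.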
data-show for DaisyUI modals
DaisyUI modals use the modal-open class, not the show attribute. Use data-class with Datastar signals instead of data-show or JavaScript for modal visibility.
Wrong:
<dialog class="modal" data-show="$openModal"> <!-- will not work -->
Right:
<dialog class="modal" data-class="{'modal-open': $openModal}">
Use batch, job, or qa to run multiple steps in a single tool call.
This is more efficient than separate calls.
{ "args": ["batch"], "stdin": "[[\"open\",\"https://example.com\"],[\"wait\",\"--load\",\"networkidle\"],[\"snapshot\",\"-i\"]]" }
Supported job steps: open, click, fill, wait, assertText, assertUrl, waitForDownload, screenshot
{
"job": {
"steps": [
{ "action": "open", "url": "https://example.com/form" },
{ "action": "fill", "selector": "@e1", "text": "user@example.com" },
{ "action": "fill", "selector": "@e2", "text": "password123" },
{ "action": "click", "selector": "@e3" },
{ "action": "wait", "milliseconds": 1000 },
{ "action": "screenshot", "path": "result.png" }
]
}
}
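Since a job accepts only a fixed set of step actions, it can help to reject anything else before the call is made. This is a small sketch under that assumption; `makeJob` is a hypothetical helper, and the action list is copied from the supported steps listed above.

```javascript
// Step actions listed as supported in this document.
const SUPPORTED_ACTIONS = new Set([
  "open", "click", "fill", "wait",
  "assertText", "assertUrl", "waitForDownload", "screenshot",
]);

// Wrap steps in a job call, failing early on any unsupported action.
function makeJob(steps) {
  const unsupported = steps
    .map((step) => step.action)
    .filter((action) => !SUPPORTED_ACTIONS.has(action));
  if (unsupported.length > 0) {
    throw new Error(`unsupported job step(s): ${unsupported.join(", ")}`);
  }
  return { job: { steps } };
}

const jobCall = makeJob([
  { action: "open", url: "https://example.com/form" },
  { action: "click", selector: "@e3" },
  { action: "screenshot", path: "result.png" },
]);
```

Flows that need a command outside this list would go through batch instead.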
{ "qa": { "url": "https://example.com", "expectedText": "Example Domain", "screenshotPath": "qa.png" } }
{ "args": ["open", "<url>"] }
{ "args": ["back"] }
{ "args": ["forward"] }
{ "args": ["reload"] }
{ "args": ["snapshot", "-i"] } // Interactive elements only (recommended)
{ "args": ["snapshot"] } // Full accessibility tree
{ "args": ["snapshot", "-i", "--urls"] } // Interactive + link hrefs
{ "args": ["snapshot", "-s", "#selector"] } // Scoped to CSS selector
{ "args": ["click", "@e1"] }
{ "args": ["fill", "@e2", "value"] }
{ "args": ["type", "@e2", "value"] } // Type without clearing
{ "args": ["select", "@e3", "option-value"] }
{ "args": ["check", "@e4"] }
{ "args": ["press", "Enter"] }
{ "args": ["hover", "@e5"] }
{ "args": ["scroll", "down", "500"] }
{ "args": ["scrollintoview", "@e1"] }
// Using semanticAction shorthand
{ "semanticAction": { "action": "click", "locator": "text", "value": "Sign In" } }
{ "semanticAction": { "action": "fill", "locator": "label", "value": "Email", "text": "user@example.com" } }
{ "semanticAction": { "action": "click", "locator": "role", "value": "button", "name": "Submit" } }
{ "semanticAction": { "action": "select", "locator": "label", "value": "Country", "text": "Brazil" } }
// Using find via args (equivalent)
{ "args": ["find", "text", "Sign In", "click"] }
{ "args": ["find", "role", "button", "click", "--name", "Close"] }
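The correspondence between the two forms can be sketched as a small conversion function. The click cases follow the examples above directly. Placing `text` right after the action for fill and select is an assumption about the upstream `find` syntax, and `toFindArgs` is a hypothetical helper.

```javascript
// Convert a semanticAction object into the equivalent find args array.
function toFindArgs({ action, locator, value, text, name }) {
  const args = ["find", locator, value, action];
  // Assumption: fill/select take their text as the argument after the action.
  if (text !== undefined) args.push(text);
  // Role locators can be narrowed by accessible name.
  if (name !== undefined) args.push("--name", name);
  return args;
}

const byText = toFindArgs({ action: "click", locator: "text", value: "Sign In" });
const byRole = toFindArgs({
  action: "click",
  locator: "role",
  value: "button",
  name: "Close",
});
```

Here `byText` and `byRole` reproduce the two `find` examples shown above.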
{ "args": ["wait", "--load", "networkidle"] }
{ "args": ["wait", "@e1"] } // Wait for element
{ "args": ["wait", "--url", "**/dashboard"] } // Wait for URL
{ "args": ["wait", "--text", "Welcome"] } // Wait for text
{ "args": ["wait", "2000"] } // Fixed ms
{ "args": ["wait", "--download", "./report.pdf"] } // Wait for download
{ "args": ["screenshot"] } // Temp path
{ "args": ["screenshot", "--full", "page.png"] } // Full page
{ "args": ["screenshot", "--annotate", "annotated.png"] } // Annotated with refs
{ "args": ["pdf", "page.pdf"] }
{ "args": ["get", "title"] }
{ "args": ["get", "url"] }
{ "args": ["get", "text", "@e1"] }
{ "args": ["eval", "--stdin"], "stdin": "document.title" }
{ "args": ["eval", "--stdin"], "stdin": "JSON.stringify(Array.from(document.querySelectorAll('a')).map(a => a.href))" }
{ "args": ["diff", "snapshot"] } // Compare vs last snapshot
{ "args": ["diff", "screenshot", "--baseline", "before.png"] } // Visual diff
{ "args": ["diff", "url", "<url1>", "<url2>"] } // Compare pages
{ "args": ["keyboard", "type", "text with keystrokes"] } // At current focus
{ "args": ["keyboard", "inserttext", "text without events"] } // Insert without key events
{ "args": ["download", "@e1", "./file.pdf"] } // Click & download
{ "args": ["wait", "--download", "./output.zip"] } // Wait for pending download
{ "args": ["batch"], "stdin": "[[\"click\",\"@export\"],[\"wait\",\"--download\",\"report.csv\"]]" }
{ "args": ["tab", "list"] }
{ "args": ["tab", "t2"] } // Switch by id
{ "args": ["tab", "new", "https://example.com"] }
{ "args": ["tab", "close"] }
{ "args": ["auth", "save", "my-login", "--password-stdin"], "stdin": "<password>" }
{ "args": ["auth", "login", "my-login"] }
{ "args": ["state", "save", "./auth.json"] }
{ "args": ["state", "load", "./auth.json"], "sessionMode": "fresh" }
{ "args": ["session"] }
{ "args": ["session", "list"] }
// Requires --enable react-devtools at launch
{ "args": ["--enable", "react-devtools", "open", "https://example.com"], "sessionMode": "fresh" }
{ "args": ["react", "tree"] }
{ "args": ["react", "inspect", "<fiberId>"] }
{ "args": ["vitals", "https://example.com"] }
{ "args": ["pushstate", "/dashboard"] } // SPA navigation
{ "args": ["console"] }
{ "args": ["errors"] }
{ "args": ["console", "--clear"] }
{ "args": ["errors", "--clear"] }
{ "args": ["record", "start", "./issue-001-repro.webm"] }
{ "args": ["record", "stop"] }
{ "args": ["trace", "start"] }
{ "args": ["trace", "stop", "./trace.json"] }
The extension manages sessions automatically:
- sessionMode: "auto" reuses the implicit session across calls
- sessionMode: "fresh" rotates to a new session (for profile/auth changes)
- --session <name> in args for named sessions
// Normal browsing: auto session
{ "args": ["open", "https://example.com"] }
// Fresh session (e.g., after login)
{ "args": ["--profile", "Default", "open", "https://mail.google.com"], "sessionMode": "fresh" }
// Named session for parallel browsing
{ "args": ["--session", "site1", "open", "https://site-a.com"] }
{ "args": ["--session", "site2", "open", "https://site-b.com"] }
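For more than a couple of sites, the named-session calls can be generated from a map. This is a minimal sketch; `openInNamedSessions` is a hypothetical helper, and the session names are whatever the caller chooses.

```javascript
// Build one open call per site, each pinned to its own named session.
function openInNamedSessions(sites) {
  return Object.entries(sites).map(([sessionName, url]) => ({
    args: ["--session", sessionName, "open", url],
  }));
}

const sessionCalls = openInNamedSessions({
  site1: "https://site-a.com",
  site2: "https://site-b.com",
});
```

Later calls for a given site would repeat the same `--session` name so they reach the same browser session.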
Cleanup is automatic (on pi quit or idle timeout). Explicit close:
{ "args": ["close"] }
{ "args": ["close", "--all"] }
Refs (@e1, @e2, etc.) are invalidated when the page changes. Always re-snapshot after any action that changes the page, for example a click that navigates:
{ "args": ["click", "@e5"] } // Navigated
{ "args": ["snapshot", "-i"] } // MUST re-snapshot
{ "args": ["click", "@e1"] } // Use new refs
The tool provides automatic recovery guidance via details.nextActions when stale refs are detected.
Use --annotate for pages with unlabeled icon buttons, canvas/charts, or when you need spatial reasoning. Each label [N] maps to ref @eN.
{ "args": ["screenshot", "--annotate", "page.png"] }
// Output includes legend: [1] @e1 button "Submit", [2] @e2 link "Home"
{ "args": ["click", "@e1"] }
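If the legend needs to be consumed programmatically, it can be parsed into label/ref pairs. The sketch below assumes the legend looks exactly like the comment above (`[N] @eN role "name"`, comma-separated); the real output format may differ, and `parseLegend` and `refForName` are hypothetical helpers.

```javascript
// Parse a legend such as: [1] @e1 button "Submit", [2] @e2 link "Home"
// Assumes the format shown in the comment above; adjust if the output differs.
function parseLegend(legend) {
  const pattern = /\[(\d+)\]\s+(@e\d+)\s+(\S+)\s+"([^"]*)"/g;
  const entries = [];
  for (const [, label, ref, role, name] of legend.matchAll(pattern)) {
    entries.push({ label: Number(label), ref, role, name });
  }
  return entries;
}

// Look up the ref for an element by its visible name.
function refForName(entries, name) {
  const hit = entries.find((entry) => entry.name === name);
  return hit ? hit.ref : undefined;
}

const legendEntries = parseLegend('[1] @e1 button "Submit", [2] @e2 link "Home"');
const homeRef = refForName(legendEntries, "Home");
```

The resulting ref can then be passed to `click` or any other ref-based command, as long as the page has not changed since the screenshot.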
Create agent-browser.json in the project root for persistent settings:
{
"headed": true,
"proxy": "http://localhost:8080",
"profile": "./browser-data"
}
Use env vars for quick overrides: AGENT_BROWSER_COLOR_SCHEME=dark, AGENT_BROWSER_DEFAULT_TIMEOUT=45000.
{ "args": ["--auto-connect", "open", "https://example.com"], "sessionMode": "fresh" }
{ "args": ["--cdp", "9222", "snapshot"], "sessionMode": "fresh" }
| Reference | When to Use |
|-----------|-------------|
| references/commands.md | Full command reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules |
| references/session-management.md | Parallel sessions, scraping |
| references/authentication.md | Login flows, OAuth, 2FA, state reuse |
| references/video-recording.md | Recording workflows |
| references/profiling.md | DevTools profiling |
| references/proxy-support.md | Proxy config, geo-testing |
These are shell scripts for direct use outside pi:
| Template | Description |
|----------|-------------|
| templates/form-automation.sh | Form filling |
| templates/authenticated-session.sh | Login workflow |
| templates/capture-workflow.sh | Content extraction |
# Run directly in terminal (not through agent_browser)
./templates/form-automation.sh https://example.com/form
For complete upstream CLI reference, see the agent-browser docs.
The pi-agent-browser-native extension docs (TOOL_CONTRACT.md, COMMAND_REFERENCE.md) are in the package at /opt/homebrew/lib/node_modules/pi-agent-browser-native/docs/.