.claude/skills/agent-evaluation/SKILL.md
Browser automation CLI for AI agents. Use for website interaction, form automation, screenshots, scraping, and web app verification. Prefer snapshot refs (@e1, @e2) for deterministic actions.
npx skillsauth add wallacedobbs428/thecalltaker agent-evaluationInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Always use the deterministic ref loop:
agent-browser open <url>agent-browser snapshot -i@e1, @e2, ...)agent-browser snapshot -i again after page/DOM changesagent-browser open https://example.com/form
agent-browser wait --load networkidle
agent-browser snapshot -i
agent-browser fill @e1 "[email protected]"
agent-browser click @e2
agent-browser snapshot -i
Use && chaining when intermediate output is not needed.
# Good chaining: open -> wait -> snapshot
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
# Separate calls when output is needed first
agent-browser snapshot -i
# parse refs
agent-browser click @e2
High-value commands:
open, closesnapshot -i, snapshot -i -C, snapshot -s "#selector"click, fill, type, select, check, pressdiff snapshot, diff screenshot --baseline <file>screenshot, screenshot --annotate, pdfwait --load networkidle, wait <selector|@ref|ms>Use explicit evidence after actions.
# Baseline -> action -> verify structure
agent-browser snapshot -i
agent-browser click @e3
agent-browser diff snapshot
# Visual regression
agent-browser screenshot baseline.png
agent-browser click @e5
agent-browser diff screenshot --baseline baseline.png
wait --load networkidle or selector/ref waits over fixed sleeps.eval --stdin (or base64) to avoid shell escaping breakage.--session <name>.Optional hardening examples:
# Wrap page content with boundaries to reduce prompt-injection risk
export AGENT_BROWSER_CONTENT_BOUNDARIES=1
# Limit output volume for long pages
export AGENT_BROWSER_MAX_OUTPUT=50000
# Restrict navigation and network to trusted domains
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"
# Restrict allowed action types
export AGENT_BROWSER_ACTION_POLICY=./policy.json
Example policy.json:
{"default":"deny","allow":["navigate","snapshot","click","fill","scroll","wait","get"],"deny":["eval","download","upload","network","state"]}
CLI-flag equivalent:
agent-browser --content-boundaries --max-output 50000 --allowed-domains "example.com,*.example.com" --action-policy ./policy.json open https://example.com
command not found: install and run agent-browser install.snapshot -i again and use fresh refs.--load networkidle or targeted wait selector.--session names and close each session.-i, -c, -d, -s) and extract only needed text.Deep-dive docs in this skill:
Related resources:
Ready templates:
./templates/form-automation.sh./templates/capture-workflow.shdocumentation
Agentic memory system for writers - track characters, relationships, scenes, and themes
tools
Automate repetitive development tasks and workflows. Use when creating build scripts, automating deployments, or setting up development workflows. Handles npm scripts, Makefile, GitHub Actions workflows, and task automation.
development
Review UI code for Web Interface Guidelines compliance. Use when asked to "review my UI", "check accessibility", "audit design", "review UX", or "check my site against best practices". Fetches latest Vercel guidelines and checks files against all rules.
development
Implement web accessibility (a11y) standards following WCAG 2.1 guidelines. Use when building accessible UIs, fixing accessibility issues, or ensuring compliance with disability standards. Handles ARIA attributes, keyboard navigation, screen readers, semantic HTML, and accessibility testing.