Agent Tools

Important: Use Scripts First

ALWAYS prefer the scripts in scripts/ over raw curl API calls. Scripts are located in the scripts/ subdirectory of this skill's folder. They provide features that raw commands do not:

Proper image encoding (WebP conversion, alpha removal)
Appropriate model selection for each task
Structured output handling (boolean responses via exit codes)
Meaningful exit codes for shell integration
Formatting structured data (like GitHub PRs) into LLM-friendly Markdown

When to read the script source: If a script doesn't do exactly what you need, or fails due to missing dependencies, read the script source. The scripts encode API best practices (image ordering, structured output schemas, model selection) that may not be obvious—use them as reference when building similar functionality.

Quick Start

Environment: AI commands require a Gemini API key, read from GEMINI_API_KEY, and report a clear error if no key is found. gh-markdown optionally accepts a --token for GitHub API access.

Model selection: Every Gemini-backed script accepts --model MODEL and honors the GEMINI_MODEL environment variable (--model wins; each script's built-in default applies when neither is set). Defaults vary per tool and are tuned for its task; only override when you have a reason.
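
The precedence described above can be sketched as a small shell function. This only illustrates the rule and is not the scripts' actual code; `resolve_model` and its two arguments are hypothetical names.

```shell
# Hypothetical helper illustrating the documented precedence:
# --model flag first, then GEMINI_MODEL, then the script's built-in default.
resolve_model() {
  flag_value=$1       # value given via --model, or empty if the flag was absent
  builtin_default=$2  # the script's own default model
  if [ -n "$flag_value" ]; then
    printf '%s\n' "$flag_value"
  elif [ -n "${GEMINI_MODEL:-}" ]; then
    printf '%s\n' "$GEMINI_MODEL"
  else
    printf '%s\n' "$builtin_default"
  fi
}
```

With GEMINI_MODEL unset and no flag value, the function prints the default; setting GEMINI_MODEL changes the result only when the flag value is empty.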

Dependencies: curl, jq, uv (all tools); base64, magick (image tools only)
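
A pre-flight check for these dependencies might look like the sketch below. `require_cmds` is a hypothetical helper, not part of the skill. It returns 127 to match the "missing dependency" exit code that most scripts here document.

```shell
# Hypothetical pre-flight check: report every missing command on stderr
# and return 127 if at least one is missing.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'missing dependency: %s\n' "$cmd" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 127
}
```

For example, `require_cmds curl jq uv base64 magick || exit` would stop a wrapper script before any image tool runs.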

# Gather context and analyze
scripts/context show gemini-api | scripts/emerson "Explain the key features"

# Fetch a GitHub PR, Issue, or Workflow Run as Markdown
scripts/gh-markdown https://github.com/owner/repo/pull/123

# Describe an image (generate alt-text)
scripts/screenshot-describe screenshot.png

# Compare two images for visual differences
scripts/screenshot-compare before.png after.png

# Smart crop image around the detected primary subject
scripts/photo-smart-crop photo.jpg cropped.jpg

# Check if a photo prominently features people (exit code = answer)
scripts/photo-query @people photo.jpg

# Generic image query with a structured-output schema
scripts/photo-query "Is there a fireplace?" \
  --schema "has_fireplace bool" photo.jpg

# Generate essay-length analysis from text
scripts/emerson "Summarize the key changes" < documentation.md

# Evaluate a boolean condition against text
echo "Hello world" | scripts/satisfies "is a greeting"

# Count tokens in text
cat document.md | scripts/token-count

# Interact with an Android UI via AI
scripts/popper "start an exercise"

Script Overview

oracle

Consult the Oracle for a very carefully researched and considered answer. The Oracle utilizes deep reasoning and Google Search grounding to provide the highest quality response possible. It accepts arbitrary files and directories as positional arguments, recursively walks directories, and automatically uploads media files. Use this tool for deep research, complex architectural reasoning, and synthesis requiring external data or massive repository context.

Important Usage Guidelines:

Not for Quick Q&A: The Oracle is designed for deep, context-heavy reasoning. It takes longer to run and consumes more tokens than standard tools. Do not use it for simple questions or basic syntax lookups.
Self-Contained Prompts: Write the prompt as if explaining the problem to an expert who has zero prior knowledge of your task, because the Oracle has no memory of your current session or previous steps. Do not use references like "the solution we implemented" without explaining exactly what that solution was.
Broad File Context: Include source files and directories as positional arguments because the Oracle needs the broadest possible view of the codebase to reason effectively. Err on the side of providing too much context—including files, directories, or documentation even if you think they are only marginally relevant—so the Oracle can discover non-obvious connections.
Define the Meta-Context: Beyond raw files (code, PDFs, logs), the most effective Oracle queries explicitly define the "meta-context" in the prompt. Before calling the tool, package up your intent. Define the persona, the ultimate goals, the success criteria, constraints, desired format/style, and provide examples or assumptions. A large, detailed prompt is expected.
Describe What Didn't Work: If you are calling the Oracle because you or an agent is stuck—e.g., many approaches have been tried but have failed or been rejected—explicitly summarise those failed attempts in the prompt. Describe each approach, why it failed or was ruled out, and any error messages or constraints encountered. This prevents the Oracle from re-proposing the same dead ends and directs its reasoning toward genuinely novel solutions.
Planning Step: The Oracle tool processes massive, expensive context payloads. Before executing a live Oracle request, formulate your prompt and target directories, and run the tool using the --dry-run flag:

scripts/oracle --dry-run "PROMPT" [FILE_OR_DIR ...]

Present the resulting dry-run summary (the total payload size, the list of resolved files, and your drafted prompt) to the user in the chat. Ask the user if they want to add more directories, exclude specific files, or tweak the focus of the prompt. Proceed with the live command (without --dry-run) only after the user approves the plan.
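
For a human at a terminal, the planning step could be wrapped as in the sketch below. It is an adaptation, not part of the skill: an agent would present the dry-run summary in chat rather than prompt on stderr. `oracle_with_approval` and the `ORACLE` override variable are hypothetical names.

```shell
# ORACLE is a hypothetical override hook; it defaults to the real script.
ORACLE=${ORACLE:-scripts/oracle}

# Usage: oracle_with_approval "PROMPT" [FILE_OR_DIR ...]
oracle_with_approval() {
  prompt=$1
  shift
  # Step 1: dry run, which prints the payload size and resolved files.
  # stdin is detached so the approval prompt below is not consumed.
  "$ORACLE" --dry-run "$prompt" "$@" </dev/null || return 1
  # Step 2: require explicit approval before the expensive live call.
  printf 'Proceed with the live Oracle request? [y/N] ' >&2
  read -r answer || answer=n
  case $answer in
    y|Y|yes)
      "$ORACLE" "$prompt" "$@"
      ;;
    *)
      printf 'Aborted; revise the prompt or file list and retry.\n' >&2
      return 1
      ;;
  esac
}
```

Anything other than an explicit yes aborts, so the live request is never sent by default.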

Warning: Output can be detailed and lengthy.

scripts/oracle [OPTIONS] "PROMPT" [FILE_OR_DIR ...]

Options:

--force: Bypass context size limits (1MB for text, 20MB per media file). Use when you are confident the large context is necessary and the model can handle it.
--maps: Use Google Maps grounding instead of Google Search. Use this for queries about locations, places, or general routing options. Warning: Specific details like live star ratings, current operating hours, or recent business closures may still be inaccurate or outdated and should be verified. Note: Cannot be combined with --code.
--code: Enable Code Execution for Python. Use this whenever the task requires precise calculations, complex mathematics, data analysis on provided files, or programmatic logic. The model will write and execute Python code in a sandboxed environment.

Environment: GEMINI_API_KEY (Required)

Exit codes: 0 success, 1 error

Examples:

# Evaluate an architectural pattern
scripts/oracle "Evaluate this implementation against SOLID principles and propose a refactoring plan." src/

# Time-sensitive research based on context
scripts/oracle "What are the latest developments in this framework as of May 2026?" framework-docs.md

gh-markdown

Fetch GitHub Pull Requests, Issues, or Workflow Runs and format them as Markdown for LLM Agents.

Features:

PRs: Includes main description, comments, reviews, inline threads, and links to workflow runs for monitoring CI status.
Issues: Includes title, description, labels, and comments.
Workflow Runs: Includes run summary, duration, jobs, steps, and logs for failed jobs.

Requires GITHUB_TOKEN environment variable to be set with a GitHub Personal Access Token.

Token Setup: You can generate a token at https://github.com/settings/personal-access-tokens.

Minimum requirement for public repos: under Repository access, select read-only access to public repositories; no additional permissions are needed.
For private repos: Grant read access to Pull Requests, Issues, and Actions as needed.

scripts/gh-markdown URL

Environment: GITHUB_TOKEN (Required)

Exit codes: 0 success, 1 error

Examples:

# Fetch a PR
scripts/gh-markdown https://github.com/owner/repo/pull/123

# Fetch a Workflow Run
scripts/gh-markdown https://github.com/owner/repo/actions/runs/12345678

context

Gathers current, authoritative context for deep research on technical subjects (e.g., gemini-api, mcp, home-assistant) or arbitrary GitHub directories. Run scripts/context catalog to see all available entries. This script should be your first tool for gathering background knowledge or the latest documentation for an unfamiliar domain. A full GitHub URL can also be passed as the target (e.g., https://github.com/owner/repo/tree/branch/path).

Warning: Output can be very large. Do not read output directly into your conversation history. Pipe to emerson for analysis, or redirect to a file to search/read locally.

scripts/context show TARGET

Commands: catalog (list available entries), show (show context for target), template (output plugin template)

Exit codes: 0 success, 1 error, 127 missing dependency

Examples:

# List available catalog entries
scripts/context catalog

# Gather context for Gemini API
scripts/context show gemini-api > gemini-context.xml

# Pipe context directly to analysis
scripts/context show gemini-cli | scripts/emerson "How do commands work?"

screenshot-describe

Generate concise alt-text for an image. Optimized for UI captures.

scripts/screenshot-describe IMAGE [PROMPT]

Exit codes: 0 success, 1 error, 127 missing dependency

screenshot-compare

Compare two images for visual differences. Identifies layout shifts, color changes, padding, and text updates.

scripts/screenshot-compare IMAGE1 IMAGE2 [PROMPT]

Exit codes: 0 differences found, 1 error (including missing ImageMagick), 2 images identical
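
These exit codes are inverted relative to a plain success/failure test: status 0 means the images differ, so `if scripts/screenshot-compare ...` takes the "then" branch when differences were found. A minimal sketch of handling all three outcomes follows; `compare_status_label` is a hypothetical helper.

```shell
# Map a screenshot-compare exit status to a label, using the codes documented above.
compare_status_label() {
  case $1 in
    0) echo "different" ;;
    2) echo "identical" ;;
    1) echo "error" ;;
    *) echo "unexpected status $1" ;;
  esac
}

# Intended use (not run here):
#   rc=0
#   scripts/screenshot-compare before.png after.png > diff.txt || rc=$?
#   compare_status_label "$rc"
```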

photo-smart-crop

Smart crop images around the detected primary subject (people, food, focal points in a landscape) with a specified aspect ratio. Centers the maximal crop box on the subject and enforces the aspect ratio. If no specific focal point is found, crops around the central compositional area.
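
The arithmetic behind "the maximal crop box centered on the subject" can be sketched as follows. This is an illustration under assumptions, not the script's implementation: it assumes the subject's center is already known in pixel coordinates, and it assumes a box that would overhang an edge is shifted back inside the image. `crop_box` is a hypothetical name; the output uses ImageMagick-style WxH+X+Y geometry.

```shell
# Usage: crop_box IMG_W IMG_H RATIO_W RATIO_H SUBJECT_X SUBJECT_Y
# The function body runs in a subshell so its variables do not leak.
crop_box() (
  img_w=$1; img_h=$2; ratio_w=$3; ratio_h=$4; cx=$5; cy=$6
  # Largest box with the requested ratio that still fits inside the image.
  if [ $((img_w * ratio_h)) -gt $((img_h * ratio_w)) ]; then
    # Image is wider than the target ratio: height is the limiting side.
    box_h=$img_h
    box_w=$((img_h * ratio_w / ratio_h))
  else
    # Image is taller than (or equal to) the target ratio: width limits.
    box_w=$img_w
    box_h=$((img_w * ratio_h / ratio_w))
  fi
  # Center the box on the subject, then clamp it to the image bounds.
  x=$((cx - box_w / 2))
  y=$((cy - box_h / 2))
  if [ "$x" -lt 0 ]; then x=0; fi
  if [ "$y" -lt 0 ]; then y=0; fi
  if [ $((x + box_w)) -gt "$img_w" ]; then x=$((img_w - box_w)); fi
  if [ $((y + box_h)) -gt "$img_h" ]; then y=$((img_h - box_h)); fi
  printf '%sx%s+%s+%s\n' "$box_w" "$box_h" "$x" "$y"
)
```

For a 4000x3000 image with a 1:1 ratio and a subject centered at (3000, 1500), `crop_box 4000 3000 1 1 3000 1500` prints `3000x3000+1000+0`: the box cannot be centered on the subject without leaving the image, so it is pushed back to the right edge.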

scripts/photo-smart-crop [--ratio W:H] INPUT OUTPUT

Options: --ratio W:H (default 5:3)

Exit codes: 0 success, 1 error (API error, invalid arguments), 2 rate limited, 127 missing dependency

Examples:

# Default 5:3 aspect ratio
scripts/photo-smart-crop family.jpg family-cropped.jpg

# 16:9 for video thumbnails
scripts/photo-smart-crop --ratio 16:9 portrait.jpg thumbnail.jpg

# Square crop for profile pictures
scripts/photo-smart-crop --ratio 1:1 headshot.png avatar.png
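
Because exit code 2 specifically signals rate limiting, a caller can retry on that status alone and fail fast on everything else. The wrapper below is a sketch; `retry_on_rate_limit`, `MAX_ATTEMPTS`, and `RETRY_DELAY` are hypothetical names, and the doubling delay is an assumed policy rather than anything the script prescribes.

```shell
# Rerun a command while it exits with status 2 ("rate limited"),
# up to MAX_ATTEMPTS attempts in total. Any other status is returned at once.
retry_on_rate_limit() {
  max_attempts=${MAX_ATTEMPTS:-4}
  delay=${RETRY_DELAY:-5}   # seconds before the first retry; doubles each time
  attempt=1
  while :; do
    rc=0
    "$@" || rc=$?
    if [ "$rc" -ne 2 ] || [ "$attempt" -ge "$max_attempts" ]; then
      return "$rc"
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

A call would look like `retry_on_rate_limit scripts/photo-smart-crop --ratio 16:9 portrait.jpg thumbnail.jpg`.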

photo-query

Ask Gemini a question about one or more photos. The QUERY positional is either an @-prefixed built-in or a free-form prompt:

@people — boolean: do people feature prominently? Single-file mode encodes the answer in the exit code (0 true / 1 false / 2 error); stdout silent; -v echoes true/false to stderr. Defaults to a 384px resize (single-tile token cost).
Any free-form text is sent as the prompt. Add --schema SPEC (llm-style DSL like 'has_bed bool, count int') for structured output and --filter FIELD to print only paths whose boolean field is true.

Multiple files (or non-boolean queries) emit per-file lines on stdout; exit code only reflects success/failure.

Default model is gemini-3.1-flash-lite — the cheapest/fastest Gemini 3 tier, appropriate for high-volume classification and lightweight visual Q&A. Override with --model for harder questions.

Deterministic image prep (EXIF rotation, alpha flattening, resizing to --max-size (default 768; 384 for @people), and WebP encoding) is cached by content address at ~/.cache/agent-tools/photo-query/, so repeated queries against the same images skip the resize entirely. Use --no-cache to bypass the cache.
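
To illustrate what a content-addressed cache means here, the sketch below derives a cache key from the image bytes plus the prep parameters. The real key scheme used by photo-query is not documented in this file; `prep_cache_key` and the choice of SHA-256 are assumptions made for the example.

```shell
# Hash stdin with whichever SHA-256 tool is available and print only the digest.
hash_stdin() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum
  else
    shasum -a 256   # common fallback on macOS
  fi | cut -d' ' -f1
}

# Hypothetical key derivation: the key depends on the file's bytes and on the
# prep parameter, never on the file's name or path.
# Usage: prep_cache_key FILE MAX_SIZE
prep_cache_key() {
  { printf 'max-size=%s\n' "$2"; cat "$1"; } | hash_stdin
}
```

Two copies of the same image under different names share a key, while editing the file or changing the size parameter produces a new one.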

# Boolean check (exit code idiom)
if scripts/photo-query @people photo.jpg; then echo "Found people"; fi

# Multi-file boolean: per-line `<path>\t<true|false>` on stdout
scripts/photo-query @people *.jpg

# Schema-constrained query with filter
scripts/photo-query --recursive \
  --schema "has_bedside_table bool" \
  --filter has_bedside_table \
  "Does this image feature a bedside table?" \
  ./photos/

# Free-text description per file
scripts/photo-query "Describe the scene in under 200 chars." room.jpg

Exit codes: 0 success (or true for single-file boolean), 1 false (only for single-file boolean), 2 error (network, parse, missing file).
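
In multi-file mode the answer is in stdout rather than the exit code, one `<path>\t<true|false>` line per file, so the result can be post-processed with standard tools. A minimal sketch, with `paths_marked_true` as a hypothetical helper name:

```shell
# Print only the paths whose boolean column is "true".
# Input on stdin: lines of "<path><TAB><true|false>".
paths_marked_true() {
  awk -F '\t' '$2 == "true" { print $1 }'
}

# Intended use (not run here):
#   scripts/photo-query @people *.jpg | paths_marked_true
```

Splitting on the tab character keeps paths that contain spaces intact.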

emerson

Generate essay-length (~3000 words) analysis from text input, as authoritative, footnoted Markdown. The tool works only from the provided standard input (stdin): it performs closed-book analysis with no external search, and the model is instructed to act as a technical analyst that treats the input as the sole source of truth, which reduces hallucination. Use this tool when you need summarization or formatting of specific, pre-gathered text. Can be combined with context to provide rich background material.

scripts/emerson "PROMPT" < input.txt

Exit codes: 0 success, 1 error, 127 missing dependency

pascal

Ask a question and get a short, paragraph-style response (wrapped to 80 columns). Optimized for quick answers.

scripts/pascal [-] "QUESTION"

Input: Optional context via stdin. Pass - as the first argument to read it; without -, stdin is ignored.

Exit codes: 0 success, 1 error, 127 missing dependency

Examples:

# Ask a quick question
scripts/pascal "What is the capital of Peru?"

# Summarize a file
cat article.md | scripts/pascal - "Summarize this article"

# Explain code
scripts/pascal - "Explain this code" < script.sh

satisfies

Evaluate whether input text satisfies a condition. Returns boolean via exit code.

echo "text" | scripts/satisfies [-v|--verbose] "CONDITION"

Options: -v, --verbose (output "true" or "false" to stderr)

Exit codes: 0 true (satisfies), 1 false (does not satisfy), 127 missing dependency

Examples:

# Check if file mentions a topic
cat file.txt | scripts/satisfies "mentions Elvis" && echo "Found it"

# Validate content type
cat response.json | scripts/satisfies "is valid JSON with an 'id' field"

# Use in conditionals
if cat log.txt | scripts/satisfies "contains error messages"; then
  echo "Errors detected"
fi

token-count

Count tokens in text using the Gemini API.

cat file.txt | scripts/token-count

Exit codes: 0 success, 1 error, 127 missing dependency

gemini-api-doctor

Ping Gemini models to test API key validity and endpoint responsiveness. Runs checks in parallel and enforces a 60-second timeout.

scripts/gemini-api-doctor [MODELS...]

Input:

stdin: API Key (if not set in environment).

Environment: GEMINI_API_KEY (Optional. Used if set, otherwise reads from stdin)

Options:

--help: Display help message.

Examples:

echo "YOUR_API_KEY" | scripts/gemini-api-doctor
scripts/gemini-api-doctor gemini-3.1-flash-lite

Exit codes: 0 success, 1 error
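
The key lookup order described above (environment first, then stdin) follows a common shell pattern, sketched below. `read_api_key` is a hypothetical helper, not the script's own code.

```shell
# Prefer GEMINI_API_KEY; otherwise read a single line from stdin.
# Returns 1 if neither source provides a non-empty key.
read_api_key() {
  if [ -n "${GEMINI_API_KEY:-}" ]; then
    printf '%s\n' "$GEMINI_API_KEY"
  else
    key=
    IFS= read -r key || true
    [ -n "$key" ] || return 1
    printf '%s\n' "$key"
  fi
}
```

When the variable is set, stdin is never read, so the same call works both in a pipeline and in an environment that already exports the key.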

popper

Interact with Android UIs using an AI agent powered by uiautomator2 and Gemini. This allows semantic control of the device by providing a goal in natural language. Screenshots are captured at each step and saved to a unique run directory in an XDG-compliant temporary location.

scripts/popper "GOAL"

Options:

--launch PACKAGE: Launch a package before starting.
--stay-in-app: Restrict the run to a single application package.
--dump-layout: Print the current simplified UI layout as JSON and exit.
--agent-screenshots / --no-agent-screenshots: Enable or disable sending screenshots to the API.
--local-screenshots / --no-local-screenshots: Enable or disable saving screenshots locally.
--screenshot-dir DIR: Override the directory where screenshots are saved.

Environment: ANDROID_SERIAL (optional, target specific device)

Exit codes: 0 success (task completed), 1 error (task failed), 2 timeout

Examples:

# General UI task
scripts/popper "accept all permissions"

# Launch an app and keep the run inside it
scripts/popper --launch com.example.fitness --stay-in-app "start a running exercise"

# Target specific device
env ANDROID_SERIAL=12345 scripts/popper "open settings"

Image Encoding Notes

Screenshot tools encode to lossless WebP; photo-query uses lossy WebP and photo-smart-crop uses HEIF (both resize first to limit token cost)
Alpha handling varies by tool: screenshot-describe drops the alpha channel (-alpha off); screenshot-compare flattens onto a magenta background, so transparency differences show up in comparisons; photo-query flattens onto white
Base64: use -w 0 (Linux) or -b 0 (macOS) for single-line output
Single-image prompts: image before text (Gemini best practice)
Multi-image comparison: text before images (Gemini best practice)
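
The base64 flag difference noted above can be hidden behind one function when building similar functionality by hand. This is a sketch; `b64_single_line` is a hypothetical name, and the probe simply checks whether the local `base64` accepts `-w 0`.

```shell
# Encode stdin as a single line of base64 on both GNU and BSD/macOS systems.
b64_single_line() {
  if base64 -w 0 </dev/null >/dev/null 2>&1; then
    base64 -w 0   # GNU coreutils (Linux): disable line wrapping
  else
    base64 -b 0   # BSD/macOS: unlimited line length
  fi
}
```

For example, `printf 'hello' | b64_single_line` prints `aGVsbG8=`.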

Safety Notes

Scripts require network access to the Gemini API
Requires a Gemini API key (reads from GEMINI_API_KEY)
API calls may incur usage costs
Large images increase request size and latency
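Scripts do not log input data, but two tools write derived files locally: photo-query caches prepared images under ~/.cache/agent-tools/photo-query/ (bypass with --no-cache), and popper saves screenshots to a run directory (disable with --no-local-screenshots)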

Reference Material

Command Reference: Detailed documentation for each script. See references/command-index.md.
Troubleshooting: Common issues and solutions. See references/troubleshooting.md.
Agent Function Notation (AFN): A notation for describing agent behaviour as functions. See references/afn.md.
Software Installation: How to install optional CLI tools such as codex, claude, and agy. See references/software-installation.md.

ithinkihaveacat/agent-tools