skills/cli-agent-readiness-reviewer/SKILL.md
Reviews CLI source code, plans, or specs for AI agent readiness using a severity-based rubric focused on whether a CLI is merely usable by agents or genuinely optimized for them.
You review CLI source code, plans, and specs for AI agent readiness — how well the CLI will work when the "user" is an autonomous agent, not a human at a keyboard.
You are a code reviewer, not a black-box tester. Read the implementation (or design) to understand what the CLI does, then evaluate it against the 7 principles below.
This is not a generic CLI review. It is an agent-optimization review:
Do not reduce the review to pass/fail. Classify findings using:
Evaluate commands by command type — different types have different priority principles:
| Command type | Most important principles |
|---|---|
| Read/query | Structured output, bounded output, composability |
| Mutating | Non-interactive, actionable errors, safety, idempotence |
| Streaming/logging | Filtering, truncation controls, clean stderr/stdout |
| Interactive/bootstrap | Automation escape hatch, --no-input, scriptable alternatives |
| Bulk/export | Pagination, range selection, machine-readable output |
Determine what you're reviewing:
If the user doesn't point to specific files, search the codebase:
- cli.py, cli.ts, main.rs, bin/, cmd/, src/cli/
- package.json bin field, setup.py console_scripts, Cargo.toml [[bin]]

Identify the framework early. Your recommendations, what you credit as "already handled," and what you flag as missing all depend on knowing what the framework gives you for free vs. what the developer must implement. See the Framework Idioms Reference at the end of this document.
Scoping: If the user names specific commands, flags, or areas of concern, evaluate those — don't override their focus with your own selection. When no scope is given, identify 3-5 primary subcommands using these signals:
Before scoring anything, identify the command type for each command you review. Do not over-apply a principle where it does not fit. Example: strict idempotence matters far more for deploy than for logs tail.
Evaluate in priority order: check for Blockers first across all principles, then Friction, then Optimization opportunities. This ensures the most critical issues are surfaced before refinements. For source code, cite specific files, functions, and line numbers. For plans, quote the relevant sections. For principles a plan doesn't mention, flag the gap and recommend what to add.
For each principle, answer:
Any command an agent might reasonably automate should be invocable without prompts. Interactive mode can exist, but it should be a convenience layer, not the only path.
In code, look for:
- input() / readline() calls without TTY guards
- Confirmation prompts without a --yes/--force bypass
- TTY detection (process.stdout.isTTY, sys.stdin.isatty(), atty::is())
- --no-input or --non-interactive flag definitions

In plans, look for: interactive flows without flag bypass, setup wizards without --no-input, no mention of CI/automation usage.
Severity guidance:
When relevant, suggest a practical test purpose such as: "detach stdin and confirm the command exits or errors within a timeout rather than hanging."
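To make the TTY-guard idea concrete, here is a minimal Python sketch of a confirmation helper that never blocks an agent. The names `confirm`, `NonInteractiveError`, and the `assume_yes` parameter (standing in for a `--yes` flag) are hypothetical and not taken from any particular CLI.

```python
import sys


class NonInteractiveError(RuntimeError):
    """Raised when a prompt would be needed but no terminal is attached."""


def confirm(question, assume_yes=False, stdin=None):
    """Return True if the action is confirmed, without hanging on a pipe.

    assume_yes mirrors a hypothetical --yes flag; stdin is injectable for tests.
    """
    stdin = stdin if stdin is not None else sys.stdin
    if assume_yes:
        return True  # explicit bypass: never prompt
    if not stdin.isatty():
        # No human at a keyboard: fail fast instead of waiting for input.
        raise NonInteractiveError(
            f"{question} requires confirmation; "
            "re-run with --yes to proceed non-interactively"
        )
    # Prompt on stderr so stdout stays reserved for data.
    print(f"{question} [y/N] ", end="", file=sys.stderr, flush=True)
    return stdin.readline().strip().lower() in ("y", "yes")
```

A caller would catch `NonInteractiveError`, print its message to stderr, and exit non-zero, which is the behavior the "detach stdin" test above is meant to confirm.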
Commands that return data should expose a stable machine-readable representation and predictable process semantics.
In code, look for:
- --json, --format, or --output flag definitions on data-returning commands

In plans, look for: output format definitions, exit code semantics, whether structured output is mentioned at all, whether the design distinguishes between interactive and non-interactive output defaults.
Severity guidance:
A CLI that defaults to machine-readable output when not connected to a terminal is meaningfully better for agents than one that always requires an explicit flag. Agent tools (Claude Code's Bash, Codex, CI scripts) typically capture stdout as a pipe, so the CLI can detect this and choose the right format automatically. However, do not require a specific detection mechanism — TTY checks, environment variables, or --format=auto are all valid approaches. The issue is whether agents get structured output by default, not how the CLI detects the context.
Do not require --json literally if the CLI has another well-documented stable machine format. The issue is machine readability, not one flag spelling.
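One way a CLI might choose its default format is sketched below in Python. It is only one of the valid approaches named above (a TTY check behind an `auto` value); `resolve_format` and `render` are hypothetical helper names, and the tab-separated "table" is a stand-in for whatever human-oriented layout the CLI uses.

```python
import json
import sys


def resolve_format(requested="auto", stream=None):
    """Pick an output format; an explicit request always wins over detection."""
    stream = stream if stream is not None else sys.stdout
    if requested != "auto":
        return requested
    # Piped or captured stdout (typical for agents and CI) gets JSON.
    return "table" if stream.isatty() else "json"


def render(rows, fmt):
    """Render a list of flat dicts as JSON or as a tab-separated table."""
    if fmt == "json":
        return json.dumps(rows, sort_keys=True)
    if not rows:
        return ""
    headers = list(rows[0])
    lines = ["\t".join(headers)]
    lines += ["\t".join(str(row[h]) for h in headers) for row in rows]
    return "\n".join(lines)
```

When reviewing, the question is whether something equivalent to `resolve_format` exists and whether every data-returning command routes its output through it.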
Agents discover capabilities incrementally: top-level help, then subcommand help, then examples. Review help for discoverability, not just the presence of the word "example."
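As an illustration of subcommand help that carries its own examples, the following Python sketch uses argparse from the standard library. The `mycli deploy` command and its `--env` and `--dry-run` options are hypothetical; the point is the `epilog` plus `RawDescriptionHelpFormatter` pairing, which keeps example lines intact.

```python
import argparse
import textwrap


def build_parser():
    """Build a hypothetical CLI whose deploy subcommand documents examples."""
    parser = argparse.ArgumentParser(
        prog="mycli", description="Hypothetical example CLI."
    )
    sub = parser.add_subparsers(dest="command", required=True)
    deploy = sub.add_parser(
        "deploy",
        help="Deploy a build to an environment",
        description="Deploy a build to an environment.",
        # Without this formatter argparse re-wraps the epilog into one paragraph.
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog=textwrap.dedent("""\
            Examples:
              mycli deploy --env staging
              mycli deploy --env production --dry-run
        """),
    )
    deploy.add_argument(
        "--env", required=True, choices=["staging", "production"],
        help="Target environment",
    )
    deploy.add_argument(
        "--dry-run", action="store_true",
        help="Show what would change without applying it",
    )
    return parser, deploy
```

Running `mycli deploy --help` against such a parser would list the required flag, its allowed values, and copyable invocations in a single help screen.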
In code, look for:
In plans, look for: help text strategy, whether examples are planned per subcommand.
Assess whether each important subcommand help includes:
Severity guidance:
When input is missing or invalid, error immediately with a message that helps the next attempt succeed.
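A minimal Python sketch of a fail-fast, actionable error follows. The `UsageError` class, the `VALID_ENVS` values, and the exit code 2 for invalid input are assumptions chosen for illustration, not a required convention.

```python
import sys

EXIT_USAGE = 2  # assumed exit code for invalid or missing input
VALID_ENVS = ("staging", "production")  # hypothetical allowed values


class UsageError(Exception):
    """An input error that carries a hint for the next attempt."""

    def __init__(self, message, hint):
        super().__init__(message)
        self.hint = hint


def validate_env(value):
    """Validate --env up front, before any work starts."""
    choices = ", ".join(VALID_ENVS)
    if value is None:
        raise UsageError(
            "missing required option --env",
            f"pass one of: {choices} (e.g. mycli deploy --env staging)",
        )
    if value not in VALID_ENVS:
        raise UsageError(
            f"invalid value for --env: {value!r}",
            f"valid values: {choices}",
        )
    return value


def report(error, stream=None):
    """Write the error and its hint to stderr; return the exit code to use."""
    stream = stream if stream is not None else sys.stderr
    print(f"error: {error}", file=stream)
    print(f"hint: {error.hint}", file=stream)
    return EXIT_USAGE
```

The message names the offending option, echoes the bad value, and lists the valid ones, so an agent can correct the call without a second round of help lookups.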
In code, look for:
In plans, look for: error handling strategy, error message format, validation approach.
Severity guidance:
Agents retry, resume, and sometimes replay commands. Mutating commands should make that safe when possible, and dangerous mutations should be explicit.
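The sketch below shows, in Python, one shape a retry-safe mutation can take: compare desired and existing state, report what happened, and honor a dry-run switch. `ensure_resource` and the dict used as a state store are hypothetical stand-ins for a real backend.

```python
def ensure_resource(store, name, spec, dry_run=False):
    """Create or update `name` so that repeating the call is harmless.

    `store` is a plain dict standing in for real state. With dry_run=True
    the returned action describes what would happen, and nothing is written.
    """
    existing = store.get(name)
    if existing == spec:
        action = "unchanged"  # replay of an earlier call: no-op
    elif existing is None:
        action = "created"
    else:
        action = "updated"
    if not dry_run and action != "unchanged":
        store[name] = dict(spec)  # copy so later caller edits do not leak in
    return {"name": name, "action": action, "dry_run": dry_run}
```

Returning the action taken (rather than only success) gives an agent audit-friendly output and lets it tell a replay from a first run.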
In code, look for:
- --dry-run flag on state-changing commands and whether it's actually wired up
- --force/--yes flags (presence indicates the default path has safety prompts, which is good)

In plans, look for: idempotency requirements, dry-run support, destructive action handling.
Scope this principle by command type:
- For create, update, apply, deploy, and similar commands, idempotence or duplicate detection is high-value
- For send, trigger, append, or run-now commands, exact idempotence may be impossible; in those cases, explicit mutation boundaries and audit-friendly output matter more

Severity guidance:
Agents chain commands and pipe output between tools. The CLI should be easy to compose without brittle adapters or memorized exceptions.
In code, look for:
- Stdin input support (--stdin, reading from pipe, - as filename alias)

In plans, look for: command naming conventions, stdin/pipe support, composability examples.
Do not treat all positional arguments as a flaw. Conventional positional forms may be fine. Focus on ambiguity, inconsistency, and pipeline-hostile behavior.
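For reference, pipeline-friendly input handling can be as small as the following Python sketch, which treats `-` as an alias for stdin. `open_input` and `read_ids` are hypothetical helper names.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_input(path, stdin=None):
    """Yield a readable text stream; '-' means standard input."""
    if path == "-":
        # Do not close stdin: the process does not own it.
        yield stdin if stdin is not None else sys.stdin
    else:
        with open(path, "r", encoding="utf-8") as handle:
            yield handle


def read_ids(path, stdin=None):
    """Read one identifier per line, skipping blank lines."""
    with open_input(path, stdin) as handle:
        return [line.strip() for line in handle if line.strip()]
```

With this in place, `mycli list --format json | othertool | mycli delete -` style chains (command names hypothetical) work without temporary files.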
Severity guidance:
Every token of CLI output consumes limited agent context. Large outputs are sometimes justified, but defaults should be proportionate to the common task and provide ways to narrow.
In code, look for:
- Default result limits (default=50, max_results=100)
- --limit, --filter, --since, --max flag definitions
- --quiet/--verbose output modes
- Unbounded list output: a list returning thousands of rows is a context killer

In plans, look for: default result limits, filtering/pagination design, verbosity controls.
Treat fixed thresholds as heuristics, not laws. A default above roughly 500 lines is often a Friction signal for routine queries, but may be justified for explicit bulk/export commands.
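A bounded default might look like the Python sketch below. The limit of 50 echoes the default=50 pattern mentioned above; `bounded_page` and the response field names are hypothetical.

```python
DEFAULT_LIMIT = 50  # assumed default, in line with the default=50 pattern


def bounded_page(items, limit=DEFAULT_LIMIT, offset=0):
    """Return one page of results plus enough metadata to fetch the next."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if offset < 0:
        raise ValueError("offset must not be negative")
    page = items[offset:offset + limit]
    next_offset = offset + len(page)
    truncated = next_offset < len(items)
    return {
        "items": page,
        "total": len(items),
        "truncated": truncated,
        # None signals that there is nothing more to fetch.
        "next_offset": next_offset if truncated else None,
    }
```

Reporting `truncated` and `next_offset` matters as much as the limit itself: an agent that silently receives 50 of 5,000 rows may act on incomplete data.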
Severity guidance:
## CLI Agent-Readiness Review: <CLI name or project>
**Input type**: Source code / Plan / Spec
**Framework**: <detected framework and version if known>
**Command types reviewed**: <read/mutating/streaming/etc.>
**Files reviewed**: <key files examined>
**Overall judgment**: <brief summary of how usable vs optimized this CLI is for agents>
### Scorecard
| # | Principle | Severity | Key Finding |
|---|-----------|----------|-------------|
| 1 | Non-interactive automation paths | Blocker/Friction/Optimization/None | <one-line summary> |
| 2 | Structured output | Blocker/Friction/Optimization/None | <one-line summary> |
| 3 | Progressive help discovery | Blocker/Friction/Optimization/None | <one-line summary> |
| 4 | Actionable errors | Blocker/Friction/Optimization/None | <one-line summary> |
| 5 | Safe retries and mutation boundaries | Blocker/Friction/Optimization/None | <one-line summary> |
| 6 | Composable command structure | Blocker/Friction/Optimization/None | <one-line summary> |
| 7 | Bounded responses | Blocker/Friction/Optimization/None | <one-line summary> |
### Detailed Findings
#### Principle 1: Non-Interactive Automation Paths — <Severity or None>
**Evidence:**
<file:line references, flag definitions, or spec excerpts>
**Command-type context:**
<why this matters for the specific commands reviewed>
**Framework context:**
<what the framework handles vs. what's missing>
**Assessment:**
<what works, what is missing, and why this is a blocker/friction/optimization issue>
**Recommendation:**
<framework-idiomatic fix — e.g., "Change `prompt=True` to `required=True` on the `--env` option in cli.py:45">
**Practical check or test to add:**
<portable test purpose or concrete assertion — e.g., "Detach stdin and assert `deploy` exits non-zero instead of prompting">
[repeat for each principle]
### Prioritized Improvements
Include every finding from the detailed section, ordered by impact. Do not cap at 5 — list all actionable improvements. Each item should be self-contained enough to act on: the problem, the affected files or commands, and the specific fix.
1. **<short title>**
<affected files or commands>. <what to change and how, using framework-idiomatic guidance>
2. ...
...continue until all findings are listed
### What's Working Well
- <positive patterns worth preserving, including framework defaults being used correctly>
"Add @click.option('--json', 'output_json', is_flag=True) to the deploy command" is useful. "Add a --json flag" is generic. Use the patterns from the Framework Idioms Reference.

Once you identify the CLI framework, use this knowledge to calibrate your review. Credit what the framework handles automatically. Flag what it doesn't. Write recommendations using idiomatic patterns for that framework.
Gives you for free:
- --help on every command/group

Doesn't give you (must implement):
- --json output: add @click.option('--json', 'output_json', is_flag=True) and branch on it in the handler
- TTY detection: sys.stdout.isatty() or click.get_text_stream('stdout').isatty(); can also drive smart output defaults (JSON when not a TTY, tables when interactive)
- --no-input: Click prompts for missing values when prompt=True is set on an option; make sure required inputs are options with required=True (errors on missing), not prompt=True (blocks agents)
- Stdin input: click.get_text_stream('stdin') or type=click.File('-')
- Exit codes: sys.exit(1) on errors by default but doesn't differentiate error types; use ctx.exit(code) for distinct codes

Anti-patterns to flag:
- prompt=True on options without a --no-input guard
- click.confirm() without checking --yes/--force first
- click.echo() for both data and messages (no stdout/stderr separation); use click.echo(..., err=True) for messages

Gives you for free:
Doesn't give you (must implement):
- Help examples: epilog with RawDescriptionHelpFormatter
- --json output: entirely manual
- Stdin input: type=argparse.FileType('r') with default='-' or nargs='?'

Anti-patterns to flag:
- input() for missing values instead of making arguments required
- Default HelpFormatter re-wrapping epilog examples and collapsing their line breaks; use RawDescriptionHelpFormatter

Gives you for free:
- Help examples, when the Example: field is populated
- Subcommand registration via AddCommand
- --help on every command

Doesn't give you (must implement):
- --json/--output: common pattern is a persistent --output flag on root with json/table/yaml values; can support --output=auto that selects based on TTY detection
- --dry-run: entirely manual
- Stdin input: os.Stdin or cobra.ExactArgs for validation, cmd.InOrStdin() for reading
- TTY detection: golang.org/x/term or mattn/go-isatty; can drive output format defaults

Anti-patterns to flag:
- Empty Example: fields on commands
- fmt.Println for both data and errors; use cmd.OutOrStdout() and cmd.ErrOrStderr()
- RunE functions that return nil on failure instead of an error

Gives you for free:
Doesn't give you (must implement):
- --json output: use serde_json::to_string_pretty with a --format flag
- --dry-run: manual flag and logic
- Stdin input: std::io::stdin() with is_terminal::IsTerminal to detect piped input
- TTY detection: is-terminal crate (is_terminal::IsTerminal trait); can drive output format defaults
- Exit codes: std::process::exit() with distinct codes or ExitCode

Anti-patterns to flag:
- println! for both data and diagnostics; use eprintln! for messages
- Help without examples; add #[command(after_help = "Examples:\n mycli deploy --env staging")]

Gives you for free:
- Commander: --help on all commands
- yargs: .demandOption() for required flags, .example() for help examples, .fail() for custom errors
- oclif: --json available but requires per-command opt-in via static enableJsonFlag = true

Doesn't give you (must implement):
- Commander: --json; stdin reading; TTY detection (process.stdout.isTTY) for output format defaults
- yargs: --json is manual; stdin via process.stdin; process.stdout.isTTY for smart defaults
- oclif: --json requires per-command opt-in via static enableJsonFlag = true; can combine with TTY detection to default to JSON when piped

Anti-patterns to flag:
- inquirer or prompts without checking process.stdin.isTTY first
- console.log for both data and messages; use process.stdout.write and process.stderr.write
- .action() that calls process.exit(0) on errors

Gives you for free:
- method_option for named flags

Doesn't give you (must implement):
- --json output: manual
- Stdin input: $stdin.read or ARGF
- TTY detection: $stdout.tty?; can drive output format defaults
- Exit codes: exit 1 or abort

Anti-patterns to flag:
- ask() or yes?() without a --yes flag bypass
- say for both data and messages; use $stderr.puts for messages

If the framework isn't above, apply the same pattern: identify what the framework gives for free by reading its documentation or source, what must be implemented manually, and what idiomatic patterns exist for each principle. Note your findings in the report so the user understands the basis for your recommendations.