skills/recipes/test-cli-usability/SKILL.md
Write scenario tests that verify your CLI tool is usable by AI agents. Ensures commands work non-interactively, provide clear output, and don't hang on prompts. Use when you want to prove your CLI is agent-friendly.
Install this skill globally with one command: `npx skillsauth add langwatch/langwatch test-cli-usability`. Works with Claude Code, Cursor, and Windsurf.
This recipe helps you write scenario tests that verify your CLI tool works well when operated by AI agents (Claude Code, Cursor, Codex, etc.). An agent-friendly CLI:

- runs every command non-interactively, without hanging on prompts
- produces clear output
- has `--help` that works on every subcommand

Install the Scenario SDK:
```bash
npm install @langwatch/scenario vitest @ai-sdk/openai
# or: pip install langwatch-scenario pytest
```
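List every command your CLI supports. For each one, note whether it can stop and wait for interactive input, and whether a flag such as `--yes`, `--force`, or `--non-interactive` lets it run unattended.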
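You can also probe each command programmatically as a starting point for this inventory. The sketch below is hypothetical: `probeHelp` and its result shape are illustrative, not part of the Scenario SDK, and `mycli` stands in for your executable.

The sketch makes these choices:

- It runs `<cli> <subcommand> --help` with an empty stdin and a timeout.
- A command that tries to read input sees end-of-file immediately.
- Anything that still blocks is reported as timed out instead of hanging the inventory.

```typescript
import { spawnSync } from "node:child_process";

// Result of probing one subcommand's --help (hypothetical shape).
interface HelpProbe {
  command: string;          // the full command line that was run
  exitCode: number | null;  // null if the process was killed or failed to start
  timedOut: boolean;        // true if it was still running when the timeout hit
  hasOutput: boolean;       // true if it printed anything to stdout or stderr
}

// Run `<cli> ...subcommand --help` for each subcommand. Passing an empty
// string as stdin means a command that tries to read input gets EOF at once
// instead of waiting; the timeout catches anything else that blocks.
function probeHelp(cli: string, subcommands: string[][], timeoutMs = 5000): HelpProbe[] {
  return subcommands.map((sub) => {
    const args = [...sub, "--help"];
    const res = spawnSync(cli, args, { input: "", timeout: timeoutMs, encoding: "utf8" });
    // spawnSync reports a timeout as an error whose code is "ETIMEDOUT".
    const errorCode = (res.error as { code?: string } | undefined)?.code;
    return {
      command: [cli, ...args].join(" "),
      exitCode: res.status,
      timedOut: errorCode === "ETIMEDOUT",
      hasOutput: `${res.stdout ?? ""}${res.stderr ?? ""}`.trim().length > 0,
    };
  });
}

// Example (hypothetical CLI): probeHelp("mycli", [[], ["deploy"], ["config", "set"]])
```

Any entry with `timedOut: true`, a non-zero `exitCode`, or `hasOutput: false` is a command to look at first. You still supply the list of subcommands yourself; the probe only checks how each one behaves.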
For each command, write a scenario test where an AI agent discovers and uses it:
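```typescript
import scenario, { type AgentAdapter, AgentRole } from "@langwatch/scenario";
import { openai } from "@ai-sdk/openai";
import { describe, it, expect } from "vitest";

// Adapter for the agent under test, which drives your CLI.
const myAgent: AgentAdapter = {
  role: AgentRole.AGENT,
  call: async (input) => {
    // Your Claude Code adapter here: run the agent with shell access to
    // your CLI on the conversation so far and return its response.
  },
};

describe("CLI usability", () => {
  it("agent discovers and uses the CLI", async () => {
    const result = await scenario.run({
      name: "CLI command discovery",
      description: "Agent discovers and uses the CLI to accomplish a task",
      agents: [
        myAgent,
        scenario.userSimulatorAgent({ model: openai("gpt-5-mini") }),
        scenario.judgeAgent({
          model: openai("gpt-5-mini"),
          criteria: [
            "Agent used the CLI command correctly",
            "Agent did not get stuck on interactive prompts",
            "Agent did not need to pipe 'yes' or use 'expect' scripting",
          ],
        }),
      ],
    });

    // The judge's verdict decides whether the test passes.
    expect(result.success).toBe(true);
  });
});
```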
Add this assertion to every test:
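```typescript
import { expect } from "vitest";

// Gather every string in a message without JSON escaping, so `echo "yes" | mycli` still matches.
function collectStrings(value: unknown): string[] {
  if (typeof value === "string") return [value];
  if (Array.isArray(value)) return value.flatMap(collectStrings);
  if (value !== null && typeof value === "object") return Object.values(value).flatMap(collectStrings);
  return [];
}

// Fail the test if the agent had to work around an interactive prompt.
function assertNoInteractiveWorkarounds(state: { messages: unknown[] }) {
  const output = state.messages.flatMap(collectStrings).join("\n");
  expect(output).not.toMatch(/echo\s+["']?[yY](?:es)?["']?\s*\|/); // echo y | cmd, echo "yes" | cmd
  expect(output).not.toMatch(/\byes\s*\|/);                        // yes | cmd
  expect(output).not.toMatch(/expect\s+-c/);                       // expect -c '...' scripting
  expect(output).not.toMatch(/printf\s+["']\\n["']\s*\|/);         // printf '\n' | cmd
}
```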
If this assertion fails, your CLI has an interactivity bug. Add `--yes`, `--force`, or `--non-interactive` flags to the offending commands so they can run without prompting.
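On the CLI side, one way to make such flags reliable is to route every confirmation through a single gate that never blocks when no human is present. This is a hypothetical sketch, not a prescribed implementation. `resolveConfirmation` and its option names are illustrative. It assumes your argument parser has already read `--yes`/`--force` and `--non-interactive`, and that you know whether stdin is a TTY.

```typescript
// Hypothetical options a CLI might parse before running a destructive action.
interface ConfirmOptions {
  yes: boolean;            // --yes or --force was passed
  nonInteractive: boolean; // --non-interactive was passed
  stdinIsTTY: boolean;     // e.g. Boolean(process.stdin.isTTY)
}

type Confirmation =
  | { action: "proceed" }                  // skip the prompt and continue
  | { action: "prompt" }                   // a human is present: ask as usual
  | { action: "abort"; message: string };  // fail fast instead of hanging

// Decide how to handle a confirmation step without ever blocking an agent.
function resolveConfirmation(command: string, opts: ConfirmOptions): Confirmation {
  if (opts.yes) return { action: "proceed" };
  if (opts.nonInteractive || !opts.stdinIsTTY) {
    // Nobody can answer a prompt, so exit with an actionable message
    // that names the flag the agent should add.
    return {
      action: "abort",
      message: `${command} needs confirmation. Re-run with --yes to proceed without a prompt.`,
    };
  }
  return { action: "prompt" };
}
```

Failing fast with a message that names the flag gives an agent something to act on. A silent prompt would instead leave the agent waiting, or push it toward `yes |` workarounds.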
Write scenarios where the agent makes a mistake and must recover:
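- The agent passes a wrong or missing argument, reads `--help`, and self-corrects.
- The agent hits a confirmation prompt and reruns the command with the `--yes` or `--force` flag.

Keep `--help` comprehensive on every subcommand.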
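Self-correction through `--help` depends on the CLI's error output telling the agent what went wrong and where to look next. The sketch below shows one way to produce such an error. `checkFlags` is a hypothetical helper, and `mycli deploy` is a placeholder command.

The helper does three things:

- It rejects unknown long options.
- It lists the valid ones.
- It points at the subcommand's `--help`.

```typescript
// Hypothetical: validate long options for one subcommand and build an
// error message an agent can recover from (what was wrong + where to look).
function checkFlags(
  command: string,      // e.g. "mycli deploy"
  knownFlags: string[], // e.g. ["--env", "--yes", "--help"]
  argv: string[],       // raw arguments after the subcommand
): string | null {
  // Arguments after a bare "--" are positional, so stop checking there.
  const end = argv.indexOf("--");
  const options = end === -1 ? argv : argv.slice(0, end);
  const unknown = options.filter(
    (arg) => arg.startsWith("--") && !knownFlags.includes(arg.split("=")[0]),
  );
  if (unknown.length === 0) return null; // every option is recognized
  return [
    `Error: unknown option(s) for '${command}': ${unknown.join(", ")}`,
    `Valid options: ${knownFlags.join(", ")}`,
    `Run '${command} --help' for usage.`,
  ].join("\n");
}
```

Each line of the message maps to a step the agent can take: drop the bad flag, pick a valid one, or read the help text.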