ai/skills/aidd-riteway-ai/SKILL.md
Teaches agents how to write correct riteway ai prompt evals (.sudo files) for multi-step flows that involve tool calls. Use when writing prompt evals, creating .sudo test files, or testing agent skills that use tools such as gh, GraphQL, or external APIs.
npx skillsauth add paralleldrive/aidd aidd-riteway-ai
Act as a top-tier AI test engineer to write correct riteway ai prompt evals
for multi-step agent skills that involve tool calls.
Refer to /aidd-tdd for assertion style (given/should/actual/expected) and
test isolation principles.
Refer to /aidd-requirements for the "Given X, should Y" format
when writing assertions inside .sudo eval files.
- One .sudo eval file per step (Rule 1), placed in ai-evals/<skill-name>/
- userPrompt: include mock tool preambles for unit evals (Rule 2), assert tool calls for step 1 (Rule 3), and supply previous step output for step N > 1 (Rule 4)
- Assertions in Given X, should Y format (Rule 7)
A .sudo eval file has three sections:
import 'ai/skills/<skill-name>/SKILL.md'
userPrompt = """
<prompt sent to the agent under test>
"""
- Given <condition>, should <observable behavior>
- Given <condition>, should <observable behavior>
Assertions are bullet points written after the userPrompt block.
Each assertion tests one distinct observable behavior derived from the
functional requirements of the skill under test.
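To make the three-section layout concrete, here is a minimal JavaScript sketch that splits a .sudo eval's source into its import, userPrompt, and assertion sections. `parseSudoEval` is a hypothetical helper written for illustration, not part of riteway, and it assumes the exact layout shown above: a single-quoted import line, a triple-quoted userPrompt, and `- ` bullets after it.

```javascript
// Hypothetical helper (not part of riteway): split a .sudo eval's source
// into its three sections so each one can be inspected separately.
const parseSudoEval = (source) => {
  // Section 1: each `import '...'` line names a skill under test.
  const imports = [...source.matchAll(/^import '([^']+)'/gm)].map((m) => m[1]);

  // Section 2: the text between the userPrompt triple quotes.
  const promptMatch = source.match(/userPrompt = """\n([\s\S]*?)\n"""/);
  const userPrompt = promptMatch ? promptMatch[1] : null;

  // Section 3: bullet lines written after the userPrompt block.
  const afterPrompt = promptMatch
    ? source.slice(promptMatch.index + promptMatch[0].length)
    : "";
  const assertions = afterPrompt
    .split("\n")
    .filter((line) => line.startsWith("- "))
    .map((line) => line.slice(2).trim());

  return { imports, userPrompt, assertions };
};

const example = `import 'ai/skills/my-skill/SKILL.md'

userPrompt = """
Fetch the PR details.
"""

- Given mock gh tools, should call gh pr view
- Given mock gh tools, should call gh api to list review threads
`;

const parsed = parseSudoEval(example);
console.log(parsed.assertions.length); // 2
```

Keeping the sections separable is what lets each assertion map back to one functional requirement.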
Given a multi-step flow under test, write one .sudo eval file per step
rather than combining all steps into a single overloaded userPrompt.
Naming convention:
ai-evals/<skill-name>/step-1-<description>-test.sudo
ai-evals/<skill-name>/step-2-<description>-test.sudo
Do not collapse multiple steps into one file. Each file tests exactly one discrete agent action.
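The naming convention can be captured in a small function so every step's file lands in the same place. `evalPath` and its parameters are hypothetical; the unit and e2e suffixes are the ones this skill specifies.

```javascript
// Hypothetical helper: build the path for one step's eval file.
// Unit evals end in -test.sudo; e2e evals end in -e2e.test.sudo.
const evalPath = ({ skill, step, description, e2e = false }) => {
  // Steps are numbered from 1, matching step-1-..., step-2-...
  if (!Number.isInteger(step) || step < 1) {
    throw new RangeError(`step must be a positive integer, got ${step}`);
  }
  const suffix = e2e ? "-e2e.test.sudo" : "-test.sudo";
  return `ai-evals/${skill}/step-${step}-${description}${suffix}`;
};

// One file per step: each call names exactly one discrete agent action.
const paths = [
  evalPath({ skill: "my-skill", step: 1, description: "fetch-threads" }),
  evalPath({ skill: "my-skill", step: 2, description: "delegate-fixes" }),
];
console.log(paths);
```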
Given a unit eval for a step that involves tool calls (gh, GraphQL, REST API),
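include a preamble in the userPrompt that declares the mock tools, tells the agent to use them instead of real gh or GraphQL calls, and states what each mock returns.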
Example preamble:
You have the following mock tools available. Use them instead of real gh or GraphQL calls:
mock gh pr view => returns:
title: My PR
branch: feature/foo
base: main
mock gh api (list review threads) => returns:
[{ id: "T_01", resolved: false, body: "..." }]
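The preamble can also be generated from data so its wording stays consistent across evals. This sketch assumes mocks are described as a plain object mapping each invocation to its canned response text; `mockPreamble` is a hypothetical helper, and its header sentence is copied from the example preamble.

```javascript
// Hypothetical helper: render a mock-tool preamble for a unit eval's
// userPrompt from a map of tool invocation -> canned response text.
const mockPreamble = (mocks) => {
  const lines = [
    "You have the following mock tools available. Use them instead of real gh or GraphQL calls:",
  ];
  for (const [invocation, response] of Object.entries(mocks)) {
    lines.push(`mock ${invocation} => returns:`);
    // Indent each response line so it reads as belonging to the mock above.
    for (const responseLine of response.split("\n")) {
      lines.push(`  ${responseLine}`);
    }
  }
  return lines.join("\n");
};

const preamble = mockPreamble({
  "gh pr view": "title: My PR\nbranch: feature/foo\nbase: main",
  "gh api (list review threads)": '[{ id: "T_01", resolved: false, body: "..." }]',
});
console.log(preamble);
```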
Given a unit eval for step 1 of a tool-calling flow, assert that the agent makes the correct tool calls. Do not pre-supply the answers those calls would return — that defeats the purpose of the eval.
Correct pattern for step 1:
userPrompt = """
You have mock tools available. Use them instead of real API calls.
Run step 1 of your skill under test: fetch the PR details and review threads.
"""
- Given mock gh tools, should call gh pr view to retrieve the PR branch name
- Given mock gh tools, should call gh api to list the open review threads
- Given the review threads, should present them before taking any action
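Because step 1 exists to check tool calls, one cheap guard is to confirm that every tool the step should invoke has a matching "should call" assertion. The sketch assumes assertions are plain strings phrased like the ones above; `missingToolCallAssertions` is hypothetical and does only substring matching.

```javascript
// Hypothetical lint for step-1 evals: list each expected tool whose call is
// not asserted by some "should call <tool>" bullet.
const missingToolCallAssertions = (assertions, expectedTools) =>
  expectedTools.filter(
    (tool) => !assertions.some((assertion) => assertion.includes(`should call ${tool}`)),
  );

const step1Assertions = [
  "Given mock gh tools, should call gh pr view to retrieve the PR branch name",
  "Given mock gh tools, should call gh api to list the open review threads",
  "Given the review threads, should present them before taking any action",
];

// Every expected tool is covered, so nothing is reported.
console.log(missingToolCallAssertions(step1Assertions, ["gh pr view", "gh api"]));
```

An eval that pre-supplies the answers tends to fail this check, since it has no tool calls left to assert.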
Wrong pattern (pre-supplying answers in step 1):
# ❌ Do not do this — it removes the assertion value
userPrompt = """
The PR branch is feature/foo.
The review threads are: [...]
Now generate delegation prompts.
"""
Given a unit eval for step N > 1, include the output of the previous step
as context inside the userPrompt. This makes each eval independently
executable without running the prior steps live.
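A minimal sketch of composing such a prompt, assuming the previous step's output has been captured as text. `stepPrompt` and its options are hypothetical; the mock notice and the triage text mirror the step 2 example in this section.

```javascript
// Hypothetical helper: compose a step-N userPrompt that carries the previous
// step's output, so the eval runs without replaying earlier steps live.
const stepPrompt = ({ previousOutput, instruction, useMocks = true }) =>
  [
    // Keep the mock notice for unit evals; e2e evals use real tools instead.
    useMocks ? "You have mock tools available. Use them instead of real calls." : null,
    previousOutput.trim(),
    instruction.trim(),
  ]
    .filter(Boolean)
    .join("\n");

// Output recorded from the previous step, pasted in verbatim as context.
const previousStepOutput = `
Triage is complete. The following issues remain unresolved:
Issue 1 (thread ID: T_01):
  File: src/utils.js, line 5
  "add() subtracts instead of adding"
`;

const prompt = stepPrompt({
  previousOutput: previousStepOutput,
  instruction: "Generate delegation prompts for the remaining issues.",
});
console.log(prompt);
```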
Example for step 2:
userPrompt = """
You have mock tools available. Use them instead of real calls.
Triage is complete. The following issues remain unresolved:
Issue 1 (thread ID: T_01):
File: src/utils.js, line 5
"add() subtracts instead of adding"
Generate delegation prompts for the remaining issues.
"""
Given an e2e eval, use real tools (no mock preamble) and follow the
-e2e.test.sudo naming convention to mirror the project's existing unit/e2e
split:
ai-evals/<skill-name>/step-1-<description>-e2e.test.sudo
E2E evals run against live APIs. Only run them when the environment is configured with the necessary credentials.
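One way to honor this in a runner is to filter the eval list by suffix and include e2e files only when credentials exist. The sketch assumes credentials are signalled by a GH_TOKEN variable (one variable the gh CLI reads for authentication); `selectEvals` and the file names are hypothetical.

```javascript
// Hypothetical runner filter: always run unit evals, but include e2e evals
// only when live credentials are configured.
const isE2e = (file) => file.endsWith("-e2e.test.sudo");

// Assumption: a non-empty GH_TOKEN means live gh calls can authenticate.
const hasCredentials = (env) => Boolean(env.GH_TOKEN);

const selectEvals = (files, env) =>
  files.filter((file) => !isE2e(file) || hasCredentials(env));

const files = [
  "ai-evals/my-skill/step-1-fetch-threads-test.sudo",
  "ai-evals/my-skill/step-1-fetch-threads-e2e.test.sudo",
];

// A real runner would pass process.env; plain objects keep this deterministic.
console.log(selectEvals(files, {}));
console.log(selectEvals(files, { GH_TOKEN: "example-token" }));
```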
Given fixture files needed by an eval, keep them small (< 20 lines) with one clear bug or condition per file. Fixtures live in:
ai-evals/<skill-name>/fixtures/<filename>
Example fixture (add.js):
export const add = (a, b) => a - b; // bug: subtracts instead of adds
Do not combine multiple bugs in one fixture file. Each fixture must make the assertion conditions unambiguous.
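The fixture's single bug makes the failing condition easy to state. As a sketch, the snippet below inlines the add.js fixture and evaluates it in the given/should/actual/expected shape that /aidd-tdd uses; `check` is a hypothetical stand-in, not riteway's `assert`, so the example runs without dependencies.

```javascript
// Copy of the add.js fixture, inlined so this sketch runs as a plain script.
const add = (a, b) => a - b; // bug: subtracts instead of adds

// Hypothetical stand-in mirroring the given/should/actual/expected shape.
// It only reports whether actual matches expected (compared via Object.is).
const check = ({ given, should, actual, expected }) => ({
  label: `Given ${given}: should ${should}`,
  passed: Object.is(actual, expected),
  actual,
  expected,
});

const result = check({
  given: "two positive numbers",
  should: "return their sum",
  actual: add(2, 3),
  expected: 5,
});
console.log(result.passed, result.actual); // false -1
```

With one bug per file, a reviewing agent has exactly one defect to find, so an assertion such as "should identify that add() subtracts" has a single correct answer.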
Given assertions in a .sudo eval, derive them strictly from the functional
requirements of the skill under test using the /aidd-requirements
format:
- Given <condition>, should <observable behavior>
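A rough JavaScript check for this format rule: it flags bullets that do not match "Given <condition>, should <observable behavior>" as well as exact duplicates. `lintAssertions` is hypothetical, and exact-string comparison only approximates "distinct observable behaviors"; two differently worded assertions about the same behavior will slip through.

```javascript
// Hypothetical check: flag assertions that break the
// "Given <condition>, should <observable behavior>" format, plus exact duplicates.
const ASSERTION_FORMAT = /^Given .+, should .+$/;

const lintAssertions = (assertions) => {
  const problems = [];
  const seen = new Set();
  for (const assertion of assertions) {
    if (!ASSERTION_FORMAT.test(assertion)) {
      problems.push(`wrong format: ${assertion}`);
    }
    if (seen.has(assertion)) {
      problems.push(`duplicate: ${assertion}`);
    }
    seen.add(assertion);
  }
  return problems;
};

// The second bullet is missing its "Given" clause, so it is reported.
console.log(
  lintAssertions([
    "Given mock gh tools, should call gh pr view to retrieve the PR branch name",
    "should list review threads",
  ]),
);
```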
Include only assertions that test distinct observable behaviors. Do not:
Before saving a .sudo eval file, verify:
- E2E eval files use the -e2e.test.sudo suffix (Rule 5)
Commands { 🧪 /aidd-riteway-ai - write correct riteway ai prompt evals for multi-step tool-calling flows }