skills/scaffold-agent-tests/SKILL.md
Generate an LLM agent test suite (golden cases, mock-LLM unit tests, evaluator harness) from an agent implementation and its agent-test contract. Use when an agent has no tests, or a contract exists but the test code is missing.
Install with: npx skillsauth add nesnilnehc/ai-cortex scaffold-agent-tests (works with Claude Code, Cursor, and Windsurf).
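Reads an LLM agent's implementation and its test contract (an instance of specs/agent-test-modeling.md) and generates a test suite traceable to that contract: exact assertions for the deterministic parts, oracle tests for the non-deterministic parts, a golden dataset, and an evaluator harness.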
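Primary goal: produce agent test code that follows rules/standards-agent-testing.md, with every assertion traceable to a specific clause of the contract.
Success criteria (all must be met):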
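- agent-test-contract document
- Covers traceability anchors
- eval marker; the unit pipeline uses a mock LLM
- Covers points to a contract ID or an upstream AC

Acceptance test: can a developer, without reading the agent source, understand what each test guards using only the generated tests plus the contract?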
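The Covers criterion can be checked mechanically. The sketch below assumes a hypothetical anchor format, a `Covers:` line in each test's docstring listing full IDs separated by commas; `covers_of` and `untraceable_tests` are illustrative names, not part of the skill. It flags any test whose anchor is missing or cites an ID the contract does not define.

```python
import re

# Hypothetical anchor format: a "Covers:" line in the test docstring, listing
# comma-separated contract IDs or upstream ACs, each written out in full.
COVERS_RE = re.compile(r"^\s*Covers:\s*(.+)$", re.MULTILINE)


def covers_of(test_fn):
    """Return the IDs named in a test's Covers anchor (empty list if absent)."""
    match = COVERS_RE.search(test_fn.__doc__ or "")
    if not match:
        return []
    return [item.strip() for item in match.group(1).split(",") if item.strip()]


def untraceable_tests(tests, known_ids):
    """Names of tests whose anchor is missing or cites an ID not in the contract."""
    flagged = []
    for fn in tests:
        ids = covers_of(fn)
        if not ids or any(i not in known_ids for i in ids):
            flagged.append(fn.__name__)
    return flagged


# Illustrative tests: one anchored, one not.
def test_missing_acceptance_asks_user():
    """Missing acceptance criteria -> the agent asks a follow-up question.

    Covers: ACME-REQ-08#AC1
    """


def test_unanchored():
    """No Covers line, so the check should flag this test."""
```

Running `untraceable_tests` over the collected tests with the contract's ID set as `known_ids` gives a list that should be empty before the suite is handed off.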
This skill is responsible for:
- Covers traceability anchors

This skill is not responsible for:
Handoff: once the tests are generated, hand off to automate-tests to run them and to review-testing to review them.
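Starting from the agent-test-contract document (whose frontmatter agent_ref points to the implementation), map each contract section to test entries:

| Contract section | Test type | Oracle |
|---|---|---|
| Capability boundaries | Positive and negative behavior tests | Trajectory / contract |
| Input contract (missing-field detection) | Boundary and error-case tests | Trajectory (assert a follow-up question, no write-back) |
| Tool-call boundaries | Trajectory tests | Trajectory (no call from the forbidden set) |
| Write-back preconditions | Refusal tests when preconditions are unmet | Trajectory |
| Golden Cases | Regression test set | As specified in the contract's judgment-method column |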
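The trajectory oracles in this table can be sketched as plain functions over the list of tool calls the agent made. This is a minimal sketch under stated assumptions: `ToolCall`, `MockLLM`, and `run_agent` are hypothetical stand-ins (a real unit test would inject the mock into the agent under test rather than use this replay loop), and `confirm_requirement` is an invented name for a write-back precondition step. Only `ask_user` and `write_requirement` come from the example below.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


class MockLLM:
    """Deterministic stand-in for the model: replays a scripted list of tool calls."""

    def __init__(self, script):
        self._script = list(script)

    def next_action(self, context):
        # The context is ignored here; a richer mock could inspect it to pick a reply.
        return self._script.pop(0) if self._script else None


def run_agent(llm, request, max_steps=10):
    """Hypothetical agent loop that records every tool call as the trajectory."""
    trajectory = []
    for _ in range(max_steps):
        call = llm.next_action({"request": request, "history": trajectory})
        if call is None:
            break
        trajectory.append(call)
    return trajectory


def forbidden_calls(trajectory, forbidden):
    """Tool-call boundary oracle: forbidden tool names that appear in the trajectory."""
    return [c.name for c in trajectory if c.name in forbidden]


def writeback_violations(trajectory, write_tool, precondition_tool):
    """Write-back oracle: indices of write calls made before the precondition step."""
    seen_precondition = False
    violations = []
    for i, call in enumerate(trajectory):
        if call.name == precondition_tool:
            seen_precondition = True
        elif call.name == write_tool and not seen_precondition:
            violations.append(i)
    return violations


def test_missing_acceptance_asks_and_does_not_write():
    """Input contract: missing acceptance criteria -> follow-up question, no write-back.

    Covers: ACME-REQ-08#AC1
    """
    llm = MockLLM([ToolCall("ask_user", {"question": "What are the acceptance criteria?"})])
    trajectory = run_agent(llm, {"title": "Export report", "acceptance": None})
    assert [c.name for c in trajectory] == ["ask_user"]
    assert forbidden_calls(trajectory, {"write_requirement"}) == []
```

Because the mock is scripted, these tests are deterministic and belong in the unit pipeline. The AC mapping in the docstring is illustrative only.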
- eval marker
- Covers
- pass_threshold
- Covers; unit vs eval layering
- eval marker; unit tests use a mock LLM

User: "Generate tests for the clarification agent."
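Agent:
- Inputs: agent-test-clarification.md (contract) + src/agents/clarification.py
- Trajectory assertion: ask_user does not trigger write_requirement
- golden/clarification.jsonl (3 cases: missing acceptance criteria / complete information / empty input)
- pass_threshold: 0.9
- Covers: ACME-REQ-08#AC1/AC3
- Suggests running automate-tests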
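A golden dataset and evaluator harness for this example might look like the sketch below. The JSONL schema (`id`, `input`, `must_call`, `must_not_call`), the case inputs, and the helper names are all hypothetical. `stand_in_agent` is a deterministic placeholder; in the generated suite this harness would sit behind the eval marker and call the real agent and model.

```python
import json

# Hypothetical golden schema: one JSON object per line, judged by which tools
# the agent must and must not call. The three cases mirror the example above.
GOLDEN_JSONL = """\
{"id": "missing-acceptance", "input": {"title": "Export report", "acceptance": null}, "must_call": ["ask_user"], "must_not_call": ["write_requirement"]}
{"id": "complete", "input": {"title": "Export report", "acceptance": "CSV downloads"}, "must_call": ["write_requirement"], "must_not_call": ["ask_user"]}
{"id": "empty-input", "input": {}, "must_call": ["ask_user"], "must_not_call": ["write_requirement"]}
"""


def load_golden(text):
    """Parse JSONL text into a list of case dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def judge(case, tool_names):
    """Pass iff every must_call tool was called and no must_not_call tool was."""
    called = set(tool_names)
    return all(t in called for t in case["must_call"]) and not any(
        t in called for t in case["must_not_call"]
    )


def evaluate(cases, agent, pass_threshold=0.9):
    """Run each case through `agent` (input dict -> list of tool names called).

    Returns (pass_rate, passed_threshold, per-case results).
    """
    results = {case["id"]: judge(case, agent(case["input"])) for case in cases}
    pass_rate = sum(results.values()) / len(results)
    return pass_rate, pass_rate >= pass_threshold, results


def stand_in_agent(request):
    """Deterministic placeholder for the real agent: asks when acceptance is missing."""
    return ["ask_user"] if not request.get("acceptance") else ["write_requirement"]
```

With only three cases and pass_threshold 0.9, one failure gives a pass rate of 2/3 (about 0.67), which fails the gate. In practice, every case must pass until the dataset grows.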