skills/run-eval/SKILL.md
Run evaluations against a Copilot Studio agent via the Power Platform Evaluation API. Works on DRAFT agents — no publish step required. Lists test sets, starts a run, polls until complete, fetches results, and proposes YAML fixes for failures. Use when the user wants to test agent changes without publishing.
Run evaluations against a Copilot Studio agent's draft — no publish needed.
The caller (test agent) must provide --client-id and --workspace. If you don't have the client ID, return immediately and tell the caller to run test-auth first.
All eval-api commands run in the foreground. NEVER use run_in_background.
node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js list-testsets --workspace <path> --client-id <id>
You MUST ask this question and wait for the user's answer before starting the run.
Ask the user:
Does your agent use authenticated knowledge sources or connector actions (tools) that require user identity? If so, you'll need to provide a connection ID — without it, the eval runs anonymously and tools and knowledge sources will not be used.
How to obtain the connection ID:
- Go to https://make.powerautomate.com
- Open Connections from the side menu
- Select the relevant Microsoft Copilot Studio connection
- Copy the connection ID from the URL (the GUID segment after /connections/)

If your agent doesn't use authenticated knowledge or tools, you can skip this.
Do not proceed to Step 3 until the user responds.
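If the user pastes the whole connection URL rather than the bare ID, the "GUID segment after /connections/" can be pulled out mechanically. The following is a minimal sketch, assuming the ID is a standard hyphenated GUID. `extract_connection_id` is a hypothetical helper, not part of eval-api, and the URL in the usage line is made up for illustration rather than a confirmed Power Automate URL shape.

```shell
# Hypothetical helper: print the first GUID-shaped segment that appears
# after /connections/ in a URL. Prints nothing if no such segment exists.
extract_connection_id() {
  printf '%s\n' "$1" \
    | sed -n 's#.*/connections/##p' \
    | grep -oE '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}' \
    | head -n 1
}

# Usage (illustrative URL only):
# extract_connection_id 'https://make.powerautomate.com/environments/<env>/connections/<connector>/<guid>/details'
```

Stripping everything up to /connections/ first keeps a GUID earlier in the URL, such as an environment ID, from being mistaken for the connection ID.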
node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js start-run --workspace <path> --client-id <id> --testset-id <id> --run-name "Draft eval <date>"
Add --connection-id <id> if the user provided a connection ID in Step 2.
Add --published only if the user explicitly asked for published-bot testing.
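The two optional flags above can be made explicit with a small dry-run helper that assembles the start-run argument list and prints it, one argument per line, before anything is sent. This is only a sketch: `start_run_args` is a hypothetical function, and the date format used in the run name is an assumption. The flags themselves are the ones listed in this document.

```shell
# Hypothetical dry-run helper: prints the start-run arguments, one per line.
# usage: start_run_args <workspace> <client-id> <testset-id> [connection-id] [yes|no]
start_run_args() {
  workspace=$1 client_id=$2 testset_id=$3
  connection_id=${4:-}   # empty means the run is anonymous
  published=${5:-no}     # "yes" only if the user explicitly asked for it

  # Required arguments; the date format in the run name is an assumption.
  set -- start-run --workspace "$workspace" --client-id "$client_id" \
    --testset-id "$testset_id" --run-name "Draft eval $(date +%Y-%m-%d)"

  # Optional flags are appended only when they apply.
  if [ -n "$connection_id" ]; then set -- "$@" --connection-id "$connection_id"; fi
  if [ "$published" = "yes" ]; then set -- "$@" --published; fi

  printf '%s\n' "$@"
}
```

Defaulting to a draft, anonymous run mirrors the rules above: each flag appears only after the user has supplied a connection ID or explicitly asked for published-bot testing.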
node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js get-run --workspace <path> --client-id <id> --run-id <runId>
Poll every 15-30 seconds. Report progress: "Processing: 3/10 test cases..."
Stop when state is Completed, Failed, Abandoned, or Cancelled.
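The stop condition can be sketched as a foreground loop. This document does not specify how the state is read out of the get-run output, so the sketch takes the state-fetching command as an argument. That command, and the helper names `is_terminal_state` and `poll_until_done`, are assumptions for illustration. Only the four terminal states come from this document.

```shell
# Succeeds when the state is one of the four terminal states listed above.
is_terminal_state() {
  case "$1" in
    Completed|Failed|Abandoned|Cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical polling loop, run in the foreground.
# $1: seconds between polls; remaining args: a command that prints the current state.
poll_until_done() {
  interval=$1
  shift
  while :; do
    state=$("$@") || return 1      # give up if the state cannot be fetched
    printf 'state: %s\n' "$state"  # progress line for the user
    if is_terminal_state "$state"; then return 0; fi
    sleep "$interval"
  done
}

# Usage (fetch_state is a placeholder for whatever extracts the state from get-run):
# poll_until_done 20 fetch_state
```

Any state outside the four listed values is treated as still in progress, so an unexpected intermediate state keeps the loop polling rather than ending it early.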
node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js get-results --workspace <path> --client-id <id> --run-id <runId>
Present a summary table (total, passed, failed, errors). For failures:
| Metric | What to check |
|--------|---------------|
| GeneralQuality Fail | Which of relevance/completeness/groundedness/abstention failed |
| ExactMatch Fail | The score, a value from 0.0 to 1.0 |
| CapabilityUse Fail | missingInvocationSteps |
| Error status | errorReason — often a test set config issue, not a YAML issue |
For YAML authoring failures: find the relevant topic, read it, propose specific edits. Wait for user approval before applying.
After applying: offer to push and re-run (go back to Step 3).
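The summary counts (total, passed, failed, errors) can be tallied with a short filter once the per-test-case statuses have been extracted. This is a minimal sketch under stated assumptions: the input is one status per line, and the labels `Pass`, `Fail`, and `Error` are hypothetical stand-ins, since the exact shape of the get-results output is not described here. `summarize_statuses` is likewise a hypothetical helper.

```shell
# Hypothetical helper: reads one status per line on stdin and prints a
# markdown summary table. The Pass/Fail/Error labels are assumed.
summarize_statuses() {
  awk '
    { total++ }
    $0 == "Pass"  { passed++ }
    $0 == "Fail"  { failed++ }
    $0 == "Error" { errors++ }
    END {
      print "| Total | Passed | Failed | Errors |"
      print "|-------|--------|--------|--------|"
      printf "| %d | %d | %d | %d |\n", total, passed, failed, errors
    }
  '
}

# Usage:
# printf 'Pass\nFail\nPass\nError\n' | summarize_statuses
```

Keeping errors in their own column matters because, as noted in the table above, an Error status often points at a test set configuration issue, not a YAML issue.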