skills/run-tests-kit/SKILL.md
Run a batch test suite via the Copilot Studio Kit (Dataverse API). Uses the Power CAT Copilot Studio Kit to execute test cases against a published agent and produces pass/fail results with latencies. Requires the Kit installed in the environment, an App Registration with Dataverse permissions, and a published agent.
Run a batch test suite against a published Copilot Studio agent using the Power CAT Copilot Studio Kit.
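The user must have:

- The Copilot Studio Kit installed in the environment
- An App Registration with Dataverse permissions
- A published agent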
Phase 1: Configure settings

Read tests/settings.json (relative to the user's project CWD) and check for missing or placeholder values (containing YOUR_).
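Below is a minimal Node.js sketch of this check. It uses Node because the skill's runner, run-tests.js, is a Node script. It assumes the five keys shown in the settings template. A value counts as missing when it is absent, empty, or contains YOUR_. `findMissingSettings` and `checkSettingsFile` are hypothetical helpers for illustration and are not part of run-tests.js.

```javascript
const fs = require("fs");

// Dotted paths of the five values that tests/settings.json must provide.
const REQUIRED_KEYS = [
  "dataverse.environmentUrl",
  "dataverse.tenantId",
  "dataverse.clientId",
  "testRun.agentConfigurationId",
  "testRun.agentTestSetId",
];

// Return the keys whose value is absent, empty, or still a YOUR_ placeholder.
function findMissingSettings(settings) {
  return REQUIRED_KEYS.filter((key) => {
    const value = key
      .split(".")
      .reduce((obj, part) => (obj == null ? undefined : obj[part]), settings);
    return typeof value !== "string" || value.trim() === "" || value.includes("YOUR_");
  });
}

// Check a settings file on disk; a missing file means every key is missing.
function checkSettingsFile(file = "tests/settings.json") {
  if (!fs.existsSync(file)) {
    return { exists: false, missing: [...REQUIRED_KEYS] };
  }
  const settings = JSON.parse(fs.readFileSync(file, "utf8"));
  return { exists: true, missing: findMissingSettings(settings) };
}
```

For example, `checkSettingsFile().missing` lists exactly the keys to ask the user about, so they can all be requested in one message.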
If the file doesn't exist, create it from the template:
```bash
mkdir -p tests
cp "${CLAUDE_SKILL_DIR}/../../tests/settings-example.json" ./tests/settings.json
```
If values are missing, ask the user for each missing value. Explain where to find each one:
- `dataverse.environmentUrl`: "What is your Dataverse environment URL? Find it in Power Platform admin center or Copilot Studio > Settings > Session Details. It looks like https://orgXXXXXX.crm.dynamics.com"
- `dataverse.tenantId`: "What is your Azure tenant ID? Find it in Azure Portal > Microsoft Entra ID > Overview. It's a GUID like c87f36f7-fc65-453c-9019-0d724f21bc42"
- `dataverse.clientId`: "What is your App Registration client ID? Find it in Azure Portal > App Registrations > your app > Application (client) ID. It's a GUID."
- `testRun.agentConfigurationId`: "What is your agent configuration ID? In Copilot Studio, go to your agent > Tests tab. The ID is a GUID found in the URL or test configuration."
- `testRun.agentTestSetId`: "What is your test set ID? In Copilot Studio, go to your agent > Tests tab > select your test set. The ID is a GUID found in the URL."

Ask for ALL missing values at once (don't ask one at a time).
Write tests/settings.json with the collected values:
```json
{
  "dataverse": {
    "environmentUrl": "<value>",
    "tenantId": "<value>",
    "clientId": "<value>"
  },
  "testRun": {
    "agentConfigurationId": "<value>",
    "agentTestSetId": "<value>"
  }
}
```
If all values are already configured and valid, proceed to Phase 2.
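The skill does not define "valid" beyond the absence of placeholders. A rough sanity check can catch pasted typos before a long test run starts. It assumes that every ID is a standard GUID and that the environment URL is an absolute https:// URL. `validateSettings` is a hypothetical helper. It deliberately does not check the host name, because Dataverse domains differ between clouds.

```javascript
// Canonical 8-4-4-4-12 hexadecimal GUID.
const GUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Return a list of problems with a fully populated settings object (empty list = looks valid).
function validateSettings(settings) {
  const problems = [];
  const { dataverse = {}, testRun = {} } = settings;

  // The environment URL must parse as an absolute https:// URL.
  let url = null;
  try {
    url = new URL(dataverse.environmentUrl);
  } catch {
    url = null;
  }
  if (!url || url.protocol !== "https:") {
    problems.push("dataverse.environmentUrl must be an absolute https:// URL");
  }

  // Every ID field is expected to be a GUID.
  const guidFields = {
    "dataverse.tenantId": dataverse.tenantId,
    "dataverse.clientId": dataverse.clientId,
    "testRun.agentConfigurationId": testRun.agentConfigurationId,
    "testRun.agentTestSetId": testRun.agentTestSetId,
  };
  for (const [name, value] of Object.entries(guidFields)) {
    if (!GUID_RE.test(String(value))) problems.push(`${name} must be a GUID`);
  }
  return problems;
}
```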
Phase 2: Run the tests

Ensure tests/package.json exists in the user's project. If not, copy it:
```bash
cp "${CLAUDE_SKILL_DIR}/../../tests/package.json" ./tests/package.json
```
Install dependencies if tests/node_modules/ doesn't exist:
```bash
npm install --prefix tests
```
Run the test script in the background with a 100-minute timeout (6000000ms):
```bash
node "${CLAUDE_SKILL_DIR}/../../tests/run-tests.js" --config-dir ./tests
```
Use run_in_background: true for this command. Save the returned task ID.
Wait 10 seconds, then check the background task output (non-blocking check).
Detect the authentication state from the output:
If the output contains "Using cached token": Authentication succeeded automatically. Tell the user: "Authentication successful (cached credentials). Tests are running; this may take several minutes..."
If the output contains "use a web browser to open the page": Extract the URL and device code from the message. Present this prominently to the user:
Authentication Required
Open your browser to: https://microsoft.com/devicelogin
Enter the code: XXXXXXXXX (extract the actual code from the output)
After signing in, the tests will continue automatically.
If the output contains an error: Report the error to the user and stop.
If the output is empty or incomplete: Wait another 10 seconds and check again (retry up to 3 times).
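The branching above can be sketched as a small Node.js classifier. It makes two assumptions:

- The device-code prompt uses the standard Microsoft identity wording ("To sign in, use a web browser to open the page <url> and enter the code <code> to authenticate.").
- Any word starting with "error" signals a failure, which is a heuristic.

`classifyAuthOutput` and `nextStep` are hypothetical names, not part of run-tests.js.

```javascript
// Classify the background task's output into the auth states handled above.
// Assumption: the device-code prompt uses the standard Microsoft identity wording,
// e.g. "To sign in, use a web browser to open the page <url> and enter the code <code> to authenticate."
function classifyAuthOutput(output) {
  if (output.includes("Using cached token")) {
    return { state: "cached" };
  }
  if (output.includes("use a web browser to open the page")) {
    const match = output.match(/open the page (\S+) and enter the code (\S+)/);
    return {
      state: "device-code",
      url: match ? match[1] : null,
      code: match ? match[2] : null,
    };
  }
  // Heuristic: any word starting with "error" (Error:, ERROR, errors) counts as a failure.
  if (/\berror/i.test(output)) {
    return { state: "error" };
  }
  return { state: "pending" }; // empty or incomplete output
}

// Decide the next action after the Nth check (1-based); pending output is
// re-checked up to `maxRetries` more times, matching "retry up to 3 times".
function nextStep(result, attempt, maxRetries = 3) {
  if (result.state !== "pending") return result.state;
  return attempt <= maxRetries ? "wait-and-recheck" : "give-up";
}
```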
Wait for the background task to complete (blocking). The script polls every 20 seconds until all tests finish and downloads results as a CSV.
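The script's polling loop is internal to run-tests.js, so what follows only illustrates the pattern under stated assumptions:

- It polls a hypothetical `getRunStatus` function, which is any async function that returns the run's numeric Run Status.
- Polls run every 20 seconds.
- Status 3 (Complete) counts as success and 6 (Error) as failure, per the Run Status values listed in this skill.
- It gives up after the same 100-minute budget used for the background task.

```javascript
// Run Status codes per this skill's option set values: 3 = Complete, 6 = Error.
const RUN_COMPLETE = 3;
const RUN_ERROR = 6;

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll getRunStatus (hypothetical) every intervalMs until the run completes,
// ends in Error, or timeoutMs elapses. Other statuses keep the loop going.
async function pollUntilComplete(
  getRunStatus,
  { intervalMs = 20000, timeoutMs = 6000000, sleep = delay } = {}
) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await getRunStatus();
    if (status === RUN_COMPLETE) return status;
    if (status === RUN_ERROR) throw new Error("Test run ended with status Error");
    if (Date.now() >= deadline) throw new Error(`Test run not complete after ${timeoutMs} ms`);
    await sleep(intervalMs);
  }
}
```

Injecting `sleep` keeps the loop testable without real 20-second waits.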
Read the final output to get the success rate and CSV filename.
Proceed to Phase 3.
Phase 3: Analyze results

Get the results: glob `tests/test-results-*.csv` and read the most recent CSV file (newest by modification time).
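Outside the agent's Glob tool, the same "newest by modification time" choice can be sketched in Node.js. This assumes the results files sit directly in the tests directory and match the test-results-*.csv pattern. `newestResultsCsv` is a hypothetical helper.

```javascript
const fs = require("fs");
const path = require("path");

// Return the newest test-results-*.csv in `dir` by modification time, or null if none.
function newestResultsCsv(dir = "tests") {
  const candidates = fs
    .readdirSync(dir)
    .filter((name) => /^test-results-.*\.csv$/.test(name))
    .map((name) => {
      const full = path.join(dir, name);
      return { full, mtimeMs: fs.statSync(full).mtimeMs };
    });
  if (candidates.length === 0) return null;
  candidates.sort((a, b) => b.mtimeMs - a.mtimeMs); // newest first
  return candidates[0].full;
}
```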
Parse the CSV columns:
| Column | Meaning |
|--------|---------|
| Test Utterance | The user message that was tested |
| Expected Response | What the test expected |
| Response | What the agent actually responded |
| Latency (ms) | Response time |
| Result | Success, Failed, Unknown, Error, or Pending |
| Test Type | Response Match, Topic Match, Generative Answers, Multi-turn, Plan Validation, or Attachments |
| Result Reason | Why the test passed or failed |
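Agent responses can contain commas, quotes, and line breaks, so splitting lines on commas is not safe. The sketch below assumes the export is standard RFC 4180 CSV, possibly with a UTF-8 byte order mark, using the header names above. `parseCsv`, `toRecords`, and `summarize` are hypothetical helpers. The pass percentage here is simply Success rows divided by all rows, which may not match the success rate that run-tests.js prints.

```javascript
// Minimal RFC 4180-style parser: quoted fields, doubled quotes (""), and
// commas or line breaks inside quotes. Returns an array of rows of strings.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // escaped quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Turn the CSV into objects keyed by header name (e.g. "Result", "Result Reason").
function toRecords(text) {
  const [header, ...body] = parseCsv(text.replace(/^\uFEFF/, "")); // drop a UTF-8 BOM if present
  if (!header) return [];
  return body
    .filter((cells) => cells.some((cell) => cell !== "")) // skip blank lines
    .map((cells) => Object.fromEntries(header.map((name, i) => [name, cells[i] ?? ""])));
}

// Collect the rows to analyze (Failed or Error) and a simple pass percentage.
function summarize(records) {
  const failures = records.filter((r) => r.Result === "Failed" || r.Result === "Error");
  const passed = records.filter((r) => r.Result === "Success").length;
  const successRate = records.length === 0 ? 0 : (100 * passed) / records.length;
  return { failures, successRate };
}
```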
Focus on failed tests (Result = Failed or Error). For each failure, analyze:

- … `SendActivity` messages, instructions, or generative answer config.
- … `SearchAndSummarizeContent`, and agent instructions.

Proceed to Phase 4 (Propose Fixes).
Phase 4: Propose Fixes

For each failure, identify the relevant YAML file(s):

- Glob: `**/agent.mcs.yml`

Propose specific YAML changes to fix each failure. Present them to the user as a summary:
Wait for user decision. The user can:
Apply accepted changes using the Edit tool. After applying, remind the user to push and publish again before re-running tests.
Numeric option set values:

Result: 1=Success, 2=Failed, 3=Unknown, 4=Error, 5=Pending
Test Type: 1=Response Match, 2=Topic Match, 3=Attachments, 4=Generative Answers, 5=Multi-turn, 6=Plan Validation
Run Status: 1=Not Run, 2=Running, 3=Complete, 4=Not Available, 5=Pending, 6=Error
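If you read these fields directly from Dataverse (for example through the Web API) instead of from the CSV, choice columns come back as the numeric codes above. Below is a small lookup that assumes exactly the values listed here. `decode` is a hypothetical helper that reports unknown codes instead of guessing.

```javascript
// Option set values exactly as listed above, keyed by numeric code.
const RESULT = { 1: "Success", 2: "Failed", 3: "Unknown", 4: "Error", 5: "Pending" };
const TEST_TYPE = {
  1: "Response Match",
  2: "Topic Match",
  3: "Attachments",
  4: "Generative Answers",
  5: "Multi-turn",
  6: "Plan Validation",
};
const RUN_STATUS = { 1: "Not Run", 2: "Running", 3: "Complete", 4: "Not Available", 5: "Pending", 6: "Error" };

// Translate a raw code into its label; unknown codes are reported, not guessed.
function decode(map, code) {
  return map[code] ?? `Unknown code ${code}`;
}
```

Note that Test Type codes do not follow the column order of the CSV table (Attachments is 3 here), so a lookup table is safer than indexing by position.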