microsoft/skills/run-eval

Name: skills/run-eval
Author: microsoft

skills/run-eval/SKILL.md

npx skillsauth add microsoft/skills-for-copilot-studio skills/run-eval

Run Evaluation (PPAPI)

Run evaluations against a Copilot Studio agent's draft — no publish needed.

The caller (test agent) must provide --client-id and --workspace. If you don't have the client ID, return immediately and tell the caller to run test-auth first.

All eval-api commands run in the foreground. NEVER use run_in_background.

Step 1: List test sets and let the user choose

node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js list-testsets --workspace <path> --client-id <id>

No test sets found: Tell the user to create one in Copilot Studio (Evaluate tab > New evaluation). Stop.
One test set: Tell the user which one you're using and proceed.
Multiple test sets: Show them all and ask the user to pick. Do not proceed until they answer.
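The three cases above can be sketched as a small decision function. This is a minimal illustration, not part of the skill: the output format of list-testsets is not documented here, so the test sets are assumed to be a list of dicts with hypothetical "id" and "name" keys, and the action names are invented for the example.

```python
from typing import Optional, Sequence, Tuple


def choose_test_set(test_sets: Sequence[dict]) -> Tuple[str, Optional[dict]]:
    """Map the number of available test sets to the Step 1 action.

    Returns a (hypothetical) action name and, when unambiguous, the test set.
    """
    if not test_sets:
        # Nothing to run: the user must create a test set in Copilot Studio.
        return ("stop", None)
    if len(test_sets) == 1:
        # Only one candidate: announce it and continue.
        return ("proceed", test_sets[0])
    # Several candidates: show them all and wait for the user's choice.
    return ("ask_user", None)
```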

Step 2: Ask about authenticated execution — MANDATORY, do not skip

You MUST ask this question and wait for the user's answer before starting the run.

Ask the user:

Does your agent use authenticated knowledge sources or connector actions (tools) that require user identity? If so, you'll need to provide a connection ID — without it, the eval runs anonymously and tools and knowledge sources will not be used.

How to obtain the connection ID:

1. Go to https://make.powerautomate.com
2. Open Connections from the side menu
3. Select the relevant Microsoft Copilot Studio connection
4. Copy the connection ID from the URL (the GUID segment after /connections/)
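If the user pastes the whole URL instead of the ID, the GUID can be pulled out programmatically. The sketch below assumes the ID is a dashed 8-4-4-4-12 hexadecimal GUID appearing as its own path segment somewhere after /connections/. The helper name and the example URL layout are hypothetical.

```python
import re
from typing import Optional

# Assumption: the connection ID is a standard dashed GUID.
GUID_PATTERN = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def connection_id_from_url(url: str) -> Optional[str]:
    """Return the first GUID-shaped path segment after /connections/, if any."""
    _, marker, tail = url.partition("/connections/")
    if not marker:
        return None  # The URL does not point at a connection page.
    for segment in re.split(r"[/?#]", tail):
        if GUID_PATTERN.fullmatch(segment):
            return segment
    return None
```

When this returns None, ask the user for the ID again rather than guessing.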

If your agent doesn't use authenticated knowledge or tools, you can skip this.

Do not proceed to Step 3 until the user responds.

Step 3: Start the run

node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js start-run --workspace <path> --client-id <id> --testset-id <id> --run-name "Draft eval <date>"

Add --connection-id <id> if the user provided a connection ID in Step 2.

Add --published only if the user explicitly asked for published-bot testing.
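The optional flags can be assembled as an argument list, as in the sketch below. The subcommand and flag names come from the command above. The function name, its parameters, and the ISO date used for the run name are assumptions made for illustration.

```python
import os
from datetime import date
from typing import List, Optional


def build_start_run_args(
    workspace: str,
    client_id: str,
    testset_id: str,
    connection_id: Optional[str] = None,
    published: bool = False,
    skill_dir: Optional[str] = None,
    today: Optional[date] = None,
) -> List[str]:
    """Build the start-run command line as a list suitable for subprocess.run."""
    skill_dir = skill_dir or os.environ.get("CLAUDE_SKILL_DIR", ".")
    script = os.path.join(skill_dir, "..", "..", "scripts", "eval-api.bundle.js")
    # Assumption: <date> in the run name is filled with an ISO date.
    run_date = (today or date.today()).isoformat()
    args = [
        "node", script, "start-run",
        "--workspace", workspace,
        "--client-id", client_id,
        "--testset-id", testset_id,
        "--run-name", f"Draft eval {run_date}",
    ]
    if connection_id:
        # Only when the user supplied a connection ID in Step 2.
        args += ["--connection-id", connection_id]
    if published:
        # Only when the user explicitly asked for published-bot testing.
        args.append("--published")
    return args
```

Passing a list instead of a shell string avoids quoting problems with the run name, which contains spaces.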

Step 4: Poll until complete

node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js get-run --workspace <path> --client-id <id> --run-id <runId>

Poll every 15-30 seconds. Report progress: "Processing: 3/10 test cases..."

Stop when state is Completed, Failed, Abandoned, or Cancelled.
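The polling rule can be sketched as a loop that stops on any of the four terminal states. The get-run output format is not documented here, so the sketch takes a callable that returns the parsed run as a dict. The "state" key follows the wording above, while "processed" and "total" are hypothetical field names.

```python
import time
from typing import Callable

TERMINAL_STATES = {"Completed", "Failed", "Abandoned", "Cancelled"}


def poll_run(
    get_run: Callable[[], dict],
    interval_seconds: float = 20.0,  # within the 15-30 second window
    sleep: Callable[[float], None] = time.sleep,
    report: Callable[[str], None] = print,
) -> dict:
    """Call get_run until the run reaches a terminal state, reporting progress."""
    while True:
        run = get_run()
        if run.get("state") in TERMINAL_STATES:
            return run
        # "processed" and "total" are assumed names for the progress counters.
        report(f"Processing: {run.get('processed', '?')}/{run.get('total', '?')} test cases...")
        sleep(interval_seconds)
```

The loop blocks the caller, which matches the requirement that eval-api commands run in the foreground.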

Step 5: Fetch and analyze results

node ${CLAUDE_SKILL_DIR}/../../scripts/eval-api.bundle.js get-results --workspace <path> --client-id <id> --run-id <runId>

Present a summary table (total, passed, failed, errors). For failures:

| Metric | What to check |
|--------|---------------|
| GeneralQuality Fail | Which of relevance/completeness/groundedness/abstention failed |
| ExactMatch Fail | Score 0.0–1.0 |
| CapabilityUse Fail | missingInvocationSteps |
| Error status | errorReason (often a test set config issue, not a YAML issue) |
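The summary counts can be tallied as in the sketch below. The get-results output format is not shown here, so each result is assumed to be a dict with a hypothetical "status" key whose values are "Pass", "Fail", or "Error". Adjust both to the real output.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize(results: Iterable[dict]) -> Dict[str, int]:
    """Count results by (assumed) status for the summary table."""
    counts = Counter(result.get("status") for result in results)
    return {
        "total": sum(counts.values()),
        "passed": counts["Pass"],
        "failed": counts["Fail"],
        "errors": counts["Error"],
    }
```

Results with an unrecognized status still count toward the total, so a mismatch between the total and the other three numbers signals that the assumed status values are wrong.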

Step 6: Propose fixes (if failures found)

For YAML authoring failures: find the relevant topic, read it, propose specific edits. Wait for user approval before applying.

After applying: offer to push and re-run (go back to Step 3).

Run evaluations against a Copilot Studio agent via the Power Platform Evaluation API. Works on DRAFT agents — no publish step required. Lists test sets, starts a run, polls until complete, fetches results, and proposes YAML fixes for failures. Use when the user wants to test agent changes without publishing.

192 stars

development

Updated Apr 30, 2026

npx skillsauth add microsoft/skills-for-copilot-studio skills/run-eval

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Clean: Trivy (Container and dependency vulnerability scanner), 95%
Clean: Semgrep (Static code analysis for vulnerabilities), 95%
Clean: mcp-scan (Snyk) (Model Context Protocol security validation), 95%
Skipped: Snyk (dep) (Open source security scanning), 50%
Skipped: Socket.dev (Supply chain security analysis), 50%
Skipped: VirusTotal (Multi-engine malware detection), 50%
Skipped: CrowdStrike (Advanced threat intelligence), 50%
Skipped: OSV-Scanner (Open Source Vulnerability database check), 50%
Skipped: OWASP Dep-Check, 50%

Last scanned: Apr 30, 2026, 7:04 AM (13.7s, 1 file scanned)

SKILL.md

user-invocable: false
description: >
allowed-tools: Bash(node *eval-api.bundle.js *), Bash(node *manage-agent.bundle.js push *), Bash(node *manage-agent.bundle.js pull *), Read, Glob, Grep, Edit
context: fork
agent: copilot-studio-test

Related Skills

microsoft/int-patterns

development

Verified, Trusted, Community

Pattern library for Copilot Studio agent design. Contains proven and recommended implementation patterns with YAML examples. Used by the Advisor agent to suggest patterns and by the Author agent to reference YAML structure during implementation. USE FOR: design guidance, pattern suggestions, review against best practices, troubleshooting known pitfalls, YAML reference during implementation. DO NOT USE FOR: general YAML schema reference (use int-reference), topic creation mechanics (use new-topic).

378, SKILL.md, Updated Jul 26, 2026

microsoft/skills/lookup-schema

testing

Verified, Trusted, Community

Look up Copilot Studio YAML schema definitions. Use when the user asks about schema structure, element properties, or how to use a specific YAML kind.

378, SKILL.md, Updated Apr 30, 2026

microsoft/int-reference

data-ai

Verified, Trusted, Community

Reference tables for Copilot Studio YAML authoring: triggers, actions, variables, entities, Power Fx functions, templates. Preloaded by author and advisor agents.

378, SKILL.md, Updated Apr 30, 2026

microsoft/skills/add-knowledge

development

Verified, Trusted, Community

Add a knowledge source (public website or SharePoint) to a Copilot Studio agent. Use when the user asks to add a knowledge source, documentation URL, website, or SharePoint site for the agent to search.

378, SKILL.md, Updated Apr 30, 2026

Install manually

# Clone the repo
git clone https://github.com/microsoft/skills-for-copilot-studio.git

# Copy into Claude Code skills folder (global)
cp -r skills-for-copilot-studio/skills/run-eval ~/.claude/skills/

Repository

microsoft/skills-for-copilot-studio

192 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT