Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

microsoft/skills/create-eval-set

Name: skills/create-eval-set
Author: microsoft

skills/create-eval-set/SKILL.md

npx skillsauth add microsoft/skills-for-copilot-studio skills/create-eval-set

Create Evaluation Test Set

Create a test set CSV file that can be imported into Copilot Studio's Evaluate tab for in-product agent evaluation.

Phase 1: Understand the Agent

Read the agent's YAML files to understand what it does:

Glob: **/agent.mcs.yml — find the agent
Read agent.mcs.yml — get the agent's instructions, description, and capabilities
Read settings.mcs.yml — check orchestration mode (generative vs classic)
Glob: **/topics/*.mcs.yml — list all topics
Read key topics (especially non-system ones) — understand trigger phrases, conversation flows, expected behaviors
Check for knowledge sources, actions, and connected tools
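
The discovery steps above can be sketched in Python with `pathlib`. This is a minimal sketch, not the skill's own implementation: the function name `discover_agent_files` is hypothetical, and it assumes `settings.mcs.yml` sits in the same folder as `agent.mcs.yml`.

```python
from pathlib import Path


def discover_agent_files(root: str) -> dict:
    """Locate the agent definition, settings, and topic files under root."""
    base = Path(root)
    # "**/" makes the search recursive, mirroring the Glob patterns above.
    agents = sorted(base.glob("**/agent.mcs.yml"))
    if not agents:
        raise FileNotFoundError(f"no agent.mcs.yml found under {base}")
    agent_file = agents[0]
    # Assumption: settings.mcs.yml lives next to agent.mcs.yml.
    settings_file = agent_file.parent / "settings.mcs.yml"
    topics = sorted(base.glob("**/topics/*.mcs.yml"))
    return {
        "agent": agent_file,
        "settings": settings_file if settings_file.exists() else None,
        "topics": topics,
    }
```

Calling `discover_agent_files(".")` from the project root returns the paths to read next; parsing the YAML itself is left out because the schema is not described here.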

Phase 2: Design Test Cases

Create test cases that cover:

| Category | What to test | Example |
|----------|-------------|---------|
| Core functionality | Main topics and capabilities | Questions matching trigger phrases |
| Knowledge/generative | Knowledge source responses | Questions the agent should answer from its knowledge |
| System topics | Greeting, Escalation, Goodbye, Thank You, Fallback | "Hi", "I want to speak to a person", "Goodbye" |
| Edge cases | Out-of-scope, ambiguous, off-topic | "Tell me a joke", "Book a flight for me" |
| Boundary testing | Things the agent should NOT do | Actions beyond its capabilities |

Aim for 10–25 test cases with good coverage across categories.
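
One way to check a draft against the coverage goal above is a small tally per category. The sketch below is illustrative only: the short category keys and the `category` field on each case are assumptions made for this example and are not part of the CSV that Copilot Studio imports.

```python
from collections import Counter

# Hypothetical short keys for the five categories listed above.
CATEGORIES = ["core", "knowledge", "system", "edge", "boundary"]


def coverage_report(cases: list[dict]) -> dict:
    """Summarize how a draft test set covers the categories."""
    counts = Counter(case["category"] for case in cases)
    return {
        "total": len(cases),
        # The guidance above suggests 10 to 25 test cases.
        "in_target_range": 10 <= len(cases) <= 25,
        "per_category": {c: counts[c] for c in CATEGORIES},
        "missing": [c for c in CATEGORIES if counts[c] == 0],
    }
```

A non-empty `missing` list points at categories that still need at least one question before the CSV is written.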

Phase 3: Write the Expected Responses

The CSV import supports only two columns: question and expectedResponse. Test methods cannot be set via CSV import; they are configured in the UI after import. The default test method (General quality) is applied to all imported test cases.

Write expected responses with this in mind:

For questions where you want General quality grading: write behavioral descriptions ("The response should recommend hotels in Paris with relevant details")
For questions where you'll later switch to Compare meaning or Exact match in the UI: write realistic agent replies that the grader can compare against
Leave expectedResponse empty for questions that only need General quality (it works without expected responses)

Available test methods (configured in UI after import)

| Test method | What it measures | Requires expected response? |
|-------------|-----------------|---------------------------|
| General quality (default) | AI-graded quality: relevance, completeness, groundedness, abstention | No (but recommended as a rubric) |
| Compare meaning | Semantic similarity: compares meaning/intent | Yes |
| Text similarity | Cosine similarity of text | Yes, configurable pass threshold |
| Exact match | Character-for-character match | Yes |
| Keyword match | Response contains expected keywords/phrases | Yes (keywords added in UI) |
| Capability use | Agent called expected tools/topics | Configured in UI |
| Custom | Custom grader with your own instructions and labels | Configured in UI |
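
When drafting expected responses, the guidance above can be kept at hand as a small lookup. This is a sketch under stated assumptions: the dictionary and the function `expected_response_style` are hypothetical helpers transcribed from the table and the Phase 3 guidance, not a Copilot Studio API.

```python
# Transcribed from the test-method table above (illustrative layout).
EXPECTED_RESPONSE_NEEDED = {
    "General quality": "No (but recommended as a rubric)",
    "Compare meaning": "Yes",
    "Text similarity": "Yes, configurable pass threshold",
    "Exact match": "Yes",
    "Keyword match": "Yes (keywords added in UI)",
    "Capability use": "Configured in UI",
    "Custom": "Configured in UI",
}


def expected_response_style(method: str) -> str:
    """Suggest how to draft expectedResponse for a planned test method."""
    if method == "General quality":
        return "behavioral rubric"
    if method in ("Compare meaning", "Exact match"):
        return "realistic agent reply"
    if method in EXPECTED_RESPONSE_NEEDED:
        # The guidance above does not prescribe a style for these methods.
        return "not specified"
    raise ValueError(f"unknown test method: {method}")
```

For example, `expected_response_style("Compare meaning")` returns `"realistic agent reply"`, matching the advice to write replies the grader can compare against.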

Phase 4: Write the CSV

Write the CSV file using the Write tool. The format must be:

"question","expectedResponse"
"User question here","Expected agent response or behavioral rubric"
"Question without expected response",

Column specification

| Column | Required | Description |
|--------|----------|-------------|
| question | Yes | The user message to send to the agent. Max 1,000 characters. |
| expectedResponse | No | The expected response or behavioral rubric. Leave empty if not needed. |

Important: The Testing method column is not supported on import — it is ignored. All imported test cases get the default test method (General quality). Configure other test methods in the UI after import.
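
The column specification above can be checked before import with Python's standard `csv` module. This is a minimal sketch: the function name `check_columns` and the wording of its messages are made up for illustration, and it assumes the header row always lists `question` and `expectedResponse` in that order.

```python
import csv
import io

SUPPORTED_COLUMNS = ["question", "expectedResponse"]
MAX_QUESTION_CHARS = 1000


def check_columns(csv_text: str) -> list[str]:
    """Return column-spec problems found in CSV text (empty list if none)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []
    if header[:2] != SUPPORTED_COLUMNS:
        problems.append("first two columns must be question, expectedResponse")
    # Anything beyond the two supported columns is flagged as unsupported.
    for extra in header[2:]:
        problems.append(f"column {extra!r} is not supported on import")
    # Row numbers count the header as row 1.
    for number, row in enumerate(data, start=2):
        question = row[0] if row else ""
        if not question.strip():
            problems.append(f"row {number}: question is required")
        elif len(question) > MAX_QUESTION_CHARS:
            problems.append(f"row {number}: question exceeds 1,000 characters")
    return problems
```

An empty list means the text satisfies the column rules; it says nothing about whether the expected responses are well written.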

Rules

Max 100 questions per test set
Max 1,000 characters per question (including spaces)
File must be .csv format
Use double quotes around all values
For questions that will use General quality: write expected responses as behavioral descriptions
For questions that will use Compare meaning or Exact match: write expected responses as realistic agent replies
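
Outside the agent's Write tool, the same file can be produced with Python's `csv` module, which handles quoting and embedded quotes. This is a sketch, not the skill's implementation: the function name `write_test_set` is hypothetical, UTF-8 encoding is an assumption, and `csv.QUOTE_ALL` writes an empty expected response as `""` rather than leaving the field bare, which this sketch assumes the importer treats the same way.

```python
import csv

MAX_QUESTIONS = 100
MAX_QUESTION_CHARS = 1000


def write_test_set(path: str, cases: list[tuple[str, str]]) -> None:
    """Write (question, expectedResponse) pairs as a fully quoted CSV."""
    if len(cases) > MAX_QUESTIONS:
        raise ValueError(f"{len(cases)} questions exceeds the limit of {MAX_QUESTIONS}")
    for question, _ in cases:
        if not question:
            raise ValueError("question is required")
        if len(question) > MAX_QUESTION_CHARS:
            raise ValueError(f"question exceeds {MAX_QUESTION_CHARS} characters")
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        writer.writerow(["question", "expectedResponse"])
        writer.writerows(cases)
```

Validation runs before the file is opened, so a rule violation does not leave a partial CSV behind.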

Expected response examples

Behavioral rubric (for General quality):

"Find me a hotel in Paris","The response should include hotel recommendations in Paris with relevant details like names, locations, or prices."

Realistic reply (for Compare meaning — set method in UI after import):

"Hi there","Hello! How can I help you today?"

Exact expected text (for Exact match — set method in UI after import):

"What is 2+2?","4"

Phase 5: Instruct the User

After writing the CSV, tell the user:

To import into Copilot Studio:

Open your agent in Copilot Studio

Go to the Evaluate tab

Click New evaluation > Single response

Drag or browse for the CSV file

Review the imported test cases and adjust if needed

Optionally add more test methods (Capability use, Custom) in the UI

Click Evaluate to run, or Save to run later

After import, some things can only be configured in the UI:

Pass thresholds for Compare meaning and Text similarity
Keywords for Keyword match test cases
Expected capabilities for Capability use test cases
Custom grader instructions and labels

Create a test set CSV file for import into Copilot Studio's in-product Evaluate tab. Reads the agent's topics, instructions, and knowledge sources to generate meaningful test cases with appropriate graders (General quality, Compare meaning, Exact match, etc.). Use when the user asks to create, prepare, or generate evaluation test cases for their agent.

192 stars

testing

Updated Apr 30, 2026

Install globally with skillsauth:

npx skillsauth add microsoft/skills-for-copilot-studio skills/create-eval-set

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported value |
|---------|-------------|--------|----------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 30, 2026, 7:04 AM; 12.9s; 1 file scanned

SKILL.md

user-invocable: false
description: >
allowed-tools: Read, Glob, Grep, Write
context: fork
agent: copilot-studio-test

Related Skills

microsoft/int-patterns
development; Verified / Trusted / Community
Pattern library for Copilot Studio agent design. Contains proven and recommended implementation patterns with YAML examples. Used by the Advisor agent to suggest patterns and by the Author agent to reference YAML structure during implementation. USE FOR: design guidance, pattern suggestions, review against best practices, troubleshooting known pitfalls, YAML reference during implementation. DO NOT USE FOR: general YAML schema reference (use int-reference), topic creation mechanics (use new-topic).
378; SKILL.md; Updated Jul 26, 2026

microsoft/skills/lookup-schema
testing; Verified / Trusted / Community
Look up Copilot Studio YAML schema definitions. Use when the user asks about schema structure, element properties, or how to use a specific YAML kind.
378; SKILL.md; Updated Apr 30, 2026

microsoft/int-reference
data-ai; Verified / Trusted / Community
Reference tables for Copilot Studio YAML authoring: triggers, actions, variables, entities, Power Fx functions, templates. Preloaded by author and advisor agents.
378; SKILL.md; Updated Apr 30, 2026

microsoft/skills/add-knowledge
development; Verified / Trusted / Community
Add a knowledge source (public website or SharePoint) to a Copilot Studio agent. Use when the user asks to add a knowledge source, documentation URL, website, or SharePoint site for the agent to search.
378; SKILL.md; Updated Apr 30, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

# Clone the repo
git clone https://github.com/microsoft/skills-for-copilot-studio.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy into Claude Code skills folder (global)
cp -r skills-for-copilot-studio/skills/create-eval-set ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

microsoft/skills-for-copilot-studio

192 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT