Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

microsoft/skills/create-eval

Name: skills/create-eval
Author: microsoft

skills/create-eval/SKILL.md

npx skillsauth add microsoft/skills-for-copilot-studio skills/create-eval

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Create Scenario Eval

Guide the user through creating eval test cases for a Copilot Studio plugin scenario. Evals test end-to-end scenarios with natural prompts — the request routes through sub-agents (e.g., Author agent) which invoke skills internally.

How the eval system works

The eval harness (evals/evaluate.py) works by:

Copying a fixture agent into a temp workspace
Running claude -p "<prompt>" with a PreToolUse hook that traces skill invocations inside sub-agents
Checking routing (which agents and skills were invoked), output files, and response text against deterministic checks
Producing a JSON results file and HTML report

What can be tested right now

Authoring scenarios that produce YAML files (topics, agents, knowledge sources, etc.) are the best candidates. The harness supports these check types:

| Check | What it validates | Use for | |-------|------------------|---------| | agent_invoked | Expected sub-agent was dispatched (e.g., Author agent) | Routing verification | | agent_not_invoked | Unwanted sub-agents were NOT dispatched | Routing verification | | skill_invoked | Expected skill was invoked (traced inside sub-agents via hook) | Skill routing | | skill_not_invoked | Unwanted skills were NOT invoked | Skill routing | | files_created | Expected files were created/modified (glob pattern) | All authoring scenarios | | schema_validate | Full Copilot Studio schema validation (kind, required fields, IDs, Power Fx, scopes) | All YAML-producing scenarios | | yaml_structure | Specific YAML path has expected value, min array length, or contains string | Structural assertions | | content_contains | Keywords from prompt appear in output files | Domain relevance | | no_placeholders | No _REPLACE, TODO, or FIXME markers left | Template completion | | stdout_contains | CLI response text contains expected strings | Reference/info scenarios | | stdout_not_contains | CLI response does NOT contain error strings | Error absence | | exit_code | CLI exited with expected code | All scenarios | | yaml_unchanged | Specific file or YAML path was NOT modified | Preservation testing |

Note: no_placeholders runs automatically when any .mcs.yml file is changed, unless explicitly set to false.

Not yet testable: Integration scenarios that call external APIs (chat-directline, manage-agent) — these need script mocking which isn't implemented yet.

Available fixtures

Fixtures are pre-built agent directories in evals/fixtures/:

basic-agent — Minimal agent with GenerativeActionsEnabled: false, one Greeting topic. Use for most authoring evals.
agent-with-mcp-action — Same as basic-agent plus two MCP action files. Use for action-editing evals.
empty-workspace — No agent files. Use for negative-path testing.

If the scenario needs a richer agent (e.g., existing topics to modify, knowledge sources, actions), note that the fixture would need to be created first.

Instructions

Identify the target scenario. If $ARGUMENTS is provided, use it as the scenario name. Otherwise ask the user what scenario they want to test (e.g., "topic creation", "agent settings", "knowledge sources").
Read relevant skill SKILL.md files to understand what the scenario covers:
```
Glob: skills/*/SKILL.md
```
Understand: What skills are involved? What YAML kinds? What files get created/modified?
Check if evals already exist:
```
Glob: evals/scenarios/<scenario-name>.json
```
If yes, read them and offer to add more test cases. Note the highest existing eval ID.
Guide the user through creating test cases. For each eval, gather:
- name: Short descriptive title (e.g., "IT support topic with OnRecognizedIntent trigger")
- prompt: A natural language prompt — what a real user would say. Do NOT prefix with "Use the X skill to...".
- fixture: Which fixture agent to use (default: basic-agent)
- checks: What to validate about the routing and output

Help the user define checks. Based on the scenario type:

For topic-creation scenarios:

{
  "agent_invoked": "copilot-studio:Copilot Studio Author",
  "skill_invoked": "copilot-studio:new-topic",
  "files_created": [{"pattern": "topics/*.topic.mcs.yml", "min_count": 1}],
  "schema_validate": true,
  "yaml_structure": [
    {"path": "kind", "equals": "AdaptiveDialog"},
    {"path": "beginDialog.kind", "equals": "<trigger-type>"}
  ],
  "content_contains": ["<domain keywords>"],
  "no_placeholders": true
}

For agent-settings scenarios:

{
  "agent_invoked": "copilot-studio:Copilot Studio Author",
  "skill_invoked": "copilot-studio:edit-agent",
  "files_created": [{"pattern": "agent.mcs.yml", "min_count": 1}],
  "schema_validate": true,
  "yaml_structure": [
    {"path": "kind", "equals": "GptComponentMetadata"}
  ],
  "content_contains": ["<expected content>"],
  "no_placeholders": true
}

For knowledge-source scenarios:

{
  "agent_invoked": "copilot-studio:Copilot Studio Author",
  "skill_invoked": "copilot-studio:add-knowledge",
  "files_created": [{"pattern": "knowledge/*.knowledge.mcs.yml", "min_count": 1}],
  "schema_validate": true,
  "no_placeholders": true
}

For reference/query scenarios:

{
  "stdout_contains": ["<expected content in response>"],
  "exit_code": 0
}

Recommend at least 3 test cases that cover different possibilities within the scenario. For example, for topic-creation:
- Different trigger types (OnRecognizedIntent, OnConversationStart, OnUnknownIntent)
- Different complexity levels (simple message, multi-step with questions, branching)
- Edge cases (empty workspace refusal)

Write the scenario JSON file:

Write: evals/scenarios/<scenario-name>.json

Format:

{
  "scenario_name": "<scenario-name>",
  "evals": [
    {
      "id": 1,
      "name": "<short descriptive title>",
      "prompt": "<natural language request — what a user would say>",
      "fixture": "basic-agent",
      "mock_scripts": [],
      "checks": { ... }
    }
  ]
}

Tell the user how to run the evals:
```
python3 evals/evaluate.py --scenario <scenario-name> --verbose
```
Or for all scenarios: node evals/run.js

To generate the HTML report: python3 evals/report.py evals/results/<timestamp>/

Important guidelines

Prompts must be natural language — write what a real user would say, not "Use the X skill to..."
Include agent_invoked and skill_invoked checks to verify correct routing
Keep prompts specific enough that checks can be deterministic (mention exact names, values, counts)
Use schema_validate: true for ALL scenarios that produce YAML — it's the most powerful check
content_contains keywords should come directly from the prompt to verify domain relevance
Don't create evals for deprecated skills (chat-with-agent, directline-chat)
Eval IDs must be unique integers within a scenario's JSON

microsoft/skills/create-eval

skills/create-eval/SKILL.md

Create plugin development eval scenarios (JSON files with natural prompts and deterministic checks for testing plugin skills). NOT for Copilot Studio in-product evaluation — use /copilot-studio:create-eval-set for that.

192 stars

tools

Updated Apr 30, 2026

$ install --global

skillsauth

npx skillsauth add microsoft/skills-for-copilot-studio skills/create-eval

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 30, 2026, 7:04 AM13.8s1 file scanned

SKILL.md

user-invocable:: true
description:: Create plugin development eval scenarios (JSON files with natural prompts and deterministic checks for testing plugin skills). NOT for Copilot Studio in-product evaluation — use /copilot-studio:create-eval-set for that.
argument-hint:: <scenario name>
allowed-tools:: Read, Write, Glob
context:: fork

Create Scenario Eval

How the eval system works

The eval harness (evals/evaluate.py) works by:

Copying a fixture agent into a temp workspace
Running claude -p "<prompt>" with a PreToolUse hook that traces skill invocations inside sub-agents
Checking routing (which agents and skills were invoked), output files, and response text against deterministic checks
Producing a JSON results file and HTML report

What can be tested right now

Authoring scenarios that produce YAML files (topics, agents, knowledge sources, etc.) are the best candidates. The harness supports these check types:

Note: no_placeholders runs automatically when any .mcs.yml file is changed, unless explicitly set to false.

Not yet testable: Integration scenarios that call external APIs (chat-directline, manage-agent) — these need script mocking which isn't implemented yet.

Available fixtures

Fixtures are pre-built agent directories in evals/fixtures/:

basic-agent — Minimal agent with GenerativeActionsEnabled: false, one Greeting topic. Use for most authoring evals.
agent-with-mcp-action — Same as basic-agent plus two MCP action files. Use for action-editing evals.
empty-workspace — No agent files. Use for negative-path testing.

If the scenario needs a richer agent (e.g., existing topics to modify, knowledge sources, actions), note that the fixture would need to be created first.

Instructions

Identify the target scenario. If $ARGUMENTS is provided, use it as the scenario name. Otherwise ask the user what scenario they want to test (e.g., "topic creation", "agent settings", "knowledge sources").
Read relevant skill SKILL.md files to understand what the scenario covers:
```
Glob: skills/*/SKILL.md
```
Understand: What skills are involved? What YAML kinds? What files get created/modified?
Check if evals already exist:
```
Glob: evals/scenarios/<scenario-name>.json
```
If yes, read them and offer to add more test cases. Note the highest existing eval ID.
Guide the user through creating test cases. For each eval, gather:
- name: Short descriptive title (e.g., "IT support topic with OnRecognizedIntent trigger")
- prompt: A natural language prompt — what a real user would say. Do NOT prefix with "Use the X skill to...".
- fixture: Which fixture agent to use (default: basic-agent)
- checks: What to validate about the routing and output

Help the user define checks. Based on the scenario type:

For topic-creation scenarios:

{
  "agent_invoked": "copilot-studio:Copilot Studio Author",
  "skill_invoked": "copilot-studio:new-topic",
  "files_created": [{"pattern": "topics/*.topic.mcs.yml", "min_count": 1}],
  "schema_validate": true,
  "yaml_structure": [
    {"path": "kind", "equals": "AdaptiveDialog"},
    {"path": "beginDialog.kind", "equals": "<trigger-type>"}
  ],
  "content_contains": ["<domain keywords>"],
  "no_placeholders": true
}

For agent-settings scenarios:

{
  "agent_invoked": "copilot-studio:Copilot Studio Author",
  "skill_invoked": "copilot-studio:edit-agent",
  "files_created": [{"pattern": "agent.mcs.yml", "min_count": 1}],
  "schema_validate": true,
  "yaml_structure": [
    {"path": "kind", "equals": "GptComponentMetadata"}
  ],
  "content_contains": ["<expected content>"],
  "no_placeholders": true
}

For knowledge-source scenarios:

{
  "agent_invoked": "copilot-studio:Copilot Studio Author",
  "skill_invoked": "copilot-studio:add-knowledge",
  "files_created": [{"pattern": "knowledge/*.knowledge.mcs.yml", "min_count": 1}],
  "schema_validate": true,
  "no_placeholders": true
}

For reference/query scenarios:

{
  "stdout_contains": ["<expected content in response>"],
  "exit_code": 0
}

Recommend at least 3 test cases that cover different possibilities within the scenario. For example, for topic-creation:
- Different trigger types (OnRecognizedIntent, OnConversationStart, OnUnknownIntent)
- Different complexity levels (simple message, multi-step with questions, branching)
- Edge cases (empty workspace refusal)

Write the scenario JSON file:

Write: evals/scenarios/<scenario-name>.json

Format:

{
  "scenario_name": "<scenario-name>",
  "evals": [
    {
      "id": 1,
      "name": "<short descriptive title>",
      "prompt": "<natural language request — what a user would say>",
      "fixture": "basic-agent",
      "mock_scripts": [],
      "checks": { ... }
    }
  ]
}

Tell the user how to run the evals:
```
python3 evals/evaluate.py --scenario <scenario-name> --verbose
```
Or for all scenarios: node evals/run.js

To generate the HTML report: python3 evals/report.py evals/results/<timestamp>/

Important guidelines

Prompts must be natural language — write what a real user would say, not "Use the X skill to..."
Include agent_invoked and skill_invoked checks to verify correct routing
Keep prompts specific enough that checks can be deterministic (mention exact names, values, counts)
Use schema_validate: true for ALL scenarios that produce YAML — it's the most powerful check
content_contains keywords should come directly from the prompt to verify domain relevance
Don't create evals for deprecated skills (chat-with-agent, directline-chat)
Eval IDs must be unique integers within a scenario's JSON

Related Skills

microsoft/int-patterns

development

VerifiedTrustedCommunity

Pattern library for Copilot Studio agent design. Contains proven and recommended implementation patterns with YAML examples. Used by the Advisor agent to suggest patterns and by the Author agent to reference YAML structure during implementation. USE FOR: design guidance, pattern suggestions, review against best practices, troubleshooting known pitfalls, YAML reference during implementation. DO NOT USE FOR: general YAML schema reference (use int-reference), topic creation mechanics (use new-topic).

378SKILL.mdUpdated Jul 26, 2026

microsoft/int-patterns

microsoft/skills/lookup-schema

testing

VerifiedTrustedCommunity

Look up Copilot Studio YAML schema definitions. Use when the user asks about schema structure, element properties, or how to use a specific YAML kind.

378SKILL.mdUpdated Apr 30, 2026

microsoft/skills/lookup-schema

microsoft/int-reference

data-ai

VerifiedTrustedCommunity

Reference tables for Copilot Studio YAML authoring: triggers, actions, variables, entities, Power Fx functions, templates. Preloaded by author and advisor agents.

378SKILL.mdUpdated Apr 30, 2026

microsoft/int-reference

microsoft/skills/add-knowledge

development

VerifiedTrustedCommunity

Add a knowledge source (public website or SharePoint) to a Copilot Studio agent. Use when the user asks to add a knowledge source, documentation URL, website, or SharePoint site for the agent to search.

378SKILL.mdUpdated Apr 30, 2026

microsoft/skills/add-knowledge

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/microsoft/skills-for-copilot-studio.git

# Copy into Claude Code skills folder (global)
cp -r skills-for-copilot-studio/skills/create-eval ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

microsoft/skills-for-copilot-studio

192 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT