Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

microsoft/skills/analyze-evals

Name: skills/analyze-evals
Author: microsoft

skills/analyze-evals/SKILL.md

npx skillsauth add microsoft/skills-for-copilot-studio skills/analyze-evals

Analyze Copilot Studio Evaluation Results

Analyze evaluation results exported from the Copilot Studio UI as CSV.

Phase 1: Get Results

Ask the user for the CSV file path if not already provided. The file is typically exported from Copilot Studio's Evaluate tab and named Evaluate <agent name> <date>.csv in their Downloads folder.
Read the CSV file. The in-product evaluation CSV has these columns:

| Column | Meaning |
|--------|---------|
| question | The test utterance |
| expectedResponse | Expected response (may be empty) |
| actualResponse | What the agent responded |
| testMethodType_1 | Eval method (e.g., GeneralQuality) |
| result_1 | Pass or Fail |
| passingScore_1 | Score threshold (may be empty) |
| explanation_1 | Why it passed/failed (e.g., "Seems relevant; Seems incomplete; Knowledge sources not cited") |

The _1 suffix indicates the first eval method. There may be additional methods (_2, _3, etc.) with the same column pattern.
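The column layout described above can be parsed with a short script. The following is a minimal sketch, not part of the skill itself: the function name `load_eval_rows` and the sample row are hypothetical, and it assumes the export is a plain comma-separated file whose per-method columns follow the `_N` suffix pattern exactly as listed.

```python
import csv
import io
import re
from collections import defaultdict

# Column families that repeat once per eval method (_1, _2, ...).
METHOD_COLUMN = re.compile(r"^(testMethodType|result|passingScore|explanation)_(\d+)$")


def load_eval_rows(csv_text):
    """Parse exported CSV text and group per-method columns by their numeric suffix."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        methods = defaultdict(dict)
        for column, value in record.items():
            match = METHOD_COLUMN.match(column or "")
            if match:
                field, index = match.group(1), int(match.group(2))
                methods[index][field] = (value or "").strip()
        rows.append({
            "question": record.get("question", ""),
            "expectedResponse": record.get("expectedResponse", ""),
            "actualResponse": record.get("actualResponse", ""),
            "methods": dict(sorted(methods.items())),
        })
    return rows


# Hypothetical sample row, shaped like the documented columns.
sample = (
    "question,expectedResponse,actualResponse,"
    "testMethodType_1,result_1,passingScore_1,explanation_1\n"
    "What is the refund policy?,,Refunds take 5 days.,"
    'GeneralQuality,Fail,,"Seems relevant; Seems incomplete"\n'
)
rows = load_eval_rows(sample)
print(rows[0]["methods"][1]["result"])  # prints "Fail"
```

When reading a real export from disk, opening the file with `encoding="utf-8-sig"` may be needed if the file starts with a byte-order mark; that is an assumption about the export, not something the skill states.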

Phase 2: Analyze Results

Focus on failed evaluations (result_1 = Fail, or any result_N = Fail).
For each failure, use the explanation column to understand the issue:
- "Question not answered" — The agent couldn't handle the question. Check if there's a matching topic or knowledge source.
- "Knowledge sources not cited" — The agent responded but didn't cite sources. Check knowledge source configuration and SearchAndSummarizeContent nodes.
- "Seems incomplete" — The response was partial. Check topic flow for early exits, missing branches, or incomplete SendActivity messages.
- Error messages in actualResponse (e.g., GenAIToolPlannerRateLimitReached) — These are runtime errors, not authoring issues. Flag them to the user as transient failures to retry.
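The triage rules above can be expressed as a small lookup. This is an illustrative sketch only: the `triage` function and the `CHECKS` mapping are hypothetical names, the semicolon separator is inferred from the example explanation string, and the runtime-error pattern covers just the one error code named in the text.

```python
import re

# Explanation phrases and the follow-up check suggested for each.
CHECKS = {
    "Question not answered": "Check for a matching topic or knowledge source.",
    "Knowledge sources not cited": (
        "Check knowledge source configuration and SearchAndSummarizeContent nodes."
    ),
    "Seems incomplete": (
        "Check topic flow for early exits, missing branches, "
        "or incomplete SendActivity messages."
    ),
}

# Assumption: only the error code mentioned in the text is matched; extend as needed.
RUNTIME_ERROR = re.compile(r"GenAIToolPlannerRateLimitReached")


def triage(actual_response, result, explanation):
    """Return (issue, suggested check) pairs for one eval method result."""
    if result.strip().lower() != "fail":
        return []
    if RUNTIME_ERROR.search(actual_response):
        # Runtime errors are transient, not authoring issues.
        return [("Runtime error", "Transient failure; retry the evaluation.")]
    reasons = [part.strip() for part in explanation.split(";")]
    found = [(reason, CHECKS[reason]) for reason in reasons if reason in CHECKS]
    return found or [("Unrecognized", explanation)]


for issue, check in triage(
    "Refunds take 5 days.",
    "Fail",
    "Seems relevant; Seems incomplete; Knowledge sources not cited",
):
    print(f"{issue}: {check}")
```

Explanations that match none of the known phrases are returned as "Unrecognized" so they can be reviewed by hand rather than silently dropped.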

Phase 3: Propose Fixes

1. For each failure, identify the relevant YAML file(s):
   - Auto-discover the agent: Glob: `**/agent.mcs.yml`
   - Find the relevant topic by matching the test utterance against trigger phrases and model descriptions
   - Read the topic file to understand the current flow
2. Propose specific YAML changes to fix each failure. Present them to the user as a summary:
   - Which test(s) failed and why
   - Which file(s) need changes
   - What the proposed change is (show the diff)
3. Wait for user decision. The user can:
   - Accept all: apply all proposed changes
   - Accept partially: apply only some changes (ask which ones)
   - Reject: discard proposed changes and discuss alternative approaches
4. Apply accepted changes using the Edit tool. After applying, remind the user to push and publish again before re-running evaluations.
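Outside an agent runtime, the discovery and "show the diff" steps above could be approximated with the Python standard library. This is a rough sketch under stated assumptions: `find_agents` and `render_diff` are hypothetical helpers, the skill itself uses the Glob and Edit tools rather than this code, and the YAML key in the demo is a placeholder, not real Copilot Studio schema.

```python
import difflib
import tempfile
from pathlib import Path


def find_agents(root):
    """Return every agent.mcs.yml under root, mirroring the **/agent.mcs.yml glob."""
    return sorted(Path(root).glob("**/agent.mcs.yml"))


def render_diff(path, before, after):
    """Build a unified diff of a proposed change without writing anything to disk."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


# Demo in a throwaway directory so no real project files are touched.
with tempfile.TemporaryDirectory() as tmp:
    agent_file = Path(tmp) / "my-agent" / "agent.mcs.yml"
    agent_file.parent.mkdir(parents=True)
    agent_file.write_text("placeholderKey: old value\n")

    agents = find_agents(tmp)
    before = agents[0].read_text()
    after = before.replace("old value", "new value")
    diff_text = render_diff(agents[0].name, before, after)
    print(diff_text)
```

Rendering the diff before touching the file keeps the proposal reviewable, which matches the requirement to wait for the user's decision before applying anything.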

Analyze exported evaluation results from Copilot Studio's Evaluate tab. The user provides a CSV file exported from the Copilot Studio UI; this skill parses it, identifies failures, and proposes YAML fixes. No API access or published agent required — just the exported CSV.

192 stars

development

Updated Apr 30, 2026

npx skillsauth add microsoft/skills-for-copilot-studio skills/analyze-evals

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Trivy (Container and dependency vulnerability scanner): Clean, 95%
- Semgrep (Static code analysis for vulnerabilities): Clean, 95%
- mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
- Snyk (dep) (Open source security scanning): Skipped, 50%
- Socket.dev (Supply chain security analysis): Skipped, 50%
- VirusTotal (Multi-engine malware detection): Skipped, 50%
- CrowdStrike (Advanced threat intelligence): Skipped, 50%
- OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
- OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 30, 2026, 7:04 AM · 9.3s · 1 file scanned

SKILL.md

user-invocable: false
description: >
allowed-tools: Read, Glob, Grep, Edit
context: fork
agent: copilot-studio-test

Related Skills

microsoft/int-patterns (development)

Pattern library for Copilot Studio agent design. Contains proven and recommended implementation patterns with YAML examples. Used by the Advisor agent to suggest patterns and by the Author agent to reference YAML structure during implementation. USE FOR: design guidance, pattern suggestions, review against best practices, troubleshooting known pitfalls, YAML reference during implementation. DO NOT USE FOR: general YAML schema reference (use int-reference), topic creation mechanics (use new-topic).

378 · SKILL.md · Updated Jul 26, 2026

microsoft/skills/lookup-schema (testing)

Look up Copilot Studio YAML schema definitions. Use when the user asks about schema structure, element properties, or how to use a specific YAML kind.

378 · SKILL.md · Updated Apr 30, 2026

microsoft/int-reference (data-ai)

Reference tables for Copilot Studio YAML authoring: triggers, actions, variables, entities, Power Fx functions, templates. Preloaded by author and advisor agents.

378 · SKILL.md · Updated Apr 30, 2026

microsoft/skills/add-knowledge (development)

Add a knowledge source (public website or SharePoint) to a Copilot Studio agent. Use when the user asks to add a knowledge source, documentation URL, website, or SharePoint site for the agent to search.

378 · SKILL.md · Updated Apr 30, 2026

Install manually

# Clone the repo
git clone https://github.com/microsoft/skills-for-copilot-studio.git

# Make sure the global Claude Code skills folder exists, then copy the skill into it
mkdir -p ~/.claude/skills
cp -r skills-for-copilot-studio/skills/analyze-evals ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

microsoft/skills-for-copilot-studio

192 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT