awslabs/dataset-evaluation

Name: dataset-evaluation
Author: awslabs

plugins/sagemaker-ai/skills/dataset-evaluation/SKILL.md

npx skillsauth add awslabs/agent-plugins dataset-evaluation

Workflow Instruction

Follow the workflow shown below. Locate the dataset, check the file type, and resolve any issues with missing files or wrong file types. Determine the fine-tuning model and fine-tuning strategy. Run the appropriate validation based on the model family. Summarize the results: is the dataset ready for fine-tuning?

Prerequisites

The SDK environment has been verified (SDK version, region, execution role). If not done, activate the sdk-getting-started skill first.

Workflow

1. Locate Dataset:
   - The full path may be a local file path or an S3 URI
   - Resolve the full path to the dataset file, make sure read permissions are available, and help the user if the file is not found
2. Determine strategy and model:
   - File formatting depends on the currently selected fine-tuning strategy and fine-tuning base model.
   - If the strategy and model are already known from the conversation context (e.g., selected via the model-selection and finetuning-technique skills), use them.
   - If not available in context, activate the model-selection and/or finetuning-technique skills to determine them before proceeding.
   - Exception: If the user is validating an evaluation dataset (not a training dataset), neither model nor technique is required; the format detector can validate eval format (query/response structure) independently. Do not block on model-selection or finetuning-technique for eval dataset validation.
3. Check File Formatting: Run the tool format_detector.py to make sure the file conforms to formatting requirements.
   - Send the full path directly to the format_detector script as an argument
   - Do not send the model and strategy as arguments
   - Do not download data from S3
   - Do not make local copies of data
4. Summarize Results: Tell the user if their data is ready
   - Examine the output of format_detector and compare to the known strategy and model
   - Important: training datasets and evaluation datasets have different format requirements.
     - Training datasets must match the fine-tuning strategy format per references/strategy_data_requirements.md
     - Evaluation datasets (for model evaluation) must match one of the SageMaker evaluation dataset formats.
     - Custom Scorer evaluation datasets have scorer-specific requirements. If the dataset is intended for Custom Scorer evaluation (Prime Math, Prime Code, or Custom Lambda), read references/custom-scorer-evaluation-dataset-formats.md and validate against the scorer-specific schema. The scorer type should be known from conversation context (determined in the model-evaluation skill).
   - Report back to the user whether their current dataset is valid for its intended purpose
   - Warn the user if their dataset is valid, but for a different strategy or model
   - Warn the user if their dataset is not valid for any strategy/model pair
   - If the user plans to fine-tune a model with the evaluated dataset, it needs to be uploaded to an S3 bucket in the same region as the planned training job (usually the default region). Warn the user if this is NOT the case.
   - If the dataset is NOT in the necessary format, recommend transforming it using the dataset-transformation skill, wait for user confirmation, and update the plan based on their response
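
The reporting rules in the "Summarize Results" step can be sketched as a small decision function. This is a minimal sketch, not the skill's implementation: summarize_readiness and all of its parameters are hypothetical names, it assumes the caller has already reduced the format_detector output (whose exact shape is not specified here) to a single strategy name or None, and it ignores the base model for brevity.

```python
from typing import List, Optional


def summarize_readiness(
    detected_strategy: Optional[str],
    intended_strategy: str,
    dataset_region: Optional[str] = None,
    training_region: Optional[str] = None,
) -> List[str]:
    """Return user-facing messages for the 'Summarize Results' step.

    Hypothetical helper: detected_strategy is assumed to be a strategy
    name derived from the format_detector output, or None when no known
    format matched.
    """
    messages: List[str] = []

    if detected_strategy is None:
        # Not valid for any strategy: warn and suggest transformation.
        messages.append(
            "WARNING: dataset is not valid for any known strategy; "
            "consider the dataset-transformation skill."
        )
    elif detected_strategy == intended_strategy:
        messages.append(f"OK: dataset matches the {intended_strategy} format.")
    else:
        # Valid, but for a different strategy than the one selected.
        messages.append(
            f"WARNING: dataset is valid for {detected_strategy}, "
            f"not for the intended {intended_strategy}; "
            "consider the dataset-transformation skill."
        )

    # The region check only applies when both regions are known.
    if dataset_region and training_region and dataset_region != training_region:
        messages.append(
            f"WARNING: dataset is in {dataset_region} but training is "
            f"planned in {training_region}."
        )
    return messages
```

For example, summarize_readiness("DPO", "SFT", "us-west-2", "us-east-1") yields two warnings: one for the strategy mismatch and one for the region mismatch.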

Messages to the User

Introduction: "This skill checks the structure of your dataset for model fine-tuning."
File types: This skill applies to files that are formatted according to the Amazon SageMaker AI Developer Guide

Resources

scripts/format_detector.py is a self-contained format validation script that can be run independently
model-selection and finetuning-technique skills should have already determined the base model and fine-tuning strategy
references/strategy_data_requirements.md contains data format requirements per strategy

Script Details

scripts/format_detector.py is a self-contained format validation script that can be run independently:

# With the file path argument identified in workflow step 1
python scripts/format_detector.py local_path/to/dataset
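
The same invocation can be wrapped in Python when the path needs to be checked first. The sketch below is an illustration under assumptions: classify_dataset_path, build_detector_command, and run_detector are hypothetical helper names, and nothing is assumed about the script's output beyond it being text on stdout. In line with the workflow, the dataset path is the only argument passed, and S3 data is neither downloaded nor copied.

```python
import subprocess
import sys
from typing import List, Tuple
from urllib.parse import urlparse


def classify_dataset_path(path: str) -> Tuple[str, str]:
    """Return ("s3", path) for S3 URIs and ("local", path) otherwise."""
    parsed = urlparse(path)
    if parsed.scheme == "s3":
        if not parsed.netloc:
            raise ValueError(f"S3 URI is missing a bucket name: {path!r}")
        return "s3", path
    return "local", path


def build_detector_command(dataset_path: str) -> List[str]:
    """Build the argv for format_detector.py; the path is the only argument."""
    return [sys.executable, "scripts/format_detector.py", dataset_path]


def run_detector(dataset_path: str) -> str:
    """Run the detector and return whatever it printed to stdout."""
    _kind, path = classify_dataset_path(dataset_path)
    result = subprocess.run(
        build_detector_command(path),
        capture_output=True,
        text=True,
        check=False,  # let the caller inspect the output even on failure
    )
    return result.stdout
```

Calling run_detector("local_path/to/dataset") from the skill's root directory is equivalent to the command line shown above, with the output captured for the summary step.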

References

scripts/format_detector.py — Self-contained format validation script
references/strategy_data_requirements.md — Data format requirements per strategy

awslabs/dataset-evaluation

Validates dataset formatting and quality for SageMaker model fine-tuning (SFT, DPO, or RLVR). Use when the user says "is my dataset okay", "evaluate my data", "check my training data", "I have my own data", or before starting any fine-tuning job. Detects file format, checks schema compliance against the selected model and technique, and reports whether the data is ready for training or evaluation.

780 stars

development

Updated Jun 11, 2026

npx skillsauth add awslabs/agent-plugins dataset-evaluation

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Trivy (container and dependency vulnerability scanner): Clean, 95%
- Semgrep (static code analysis for vulnerabilities): Clean, 95%
- mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
- Snyk (dep) (open source security scanning): Skipped, 50%
- Socket.dev (supply chain security analysis): Skipped, 50%
- VirusTotal (multi-engine malware detection): Skipped, 50%
- CrowdStrike (advanced threat intelligence): Skipped, 50%
- OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
- OWASP Dep-Check: Skipped, 50%

Last scanned: Jun 11, 2026, 4:06 AM; 163.9s; 4 files scanned

SKILL.md

name: dataset-evaluation
description: Validates dataset formatting and quality for SageMaker model fine-tuning (SFT, DPO, or RLVR). Use when the user says "is my dataset okay", "evaluate my data", "check my training data", "I have my own data", or before starting any fine-tuning job. Detects file format, checks schema compliance against the selected model and technique, and reports whether the data is ready for training or evaluation.
version: 1.0.0

Related Skills

awslabs/aws-step-functions
Category: development
Build workflows with AWS Step Functions state machines using the JSONata query language. Covers Amazon States Language (ASL) structure, state types, variables, data transformation, error handling, AWS service integration, and migrating from the JSONPath to the JSONata query language.
785 · SKILL.md · Updated Jun 13, 2026

awslabs/aws-lambda
Category: tools
Design, build, deploy, test, and debug serverless applications with AWS Lambda. Triggers on phrases like: Lambda function, event source, serverless application, API Gateway, EventBridge, Step Functions, serverless API, event-driven architecture, Lambda trigger. For deploying non-serverless apps to AWS, use deploy-on-aws plugin instead.
785 · SKILL.md · Updated Apr 3, 2026

awslabs/sdk-getting-started
Category: development
Validates the user's environment for SageMaker AI operations: checks SDK version, AWS region, and execution role. Use when the user says "set up", "getting started", "check my environment", "configure SDK", or as the first step in any plan involving SageMaker/Bedrock training, evaluation, or deployment.
780 · SKILL.md · Updated Jun 11, 2026

awslabs/model-selection
Category: data-ai
Selects a base model for the user's use case by querying SageMaker Hub. Use when the user asks which model to use, wants to select or change their base model, mentions a model name or family (e.g., "Llama", "Mistral", "Nova"), or wants to evaluate a base model. Always activate even for known model names because the exact Hub model ID must be resolved. Queries available models, presents benchmarks and licenses, and confirms selection.
780 · SKILL.md · Updated Jun 11, 2026

Install manually

# Clone the repo
git clone https://github.com/awslabs/agent-plugins.git

# Copy into Claude Code skills folder (global)
cp -r agent-plugins/plugins/sagemaker-ai/skills/dataset-evaluation ~/.claude/skills/

Repository

awslabs/agent-plugins

780 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT