Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

langwatch/level-up

Name: level-up
Author: langwatch

skills/level-up/SKILL.md

npx skillsauth add langwatch/langwatch level-up

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Take Your Agent to the Next Level

This skill sets up your agent with the full LangWatch stack: tracing, prompt versioning, evaluation experiments, and agent simulation tests. Each step builds on the previous one.

Plan Limits

See Plan Limits. The free plan has limits on prompts, scenarios, evaluators, and experiments. Focus on delivering value at each step — make each creation count. Show the user what works before they hit any limits. If you reach a limit, summarize what was accomplished and suggest upgrading at https://app.langwatch.ai/settings/subscription

Prerequisites

Set up the LangWatch CLI first — you'll use it throughout. See CLI Setup.

For documentation access, also set up the MCP: See MCP Setup for installation instructions. If MCP installation fails, see docs fallback to fetch docs directly.

Consultant Mode

After completing all steps, don't just stop. See Consultant Mode — summarize everything you set up, then suggest 2-3 ways to go deeper based on what you learned about the codebase.

Step 1: Add Tracing

Add LangWatch tracing to capture all LLM calls, costs, and latency.

Use the LangWatch MCP to fetch the integration guide for this project's framework: call fetch_langwatch_docs with no args to see the index, then read the specific framework page
Install the LangWatch SDK (pip install langwatch or npm install langwatch)
Add instrumentation following the framework-specific guide
Add LANGWATCH_API_KEY to .env

Verify: Run the application briefly and confirm traces appear at https://app.langwatch.ai

Step 2: Version Your Prompts

Move hardcoded prompts to LangWatch Prompt CLI for version control and collaboration.

Use the LangWatch MCP to read the Prompts CLI docs: call fetch_langwatch_docs with url https://langwatch.ai/docs/prompt-management/cli.md
Install the CLI: npm install -g langwatch then langwatch login
Initialize: langwatch prompt init
Create prompts: langwatch prompt create <name> for each prompt in the code
Update application code to use langwatch.prompts.get("name") instead of hardcoded strings
Sync: langwatch prompt sync

Verify: Check that prompts appear at https://app.langwatch.ai in the Prompts section.

Do NOT hardcode prompts in code. Do NOT add try/catch fallbacks around prompts.get().

Step 3: Create an Evaluation Experiment

Build a batch evaluation to measure your agent's quality across many examples.

Use the LangWatch MCP to read the experiments SDK docs: call fetch_langwatch_docs with url https://langwatch.ai/docs/evaluations/experiments/sdk.md
Analyze the agent's code to understand what it does
Generate a dataset of 10-20 examples tailored to the agent's domain (NOT generic examples)
Create an experiment file:
- Python: Jupyter notebook with langwatch.experiment.init(), evaluation loop, and evaluators
- TypeScript: Script with langwatch.experiments.init() and evaluation.run()
Include at least one evaluator (LLM-as-judge for quality is a good default)

Verify: Run the experiment and check results appear in the LangWatch Experiments view.

Step 4: Add Agent Simulation Tests

Create scenario tests to validate agent behavior in realistic multi-turn conversations.

Use the LangWatch MCP to read Scenario docs: call fetch_scenario_docs with no args for the index
Install the Scenario SDK (pip install langwatch-scenario or npm install @langwatch/scenario)
Write scenario tests with AgentAdapter, UserSimulatorAgent, and JudgeAgent
Use semantic criteria in JudgeAgent (NOT regex matching)

Verify: Run the tests and confirm they pass.

NEVER invent your own testing framework. Use @langwatch/scenario / langwatch-scenario.

Common Mistakes

Do NOT skip any step -- each builds on the previous
Do NOT use generic datasets in the experiment -- tailor them to the agent's domain
Do NOT hardcode prompts -- use the Prompts CLI
Do NOT invent testing frameworks -- use Scenario
Do NOT use platform_ MCP tools -- this skill writes code in the project
Do NOT skip verification steps -- run the application/experiment/tests after each step

langwatch/level-up

skills/level-up/SKILL.md

Take your AI agent to the next level with full LangWatch integration. Adds tracing, prompt versioning, evaluation experiments, and simulation tests in one go. Use when the user wants comprehensive observability, testing, and prompt management for their agent.

3,203 stars

development

Updated Apr 15, 2026

$ install --global

skillsauth

npx skillsauth add langwatch/langwatch level-up

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 24, 2026, 8:47 PM1.7s1 file scanned

SKILL.md

name:: level-up
user-prompt:: Take my agent to the next level
description:: Take your AI agent to the next level with full LangWatch integration. Adds tracing, prompt versioning, evaluation experiments, and simulation tests in one go. Use when the user wants comprehensive observability, testing, and prompt management for their agent.
license:: MIT
compatibility:: Works with Claude Code and similar coding agents. CLI is the preferred interface.

Take Your Agent to the Next Level

This skill sets up your agent with the full LangWatch stack: tracing, prompt versioning, evaluation experiments, and agent simulation tests. Each step builds on the previous one.

Plan Limits

Prerequisites

Set up the LangWatch CLI first — you'll use it throughout. See CLI Setup.

For documentation access, also set up the MCP: See MCP Setup for installation instructions. If MCP installation fails, see docs fallback to fetch docs directly.

Consultant Mode

After completing all steps, don't just stop. See Consultant Mode — summarize everything you set up, then suggest 2-3 ways to go deeper based on what you learned about the codebase.

Step 1: Add Tracing

Add LangWatch tracing to capture all LLM calls, costs, and latency.

Use the LangWatch MCP to fetch the integration guide for this project's framework: call fetch_langwatch_docs with no args to see the index, then read the specific framework page
Install the LangWatch SDK (pip install langwatch or npm install langwatch)
Add instrumentation following the framework-specific guide
Add LANGWATCH_API_KEY to .env

Verify: Run the application briefly and confirm traces appear at https://app.langwatch.ai

Step 2: Version Your Prompts

Move hardcoded prompts to LangWatch Prompt CLI for version control and collaboration.

Use the LangWatch MCP to read the Prompts CLI docs: call fetch_langwatch_docs with url https://langwatch.ai/docs/prompt-management/cli.md
Install the CLI: npm install -g langwatch then langwatch login
Initialize: langwatch prompt init
Create prompts: langwatch prompt create <name> for each prompt in the code
Update application code to use langwatch.prompts.get("name") instead of hardcoded strings
Sync: langwatch prompt sync

Verify: Check that prompts appear at https://app.langwatch.ai in the Prompts section.

Do NOT hardcode prompts in code. Do NOT add try/catch fallbacks around prompts.get().

Step 3: Create an Evaluation Experiment

Build a batch evaluation to measure your agent's quality across many examples.

Use the LangWatch MCP to read the experiments SDK docs: call fetch_langwatch_docs with url https://langwatch.ai/docs/evaluations/experiments/sdk.md
Analyze the agent's code to understand what it does
Generate a dataset of 10-20 examples tailored to the agent's domain (NOT generic examples)
Create an experiment file:
- Python: Jupyter notebook with langwatch.experiment.init(), evaluation loop, and evaluators
- TypeScript: Script with langwatch.experiments.init() and evaluation.run()
Include at least one evaluator (LLM-as-judge for quality is a good default)

Verify: Run the experiment and check results appear in the LangWatch Experiments view.

Step 4: Add Agent Simulation Tests

Create scenario tests to validate agent behavior in realistic multi-turn conversations.

Use the LangWatch MCP to read Scenario docs: call fetch_scenario_docs with no args for the index
Install the Scenario SDK (pip install langwatch-scenario or npm install @langwatch/scenario)
Write scenario tests with AgentAdapter, UserSimulatorAgent, and JudgeAgent
Use semantic criteria in JudgeAgent (NOT regex matching)

Verify: Run the tests and confirm they pass.

NEVER invent your own testing framework. Use @langwatch/scenario / langwatch-scenario.

Common Mistakes

Do NOT skip any step -- each builds on the previous
Do NOT use generic datasets in the experiment -- tailor them to the agent's domain
Do NOT hardcode prompts -- use the Prompts CLI
Do NOT invent testing frameworks -- use Scenario
Do NOT use platform_ MCP tools -- this skill writes code in the project
Do NOT skip verification steps -- run the application/experiment/tests after each step

Related Skills

langwatch/tracing

development

VerifiedTrustedCommunity

Add LangWatch tracing and observability to your code. Use for both onboarding (instrument an entire codebase) and targeted operations (add tracing to a specific function or module). Supports Python and TypeScript with all major frameworks.

3,203SKILL.mdUpdated Apr 15, 2026

langwatch/scenarios

tools

VerifiedTrustedCommunity

Test your AI agent with simulation-based scenarios. Covers writing scenario test code (Scenario SDK), creating platform scenarios (CLI or MCP), and red teaming for security vulnerabilities. Auto-detects whether to use code or platform approach based on context.

3,203SKILL.mdUpdated Apr 15, 2026

langwatch/test-compliance

testing

VerifiedTrustedCommunity

Test that your AI agent stays observational and doesn't give prescriptive advice in regulated domains (healthcare, finance, legal). Creates scenario tests for boundary enforcement and red team tests for adversarial probing. Use when your agent advises but must not prescribe.

3,203SKILL.mdUpdated Apr 15, 2026

langwatch/test-compliance

langwatch/test-cli-usability

tools

VerifiedTrustedCommunity

Write scenario tests that verify your CLI tool is usable by AI agents. Ensures commands work non-interactively, provide clear output, and don't hang on prompts. Use when you want to prove your CLI is agent-friendly.

3,203SKILL.mdUpdated Apr 15, 2026

langwatch/test-cli-usability

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/langwatch/langwatch.git

# Copy into Claude Code skills folder (global)
cp -r langwatch/skills/level-up ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

langwatch/langwatch

3,203 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT