skills/level-up/SKILL.md
Take your AI agent to the next level with full LangWatch integration. Adds tracing, prompt versioning, evaluation experiments, and simulation tests in one go. Use when the user wants comprehensive observability, testing, and prompt management for their agent.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

`npx skillsauth add langwatch/langwatch level-up`
This skill sets up your agent with the full LangWatch stack: tracing, prompt versioning, evaluation experiments, and agent simulation tests. Each step builds on the previous one.
See Plan Limits. The free plan has limits on prompts, scenarios, evaluators, and experiments. Focus on delivering value at each step — make each creation count. Show the user what works before they hit any limits. If you reach a limit, summarize what was accomplished and suggest upgrading at https://app.langwatch.ai/settings/subscription
Set up the LangWatch CLI first — you'll use it throughout. See CLI Setup.
For documentation access, also set up the MCP; see MCP Setup for installation instructions. If MCP installation fails, see docs fallback to fetch the docs directly.
After completing all steps, don't just stop. See Consultant Mode — summarize everything you set up, then suggest 2-3 ways to go deeper based on what you learned about the codebase.
Add LangWatch tracing to capture all LLM calls, costs, and latency.
1. Call `fetch_langwatch_docs` with no args to see the index, then read the specific framework page
2. Install the SDK (`pip install langwatch` or `npm install langwatch`)
3. Add `LANGWATCH_API_KEY` to `.env`

Verify: Run the application briefly and confirm traces appear at https://app.langwatch.ai
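Before running the application to verify traces, it can help to confirm that the API key actually landed in `.env`. The following is a minimal sketch using only the Python standard library; the helper names `read_env_keys` and `has_langwatch_key` are hypothetical and not part of the LangWatch SDK, and the parser assumes plain `KEY=VALUE` lines.

```python
from pathlib import Path


def read_env_keys(env_path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper)."""
    values: dict[str, str] = {}
    path = Path(env_path)
    if not path.exists():
        return values
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate an optional "export " prefix and surrounding quotes.
        key = key.strip().removeprefix("export ").strip()
        values[key] = value.strip().strip("\"'")
    return values


def has_langwatch_key(env_path: str = ".env") -> bool:
    """Return True when LANGWATCH_API_KEY is present and non-empty."""
    return bool(read_env_keys(env_path).get("LANGWATCH_API_KEY"))
```

Calling `has_langwatch_key()` from the project root gives a quick yes/no answer before you start the application; it does not check that the key is valid.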
Move hardcoded prompts to LangWatch Prompt CLI for version control and collaboration.
1. Call `fetch_langwatch_docs` with url https://langwatch.ai/docs/prompt-management/cli.md
2. Install the CLI: `npm install -g langwatch`, then `langwatch login`
3. Run `langwatch prompt init`
4. Run `langwatch prompt create <name>` for each prompt in the code
5. Use `langwatch.prompts.get("name")` instead of hardcoded strings
6. Run `langwatch prompt sync`

Verify: Check that prompts appear at https://app.langwatch.ai in the Prompts section.
Do NOT hardcode prompts in code. Do NOT add try/catch fallbacks around prompts.get().
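To decide which prompts need a `langwatch prompt create <name>` call, it may help to list the string literals that still look like hardcoded prompts. The sketch below is a heuristic built on Python's standard `ast` module, not a LangWatch feature: the function name `find_hardcoded_prompts`, the 80-character threshold, and the "name contains prompt" rule are all assumptions for illustration. Long docstrings will also be reported, so treat the output as a list of candidates to review.

```python
import ast


def find_hardcoded_prompts(source: str, min_length: int = 80) -> list[tuple[int, str]]:
    """Return (line number, preview) pairs for literals that look like prompts.

    Heuristic (an assumption): a string literal is a candidate if it is
    assigned to a name containing "prompt", or if it is at least
    `min_length` characters long.
    """
    tree = ast.parse(source)
    found: dict[int, str] = {}

    for node in ast.walk(tree):
        # Case 1: assignments such as SYSTEM_PROMPT = "..."
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            if isinstance(node.value.value, str):
                names = [t.id for t in node.targets if isinstance(t, ast.Name)]
                if any("prompt" in name.lower() for name in names):
                    found[node.value.lineno] = node.value.value[:60]
        # Case 2: any long string literal, wherever it appears
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            if len(node.value) >= min_length:
                found.setdefault(node.lineno, node.value[:60])

    return sorted(found.items())
```

Each reported line is a place where the literal would be replaced by a `langwatch.prompts.get("name")` lookup once the prompt has been created and synced.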
Build a batch evaluation to measure your agent's quality across many examples.
1. Call `fetch_langwatch_docs` with url https://langwatch.ai/docs/evaluations/experiments/sdk.md
2. Write the experiment for your language:
   - Python: `langwatch.experiment.init()`, evaluation loop, and evaluators
   - TypeScript: `langwatch.experiments.init()` and `evaluation.run()`

Verify: Run the experiment and check results appear in the LangWatch Experiments view.
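The shape of a batch evaluation (dataset, loop, evaluator, aggregate score) can be sketched without any library. The code below is a library-free illustration only and does not call the LangWatch SDK; `Example`, `exact_match`, and `run_batch` are hypothetical names. In the real experiment, the loop and result logging should go through the object returned by `langwatch.experiment.init()` as described on the SDK docs page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    question: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    """A deliberately simple evaluator: case-insensitive string equality."""
    return output.strip().lower() == expected.strip().lower()


def run_batch(agent: Callable[[str], str], dataset: list[Example]) -> dict:
    """Run the agent on every example and aggregate a single accuracy score."""
    rows = []
    for index, example in enumerate(dataset):
        output = agent(example.question)
        rows.append(
            {
                "index": index,
                "output": output,
                "passed": exact_match(output, example.expected),
            }
        )
    passed = sum(row["passed"] for row in rows)
    return {"rows": rows, "accuracy": passed / len(rows) if rows else 0.0}
```

Passing the agent in as a plain callable keeps the loop independent of any framework, so the same dataset and evaluator can be reused when the loop is moved into a LangWatch experiment.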
Create scenario tests to validate agent behavior in realistic multi-turn conversations.
1. Call `fetch_scenario_docs` with no args for the index
2. Install the Scenario SDK (`pip install langwatch-scenario` or `npm install @langwatch/scenario`)
3. Write scenario tests using `AgentAdapter`, `UserSimulatorAgent`, and `JudgeAgent`

Verify: Run the tests and confirm they pass.
NEVER invent your own testing framework. Use @langwatch/scenario / langwatch-scenario.
Do NOT use `platform_` MCP tools; this skill writes code in the project.