langwatch

18 verified skills · 57,654 total stars

langwatch-kanban

Manage the LangWatch Kanban GitHub project board — sync statuses, view your board, find stale items, move issues, assign work.

tools · 3,203 stars

evaluations

Set up comprehensive evaluations for your AI agent with LangWatch: experiments (batch testing), evaluators (scoring functions), datasets, online evaluation (production monitoring), and guardrails (real-time blocking). Supports both code (SDK) and platform (MCP) approaches. Use when the user wants to evaluate, test, benchmark, monitor, or safeguard their agent.

tools · 3,203 stars

code-review

Project-level code review: check changed files against LangWatch codebase rules (IDs, multitenancy, layering, naming, SRP).

development · 3,203 stars

scenarios

Test your AI agent with simulation-based scenarios. Covers writing scenario test code (Scenario SDK), creating platform scenarios (CLI or MCP), and red teaming for security vulnerabilities. Auto-detects whether to use code or platform approach based on context.

tools · 3,203 stars

evaluate-multimodal

Evaluate multimodal AI agents that process images, audio, PDFs, or other files. Sets up evaluations using LangWatch's LLM-as-judge with image inputs, Scenario's multimodal testing, and document parsing evaluation patterns. Use when your agent handles non-text inputs.

testing · 3,203 stars

analytics

Analyze your AI agent's performance using LangWatch analytics. Use when the user wants to understand costs, latency, error rates, usage trends, or debug specific traces. Works with any LangWatch-instrumented agent.

development · 3,203 stars

browser-pair

Collaborative headed browser session for UI work. Launch a visible (headed) Playwright Chromium window, handle auth, then interactively drive the browser while the user watches and gives real-time visual feedback. Edit code and refresh to verify fixes live. Use when the user says 'browser pair', 'paired browser', 'let's look at this together', 'open chromium', or wants to iterate on UI with live visual feedback.

development · 3,203 stars

feature-map

Maintain the canonical LangWatch feature map (/feature-map.json). Use when adding features, APIs, MCP tools, CLI commands, or skills — to update the central registry and keep surfaces in sync.

tools · 3,203 stars

test-cli-usability

Write scenario tests that verify your CLI tool is usable by AI agents. Ensures commands work non-interactively, provide clear output, and don't hang on prompts. Use when you want to prove your CLI is agent-friendly.

tools · 3,203 stars
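The non-interactive checks this skill describes (commands must not hang on prompts and should produce clear output) can be approximated with plain Python. This is a minimal sketch, not the skill's actual implementation: `check_non_interactive` and `CliCheck` are hypothetical names, and it assumes a well-behaved command should finish within a timeout when stdin is closed.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CliCheck:
    exited: bool                # finished before the timeout (did not hang)
    returncode: Optional[int]   # None if the process was killed on timeout
    has_output: bool            # printed anything on stdout or stderr


def check_non_interactive(cmd: List[str], timeout: float = 10.0) -> CliCheck:
    """Hypothetical helper: run cmd with stdin closed so prompts read EOF instead of blocking."""
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,  # no input available: a prompt must not wait forever
            capture_output=True,
            text=True,
            timeout=timeout,           # subprocess.run kills the child if this expires
        )
    except subprocess.TimeoutExpired:
        return CliCheck(exited=False, returncode=None, has_output=False)
    output = (result.stdout + result.stderr).strip()
    return CliCheck(exited=True, returncode=result.returncode, has_output=bool(output))


if __name__ == "__main__":
    # A command that prints and exits passes; a sleeping one is reported as hung.
    print(check_non_interactive([sys.executable, "-c", "print('ok')"]))
    print(check_non_interactive([sys.executable, "-c", "import time; time.sleep(30)"], timeout=1))
```

A command that blocks on `input()` still exits here, because closed stdin makes the prompt fail fast with a non-zero exit code; in an agent-facing test, that non-zero code plus the captured output is the signal that the command needs a non-interactive flag.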

test-compliance

Test that your AI agent stays observational and doesn't give prescriptive advice in regulated domains (healthcare, finance, legal). Creates scenario tests for boundary enforcement and red team tests for adversarial probing. Use when your agent advises but must not prescribe.

testing · 3,203 stars

level-up

Take your AI agent to the next level with full LangWatch integration. Adds tracing, prompt versioning, evaluation experiments, and simulation tests in one go. Use when the user wants comprehensive observability, testing, and prompt management for their agent.

development · 3,203 stars

debug-instrumentation

Debug and improve your LangWatch traces. Inspects production traces for missing input/output, disconnected spans, unlabeled traces, and missing metadata. Use when traces look broken or incomplete.

development · 3,203 stars

browser-test

Validate a feature works by driving a real browser with Playwright MCP. No test files — just interactive verification.

tools · 3,203 stars

reuse-worktree

Reset this worktree to latest main and create a new branch for fresh work.

testing · 3,203 stars

prompts

Version and manage your agent's prompts with the LangWatch Prompts CLI. Use for both onboarding (set up prompt versioning for an entire codebase) and targeted operations (version a specific prompt, create a new prompt version). Supports Python and TypeScript.

tools · 3,203 stars

improve-setup

Expert AI engineering consultant for your LangWatch setup. Audits your codebase, traces, evaluations, and scenarios, then guides you to improve — starting from low-hanging fruit and going deeper. Use when you want to level up your agent's engineering quality.

development · 3,203 stars

tracing

Add LangWatch tracing and observability to your code. Use for both onboarding (instrument an entire codebase) and targeted operations (add tracing to a specific function or module). Supports Python and TypeScript with all major frameworks.

development · 3,203 stars

generate-rag-dataset

Generate a synthetic evaluation dataset from your RAG knowledge base. Creates diverse Q&A pairs with expected answers and relevant context, ready for LangWatch experiments and platform import. Use when you need test data for your RAG pipeline.

testing · 3,203 stars
