skills/llm-eval-observatory/SKILL.md
Debug and analyze LLM eval runs — view traces, compare runs, investigate failures, track costs. Use when debugging @kbn/evals failures, comparing eval runs, or analyzing LLM performance.
npx skillsauth add patrykkopycinski/elastic-cursor-plugin llm-eval-observatory

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use eval_observatory to open an interactive 6-tab LLM observability dashboard.
| Tool | Purpose |
|------|---------|
| eval_observatory | Open the eval observatory dashboard |
In the Failure Investigation tab, click "Debug with Claude" on any failing eval. The full context (input, expected output, trace, evaluator reasoning) is sent to this conversation so I can analyze why it failed and suggest fixes.
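
As a rough illustration of the context described above, the four pieces (input, expected output, trace, evaluator reasoning) could be modeled as a small record and rendered as text. This is only a sketch: the `FailureContext` class, its field names, the `format_debug_prompt` helper, and the sample values are all hypothetical and do not reflect the dashboard's actual payload format.

```python
from dataclasses import dataclass, field


@dataclass
class FailureContext:
    """Hypothetical shape of the context sent for one failing eval."""

    eval_input: str
    expected_output: str
    trace: list[str] = field(default_factory=list)  # ordered summaries of trace steps
    evaluator_reasoning: str = ""


def format_debug_prompt(ctx: FailureContext) -> str:
    """Render the failure context as plain text for a debugging conversation."""
    trace_lines = "\n".join(
        f"  {i}. {step}" for i, step in enumerate(ctx.trace, start=1)
    )
    return (
        f"Input:\n  {ctx.eval_input}\n"
        f"Expected output:\n  {ctx.expected_output}\n"
        f"Trace:\n{trace_lines}\n"
        f"Evaluator reasoning:\n  {ctx.evaluator_reasoning}"
    )


# Made-up sample values, for illustration only.
example = FailureContext(
    eval_input="Summarize the alert",
    expected_output="A two-sentence summary",
    trace=["retrieve alert document", "call model", "return response"],
    evaluator_reasoning="Response exceeded two sentences.",
)

print(format_debug_prompt(example))
```

Keeping the four pieces as separate fields makes it easy to see which part of a failing eval is missing or suspicious before asking for an analysis.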