Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

patrykkopycinski/llm-eval-observatory

Name: llm-eval-observatory
Author: patrykkopycinski

skills/llm-eval-observatory/SKILL.md

npx skillsauth add patrykkopycinski/elastic-cursor-plugin llm-eval-observatory


LLM Eval Observatory

Use eval_observatory to open an interactive 6-tab LLM observability dashboard.

Tools

| Tool | Purpose |
|------|---------|
| eval_observatory | Open the eval observatory dashboard |

When to use

"Show me the latest eval runs"
"Why did this eval fail?"
"Compare the last two runs"
"How much did that eval run cost?"

The "Debug with Claude" feature

In the Failure Investigation tab, click "Debug with Claude" on any failing eval. The full context (input, expected output, trace, evaluator reasoning) is sent to this conversation so I can analyze why it failed and suggest fixes.
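The context bundle described above can be pictured as a small record with four fields. The following is a minimal sketch, assuming a hypothetical `FailureContext` shape and a hypothetical `build_debug_message` helper. Neither name comes from the skill, the sample values are made up for illustration, and the dashboard's real payload format is not documented here.

```python
from dataclasses import dataclass


@dataclass
class FailureContext:
    """Hypothetical container for the four pieces of context named above."""
    input: str
    expected_output: str
    trace: str
    evaluator_reasoning: str


def build_debug_message(ctx: FailureContext) -> str:
    """Render the context as one message, one labeled section per field."""
    sections = [
        ("Input", ctx.input),
        ("Expected output", ctx.expected_output),
        ("Trace", ctx.trace),
        ("Evaluator reasoning", ctx.evaluator_reasoning),
    ]
    return "\n\n".join(f"{title}:\n{body}" for title, body in sections)


# Illustrative values only; a real failing eval would supply its own.
example = FailureContext(
    input="What is the capital of France?",
    expected_output="Paris",
    trace="llm.call -> 'Lyon'",
    evaluator_reasoning="Answer names the wrong city.",
)
message = build_debug_message(example)
print(message)
```

Keeping the four fields separate, rather than one free-form string, makes it easier to see which part of a failing eval the analysis should focus on.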


Debug and analyze LLM eval runs — view traces, compare runs, investigate failures, track costs. Use when debugging @kbn/evals failures, comparing eval runs, or analyzing LLM performance.

6 stars

development

Updated May 27, 2026


npx skillsauth add patrykkopycinski/elastic-cursor-plugin llm-eval-observatory

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners in report

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 27, 2026, 6:26 AM (127.5s, 1 file scanned)
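The headline count follows directly from the per-scanner statuses. A minimal sketch that recomputes it from the rows listed above; the `statuses` mapping is transcribed by hand for illustration and is not read from any skillsauth API.

```python
from collections import Counter

# Status of each scanner, copied from the report rows.
statuses = {
    "Trivy": "Clean",
    "Semgrep": "Clean",
    "mcp-scan (Snyk)": "Clean",
    "Snyk (dep)": "Skipped",
    "Socket.dev": "Skipped",
    "VirusTotal": "Skipped",
    "CrowdStrike": "Skipped",
    "OSV-Scanner": "Skipped",
    "OWASP Dep-Check": "Skipped",
}

# Count how many scanners fall under each status.
counts = Counter(statuses.values())
summary = f"{counts['Clean']} of {len(statuses)} scanners reported clean"
print(summary)
```

Note that a skipped scanner says nothing either way about the skill, so the six skipped rows are gaps in coverage rather than findings.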


Related Skills

patrykkopycinski/security-threat-hunting

testing

Verified, Trusted, Community

Interactive threat hunting workflow using ES|QL and Elasticsearch queries, from hypothesis formulation through data exploration, IOC search, and finding documentation.

6 · SKILL.md · Updated May 27, 2026

patrykkopycinski/security-inbox

testing

Verified, Trusted, Community

Start your security session with a personalized briefing: attacks, alerts, cases, rules, threat intel. Use as the first thing when starting security work.

6 · SKILL.md · Updated May 27, 2026

patrykkopycinski/security-full-setup

testing

Verified, Trusted, Community

Interactive guide for complete Elastic Security setup: discovers data sources, assesses detection coverage, configures rules, and creates security dashboards.

6 · SKILL.md · Updated May 27, 2026

patrykkopycinski/security-detection-engineering

testing

Verified, Trusted, Community

Guide for authoring custom detection rules, from threat hypothesis through rule creation, testing, and tuning with KQL, EQL, ES|QL, and threshold rules.

6 · SKILL.md · Updated May 27, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.


Install manually


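# Clone the repo
git clone https://github.com/patrykkopycinski/elastic-cursor-plugin.git

# Make sure the global Claude Code skills folder exists (cp fails otherwise)
mkdir -p ~/.claude/skills

# Copy the skill into the Claude Code skills folder (global)
cp -r elastic-cursor-plugin/skills/llm-eval-observatory ~/.claude/skills/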
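After copying, it can help to confirm the skill landed where expected. A minimal sketch, assuming the skill folder keeps its `SKILL.md` at the top level, as the repository path `skills/llm-eval-observatory/SKILL.md` suggests. The `skill_installed` helper is hypothetical and not part of any tool mentioned here.

```python
from pathlib import Path


def skill_installed(name: str, skills_root: Path) -> bool:
    """Return True if <skills_root>/<name>/SKILL.md exists as a regular file."""
    return (skills_root / name / "SKILL.md").is_file()


if __name__ == "__main__":
    # Global skills folder used by the copy command above.
    root = Path.home() / ".claude" / "skills"
    ok = skill_installed("llm-eval-observatory", root)
    print("installed" if ok else f"SKILL.md not found under {root}")
```

Passing the skills folder as a parameter keeps the check usable for other locations as well, instead of hard-coding the global path.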


Repository

patrykkopycinski/elastic-cursor-plugin

6 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT