
avav25/telemetry-stacks

Name: telemetry-stacks
Author: avav25

plugin/skills/telemetry-stacks/SKILL.md

npx skillsauth add avav25/ai-assets telemetry-stacks


Telemetry Stacks

Reference for the production telemetry stacks supported by `/analyze-prod` and related analysis workflows. Identify the stack by checking `CLAUDE.md`, helm charts, prometheus-operator CRDs, or the OTel collector configuration, then query it directly using the patterns below. Pair with `@observability-methods` to map the chosen signals (Golden Signals: Latency / Traffic / Errors / Saturation; RED: Rate / Errors / Duration; USE: Utilization / Saturation / Errors) onto vendor queries.
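The identification step can be sketched as a keyword scan over whatever config text is at hand (a `CLAUDE.md`, a helm values file, a collector config). This is a minimal sketch, not part of the skill itself: the `STACK_MARKERS` table and the `detect_stacks` function are hypothetical, and the marker strings are only common examples that should be tuned per repository.

```python
# Hypothetical marker strings (assumptions, not an exhaustive list).
STACK_MARKERS = {
    "Prometheus + Grafana": ("servicemonitor", "prometheusrule", "monitoring.coreos.com"),
    "Datadog": ("datadog-agent", "dd_api_key", "dd-trace"),
    "Honeycomb": ("api.honeycomb.io", "x-honeycomb-team"),
    "New Relic": ("newrelic", "new_relic_license_key"),
    "Sentry": ("sentry_dsn", "sentry.io"),
    "OTel + Tempo / Jaeger": ("otlp", "tempo", "jaeger"),
}


def detect_stacks(config_text: str) -> list[str]:
    """Return every stack whose markers appear in the given config text."""
    lowered = config_text.lower()
    return [
        stack
        for stack, markers in STACK_MARKERS.items()
        if any(marker in lowered for marker in markers)
    ]
```

A repository can match more than one stack (for example Prometheus for metrics plus Sentry for frontend errors), which is why the sketch returns a list rather than a single name.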

Stack Reference Table

| Stack | Where to look | Common queries |
|---|---|---|
| Prometheus + Grafana | Dashboards, Alertmanager, prometheus-operator CRDs | `rate(http_requests_total[5m])`, `histogram_quantile(0.99, ...bucket[5m])`, `up == 0` |
| Datadog | APM, Service Catalog, Logs | `service:x error_rate`, `@http.status_code:5*`, anomaly monitors |
| Honeycomb | Tracing + BubbleUp | `WHERE service.name=x \| HEATMAP duration_ms`, BubbleUp on slow traces |
| New Relic | APM + Distributed Tracing | `SELECT percentile(duration,50,95,99) FROM Transaction WHERE appName='x'` |
| Sentry | Issues filtered to release | error patterns + breadcrumbs (frontend / mobile) |
| OTel + Tempo / Jaeger | Tempo / Jaeger UI | trace search by `service.name`, `trace.id` |

Prometheus + Grafana

Ingestion: pull-based scrape from /metrics endpoints; service discovery via Kubernetes API; long-term storage often via Thanos / Mimir / Cortex. PromQL is the query language.

Surface: Grafana dashboards (panels), Alertmanager (alerts), Prometheus UI (ad-hoc PromQL).

Common queries:

- Request rate: `rate(http_requests_total[5m])`
- p99 latency: `histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))`
- Error rate: `sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))`
- Target down: `up == 0`
- Saturation (CPU throttling): `rate(container_cpu_cfs_throttled_seconds_total[5m])`
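To make the p99 query less opaque, here is a minimal sketch of the linear interpolation that `histogram_quantile` applies to classic cumulative buckets. The Python function and its sample data are illustrative assumptions; it omits edge cases that Prometheus itself handles (NaN inputs, negative bounds, native histograms).

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    The last bucket is expected to be the +Inf bucket holding the total count.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

Because the estimate is interpolated inside a bucket, its accuracy depends on bucket boundaries: a p99 that lands in a wide bucket is only as precise as that bucket's width.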

Right choice when: Kubernetes-native shop, open-source preference, cost-sensitive, custom application metrics dominate.

Datadog

Ingestion: push-based via Datadog Agent (per node) and APM tracer libraries (in-process). Auto-instruments common frameworks; log collection via agent tail.

Surface: APM service map and traces, Service Catalog, Logs Explorer, Dashboards, Monitors (alerts), Watchdog (anomaly detection).

Common queries:

- Service error rate: `service:x error_rate`
- HTTP 5xx logs: `@http.status_code:5*`
- APM latency metric: `avg:trace.http.request.duration{service:x} by {resource_name}`
- Anomaly monitors and forecast monitors for capacity planning
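The `service:x error_rate` shorthand above can be spelled out as a ratio of APM trace metrics. The sketch below assumes the service emits `trace.http.request.errors` and `trace.http.request.hits`, matching the `trace.http.request.duration` metric used above; the actual operation name depends on the tracer integration, and `datadog_error_rate_query` is a hypothetical helper, not a Datadog API.

```python
def datadog_error_rate_query(service: str, operation: str = "http.request") -> str:
    """Build an error-ratio metric query from APM trace metrics.

    `operation` is an assumption: it must match the span operation name
    your tracer reports for this service.
    """
    scope = f"{{service:{service}}}"
    errors = f"sum:trace.{operation}.errors{scope}.as_count()"
    hits = f"sum:trace.{operation}.hits{scope}.as_count()"
    return f"{errors} / {hits}"
```

The resulting string can be pasted into a dashboard or monitor query editor; check the metric names in the Metrics Explorer first.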

Right choice when: managed SaaS preferred, multi-language polyglot stack, want APM + logs + metrics + RUM in one product, budget allows.

Honeycomb

Ingestion: push-based via OpenTelemetry or Honeycomb Beelines; event-based model — each event is a wide-column record with high-cardinality attributes.

Surface: Query Builder, BubbleUp (attribute correlation), Triggers (alerts), Service Map.

Common queries:

- Heatmap of latency: `WHERE service.name=x | HEATMAP duration_ms`
- BubbleUp on slow traces: select a slow region on the heatmap; BubbleUp surfaces which attributes (customer, endpoint, region) correlate with slowness.
- Group by high-cardinality attribute: `GROUP BY user.id`, `ORDER BY P99(duration_ms) DESC`
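The idea behind BubbleUp can be illustrated with a simplified sketch: compare how often each attribute value occurs in the selected (slow) events versus the remaining baseline events. This is not Honeycomb's implementation; `bubble_up` and the event shape are assumptions made for illustration.

```python
from collections import Counter


def bubble_up(events, is_selected, attributes):
    """Rank (attribute, value) pairs by over-representation in the selection.

    Returns (difference, attribute, value) tuples, largest difference first,
    where difference = share in selection minus share in baseline.
    """
    selected = [e for e in events if is_selected(e)]
    baseline = [e for e in events if not is_selected(e)]
    if not selected or not baseline:
        return []
    scores = []
    for attr in attributes:
        sel_counts = Counter(e.get(attr) for e in selected)
        base_counts = Counter(e.get(attr) for e in baseline)
        for value, n in sel_counts.items():
            diff = n / len(selected) - base_counts[value] / len(baseline)
            scores.append((diff, attr, value))
    return sorted(scores, key=lambda s: s[0], reverse=True)
```

Called with `is_selected=lambda e: e["duration_ms"] > 500`, the top entries are the attribute values most concentrated among slow events, which is the kind of signal to look for in the real BubbleUp view.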

Right choice when: high-cardinality debugging (per-customer, per-tenant, per-endpoint), tracing-first culture, hard-to-reproduce intermittent issues.

New Relic

Ingestion: push-based via New Relic agents (per language) and OTel; NRQL (New Relic Query Language) for queries.

Surface: APM, Distributed Tracing, Logs, Browser, Infrastructure, Alerts.

Common queries:

- Percentile latency: `SELECT percentile(duration, 50, 95, 99) FROM Transaction WHERE appName='x'`
- Error rate: `SELECT percentage(count(*), WHERE error IS true) FROM Transaction WHERE appName='x'`
- Throughput by endpoint: `SELECT rate(count(*), 1 minute) FROM Transaction WHERE appName='x' FACET name`
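When a workflow needs to generate these NRQL strings for different services, a small builder keeps them consistent. This is a sketch under assumptions: `nrql_percentiles` is a hypothetical helper, the `SINCE` default is arbitrary, and app names containing quotes are rejected rather than escaped.

```python
def nrql_percentiles(app_name: str, percentiles=(50, 95, 99),
                     since: str = "30 minutes ago") -> str:
    """Build the percentile-latency NRQL query for one application."""
    if "'" in app_name or "\\" in app_name:
        # Refuse rather than guess at escaping rules.
        raise ValueError(f"unsupported character in app name: {app_name!r}")
    pcts = ", ".join(str(p) for p in percentiles)
    return (
        f"SELECT percentile(duration, {pcts}) FROM Transaction "
        f"WHERE appName = '{app_name}' SINCE {since}"
    )
```

For example, `nrql_percentiles("checkout", since="1 hour ago")` yields a query that can be pasted into the query builder.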

Right choice when: enterprise with mixed legacy + modern stacks, want bundled APM + Infra + Logs + Browser, NRQL familiarity exists.

Sentry

Ingestion: push-based via Sentry SDKs (per language / framework); event-based — each event is an error / exception with stack trace, breadcrumbs, and context.

Surface: Issues view (grouped errors), Releases, Performance (transaction tracing), Alerts.

Common queries / filters:

- Filter by release: `release:1.2.3 environment:production`
- Filter by user impact: sort issues by users affected, descending
- Error patterns + breadcrumbs (frontend / mobile): breadcrumbs reconstruct user actions before the error
- Suspect commits: Sentry links error frames to git commits via release tracking

Right choice when: frontend / mobile error tracking is the priority, want error-grouping + release correlation, paired with another stack for infra metrics.

OpenTelemetry + Tempo / Jaeger

Ingestion: push-based via OTel SDKs and OTel Collector; vendor-neutral semantic conventions. Tempo (Grafana Labs) and Jaeger (CNCF) are open-source trace storage backends.

Surface: Tempo UI or Jaeger UI; Grafana panels (TraceQL in Tempo); often paired with Prometheus for metrics and Loki for logs (the "LGTM stack" — Loki, Grafana, Tempo, Mimir).

Common queries:

- Trace search: `{ resource.service.name = "checkout" && duration > 1s }` (TraceQL; attributes must be scoped, e.g. `resource.` or `span.`)
- Find trace by ID: paste `trace.id` into the UI
- Service dependency map: derived from spans
- Span attribute filters: `http.status_code`, `db.statement`, `error=true`
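For scripted trace search against Tempo, the TraceQL expression can be passed as the `q` parameter of the search endpoint. The sketch below only builds the URL; the base URL is a placeholder, `tempo_search_url` is a hypothetical helper, and the query uses the scoped attribute `resource.service.name` because TraceQL expects attribute names to carry a scope.

```python
from urllib.parse import urlencode


def tempo_search_url(base_url: str, service: str,
                     min_duration: str = "1s", limit: int = 20) -> str:
    """Build a Tempo search URL for slow traces of one service."""
    traceql = (
        f'{{ resource.service.name = "{service}" '
        f"&& duration > {min_duration} }}"
    )
    params = urlencode({"q": traceql, "limit": limit})
    return f"{base_url.rstrip('/')}/api/search?{params}"
```

Calling `tempo_search_url("http://tempo:3200", "checkout")` returns a URL that can be fetched with any HTTP client; authentication and tenant headers, if your deployment requires them, are left out of this sketch.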

Right choice when: vendor-neutral instrumentation is a priority, open-source stack preferred, already running Grafana + Prometheus and want tracing in the same UI.

Choosing a Stack

| Constraint | Likely stack |
|---|---|
| Kubernetes-native, OSS, custom metrics dominate | Prometheus + Grafana (+ Tempo for tracing) |
| Managed SaaS, polyglot, all-in-one | Datadog or New Relic |
| High-cardinality debugging | Honeycomb |
| Frontend / mobile error tracking | Sentry (alongside another stack) |
| Vendor-neutral, OSS tracing-first | OTel + Tempo / Jaeger |

When this applies

| Workflow | Apply this knowledge |
|---|---|
| `/analyze-prod` (snapshot phase) | Identify the stack from `CLAUDE.md` / helm / CRDs, then run vendor-specific queries to surface the signals chosen via `@observability-methods` |
| `/analyze-local` (Docker logs) | If a local stack ships Prometheus / Grafana / OTel containers, use the same query patterns against the local endpoints |
| `/env-analyze` (multi-scope) | Cross-reference K8s pod state with telemetry-stack metrics (Datadog Service Map, Prometheus `up`, Honeycomb heatmap) |
| `/infra-change` (post-apply verify) | Query the stack post-apply to confirm no SLO regression: Prometheus `histogram_quantile`, Datadog monitors, NRQL percentile queries |
| `/bugfix` (production-context bugs) | Pull traces (Honeycomb / Tempo / Jaeger / New Relic) and Sentry issues filtered to the offending release to localize the bug |

Integration

Used by: /analyze-prod, /analyze-local, /env-analyze, /infra-change, /bugfix
Companion knowledge: @observability-methods (which signals to query for — Golden Signals / RED / USE / Tracing), @cloud-platforms (managed-service native metrics via cloud CLI), @deployment-procedures (post-deploy verification queries)
External references:
- Prometheus docs — PromQL reference and prometheus-operator CRDs
- Datadog docs — APM, log search syntax, Watchdog
- Honeycomb docs — Query Builder, BubbleUp, Triggers
- New Relic docs — NRQL reference
- Sentry docs — Issue grouping, Releases, Performance
- OpenTelemetry specification; Tempo (Grafana Labs) and Jaeger (CNCF) documentation


1 star

testing

Updated May 23, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add avav25/ai-assets telemetry-stacks

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 23, 2026, 4:34 AM · 33.9s · 1 file scanned

SKILL.md

name: telemetry-stacks
description: Use this skill when querying a production telemetry stack, when authoring or reviewing an analysis workflow that needs vendor query examples, or when identifying which stack is deployed from `CLAUDE.md`, helm charts, `prometheus-operator` CRDs, or an OTel collector config. It is a knowledge skill that provides a production telemetry stack reference covering Prometheus + Grafana, Datadog, Honeycomb, New Relic, Sentry, and OpenTelemetry + Tempo / Jaeger, with per-stack ingestion model, query patterns, UI surface, and when each stack is the right choice. Loaded by `/analyze-prod`, `/analyze-local`, `/env-analyze`, `/infra-change`, and `/bugfix` workflows when production-context diagnosis needs vendor-specific query syntax.
disable-model-invocation: true

Install manually

# Clone the repo
git clone https://github.com/avav25/ai-assets.git

# Copy into Claude Code skills folder (global)
cp -r ai-assets/plugin/skills/telemetry-stacks ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository: avav25/ai-assets

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT