avav25/observability-methods

Name: observability-methods
Author: avav25

plugin/skills/observability-methods/SKILL.md

npx skillsauth add avav25/ai-assets observability-methods

Observability Methods

Reference for the four industry-canonical observability methodologies used by the production-analysis and incident-response workflows in this plugin. Pick one named method per problem class so the analysis has a shared vocabulary and a defined set of signals to cover. Cross-reference SLI metrics, error-budget burn, and active alerts against the chosen method's signals.

Four Golden Signals (Google SRE)

Source: SRE Book Ch. 6 — Monitoring Distributed Systems.

User-facing service monitoring. Tracks the four signals a user-visible service must expose:

Latency: p50 / p95 / p99. Track successful and failed requests separately; otherwise fast-failing errors pull the percentiles down and hide tail latency.
Traffic: demand on the service, typically requests per second (RPS) or transactions per second.
Errors: rate of failed requests, broken down by error type (5xx, 4xx, timeouts, business-logic errors).
Saturation: how "full" the service is, that is, the resource headroom left before degradation (CPU, memory, queue depth, connection pool, thread pool).

Apply when: the service is user-facing, has a defined SLO, or owns an SLI.
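As a minimal sketch of the four signals above, the following computes them from a window of request records and keeps successful and failed latency apart. The `Request` type, the `in_flight` / `capacity` saturation proxy, and the choice to count any status of 400 or above as failed are assumptions made for illustration; they are not part of the skill.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float
    status: int  # HTTP status code


def percentile(values, pct):
    """Nearest-rank percentile; returns None for an empty list."""
    if not values:
        return None
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def golden_signals(requests, window_s, in_flight, capacity):
    """Summarize one observation window (assumed shape, for illustration)."""
    # Assumption: any 4xx/5xx response counts as a failed request.
    ok = [r.duration_ms for r in requests if r.status < 400]
    failed = [r.duration_ms for r in requests if r.status >= 400]
    total = len(requests)
    return {
        # Latency is reported separately so fast failures cannot hide the tail.
        "latency_ok_p99_ms": percentile(ok, 99),
        "latency_failed_p99_ms": percentile(failed, 99),
        "traffic_rps": total / window_s,
        "error_ratio": len(failed) / total if total else 0.0,
        # Hypothetical saturation proxy: in-flight work over a known capacity.
        "saturation": in_flight / capacity,
    }
```

With eight 100 ms successes and two 5 ms failures, a combined percentile would blend the two populations, while the split view shows the 100 ms success latency and the 5 ms failure latency side by side.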

RED Method (Tom Wilkie — microservices)

Source: RED Method.

Request-driven microservices. Three signals per service:

Rate: requests per second the service is handling.
Errors: number or percentage of failed requests.
Duration: distribution of request latency, especially the p99 tail.

Apply when: diagnosing a request-response microservice, especially with many small services where per-service uniformity matters more than resource depth. Strong fit for slow-API and 5xx-spike investigations.
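The per-service uniformity that RED relies on can be sketched as one function that produces the same three numbers for every service. The event tuple shape `(service, duration_ms, is_error)` and the function name are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def red_by_service(events, window_s):
    """events: iterable of (service, duration_ms, is_error) tuples (assumed shape)."""
    grouped = defaultdict(list)
    for service, duration_ms, is_error in events:
        grouped[service].append((duration_ms, is_error))

    report = {}
    for service, rows in grouped.items():
        durations = sorted(d for d, _ in rows)
        # Nearest-rank p99: index of the smallest value covering 99% of requests.
        p99_index = max(-(-99 * len(durations) // 100) - 1, 0)
        report[service] = {
            "rate_rps": len(rows) / window_s,
            "error_pct": 100 * sum(1 for _, e in rows if e) / len(rows),
            "duration_p99_ms": durations[p99_index],
        }
    return report
```

Because every service gets an identical report, sorting the result by `duration_p99_ms` or `error_pct` is enough to point at the service worth investigating first.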

USE Method (Brendan Gregg — resources)

Source: USE Method.

Resource-driven. For every resource (CPU, memory, disk, network, queue), check three signals:

Utilization: percentage of time the resource was busy.
Saturation: degree of extra work queued or waiting (run queue, swap, retries).
Errors: error events for that resource (OOMKilled, disk I/O errors, NIC drops).

Apply when: resource exhaustion is suspected — OOMKilled pods, CPU throttling, disk pressure, node-level pressure, container crashloops driven by limits.
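The "for every resource, check three signals" loop can be sketched as a small checklist function. The `ResourceReading` type, the 0.8 utilization limit, and the order of the checks are assumptions for illustration; real thresholds depend on the resource and the workload.

```python
from dataclasses import dataclass


@dataclass
class ResourceReading:
    name: str
    utilization: float  # fraction of time busy, 0.0 to 1.0
    saturation: float   # queued or waiting work, e.g. run-queue length
    errors: int         # error events, e.g. OOM kills or disk I/O errors


def use_findings(readings, utilization_limit=0.8):
    """Return (resource, signal) pairs that deserve a closer look."""
    findings = []
    for r in readings:
        if r.errors > 0:
            findings.append((r.name, "errors"))
        # utilization_limit is a hypothetical default, not a USE-defined value.
        if r.utilization >= utilization_limit:
            findings.append((r.name, "utilization"))
        if r.saturation > 0:
            findings.append((r.name, "saturation"))
    return findings
```

An empty result for a resource does not prove it is healthy; it only means none of the three signals crossed the chosen limits in the sampled window.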

Distributed Tracing

For latency problems or cross-service failures, following a single trace through the 5–7 services it crosses is the canonical path to the root cause:

Storage: Tempo / Jaeger / Zipkin.
Instrumentation: OpenTelemetry SDK (vendor-neutral) or vendor agent (Datadog APM, New Relic, Honeycomb Beelines).
Search: by service.name, trace.id, or slow-trace heatmap. Honeycomb BubbleUp narrows attributes that correlate with slow traces.
Span attributes that matter: HTTP route, DB query (statement + duration), external API call (host + latency), retry count, error/exception flag.

Apply when: a request crosses ≥2 services, latency is high but no single service is obviously saturated, error blame is unclear, or aggregate metrics show the symptom but not the cause.
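To illustrate how a single trace localizes the slow hop, the sketch below ranks spans by self time (a span's duration minus the time spent in its direct children). The `Span` type is a simplified stand-in, not the OpenTelemetry data model, and the calculation assumes child spans run sequentially without overlapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str
    name: str
    duration_ms: float


def slowest_span_by_self_time(spans):
    """Return the span that spent the most time in its own work."""
    child_time = {}
    for s in spans:
        if s.parent_id is not None:
            child_time[s.parent_id] = child_time.get(s.parent_id, 0.0) + s.duration_ms
    # Assumption: children do not overlap, so subtracting their sum is valid.
    return max(spans, key=lambda s: s.duration_ms - child_time.get(s.span_id, 0.0))


trace = [
    Span("a", None, "gateway", "GET /orders", 500),
    Span("b", "a", "orders", "list_orders", 480),
    Span("c", "b", "orders-db", "SELECT orders", 400),
    Span("d", "b", "payments", "GET /status", 50),
]
culprit = slowest_span_by_self_time(trace)
print(culprit.service, culprit.name)
```

In this made-up trace the gateway and the orders service both look slow by total duration, but almost all of that time belongs to the database span beneath them.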

Method-to-Problem Mapping

| Problem | Method | Why |
|---|---|---|
| Slow API | RED | Duration p99 tail is the surfaced signal |
| 5xx spike | Golden Signals | Errors + Saturation cover cause and capacity together |
| OOMKilled / crashloop | USE | Memory/CPU saturation + resource errors |
| Customer-reported latency | RED + Distributed Tracing | RED localizes the slow service, tracing finds the slow span |
| Node / disk pressure | USE | Resource-axis Utilization + Saturation |
| Cross-service failure (no single hotspot) | Distributed Tracing | Single trace reveals the failing hop |
| New service with SLO | Golden Signals | Establish baseline for all four user-facing signals |
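If a workflow needs this mapping programmatically, it could be encoded as a plain lookup. The problem keys below are hypothetical labels invented for this sketch; only the method names come from the mapping.

```python
# Keys are hypothetical labels for the problem classes in the mapping.
METHODS_BY_PROBLEM = {
    "slow_api": ("RED",),
    "5xx_spike": ("Golden Signals",),
    "oomkilled_or_crashloop": ("USE",),
    "customer_reported_latency": ("RED", "Distributed Tracing"),
    "node_or_disk_pressure": ("USE",),
    "cross_service_failure": ("Distributed Tracing",),
    "new_service_with_slo": ("Golden Signals",),
}


def methods_for(problem):
    """Return the methods to apply, or an empty tuple for an unmapped problem."""
    return METHODS_BY_PROBLEM.get(problem, ())
```

Returning an empty tuple for an unknown problem keeps the caller in charge of deciding what to do when no named method fits.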

When this applies

| Workflow | Apply this knowledge |
|---|---|
| /analyze-prod (snapshot phase) | Pick a named method per problem class before running queries; cross-reference SLI/SLO and alerts against the method's signals |
| /analyze-local (Docker logs) | Apply USE on container resource limits when local services crashloop or are OOMKilled |
| /env-analyze (multi-scope) | Use Golden Signals or RED to frame service-level findings; USE for node/resource layer |
| /infra-change (post-apply verify) | Use Golden Signals to confirm SLO is not regressed by the change |
| /bugfix (production-context bugs) | Use RED + tracing to localize the failing service / span before code-level investigation |

Integration

Used by: /analyze-prod, /analyze-local, /env-analyze, /infra-change, /bugfix
Companion knowledge: @telemetry-stacks (vendor-specific queries that surface these signals), @cloud-platforms (managed-service metric sources), @deployment-procedures (post-deploy health verification)
External references:
- Google SRE Book Ch. 6 — Monitoring Distributed Systems (Four Golden Signals)
- Tom Wilkie — RED Method
- Brendan Gregg — USE Method
- OpenTelemetry specification (tracing semantic conventions)

Use this skill when picking a diagnostic vocabulary for a latency, error-rate, saturation, or crashloop investigation, when authoring or reviewing an analysis workflow that needs the method reference, or when correlating SLI metrics and active alerts against a method's signal set. This is a knowledge skill covering industry-canonical observability methodologies (Four Golden Signals, RED, USE, Distributed Tracing), with a method-to-problem mapping that explains which signals each method surfaces and when each method applies. Loaded by `/analyze-prod`, `/analyze-local`, `/env-analyze`, `/infra-change`, and `/bugfix` workflows when production-context diagnosis needs a named methodology.

1 star

development

Updated May 23, 2026

Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add avav25/ai-assets observability-methods

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners in report:

- Trivy (container and dependency vulnerability scanner): Clean, 95%
- Semgrep (static code analysis for vulnerabilities): Clean, 95%
- mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
- Snyk (dep) (open source security scanning): Skipped, 50%
- Socket.dev (supply chain security analysis): Skipped, 50%
- VirusTotal (multi-engine malware detection): Skipped, 50%
- CrowdStrike (advanced threat intelligence): Skipped, 50%
- OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
- OWASP Dep-Check: Skipped, 50%

Last scanned: May 23, 2026, 4:34 AM · 73.3s · 1 file scanned

SKILL.md

name: observability-methods
description: Use this skill when picking a diagnostic vocabulary for a latency, error-rate, saturation, or crashloop investigation, when authoring or reviewing an analysis workflow that needs the method reference, or when correlating SLI metrics and active alerts against a method's signal set. This is a knowledge skill covering industry-canonical observability methodologies (Four Golden Signals, RED, USE, Distributed Tracing), with a method-to-problem mapping that explains which signals each method surfaces and when each method applies. Loaded by `/analyze-prod`, `/analyze-local`, `/env-analyze`, `/infra-change`, and `/bugfix` workflows when production-context diagnosis needs a named methodology.
disable-model-invocation: true

Install manually

# Clone the repo
git clone https://github.com/avav25/ai-assets.git

# Copy into Claude Code skills folder (global), creating it first if it does not exist
mkdir -p ~/.claude/skills
cp -r ai-assets/plugin/skills/observability-methods ~/.claude/skills/

Repository: avav25/ai-assets (1 star)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT