laitszkin/improve-observability

Name: improve-observability
Author: laitszkin

improve-observability/SKILL.md

npx skillsauth add laitszkin/apollo-toolkit improve-observability

Improve Observability

Dependencies

Required: none.
Conditional: none.
Optional: none.
Fallback: not applicable.

Standards

Evidence: Read the real execution path, current telemetry, and the true ownership model before deciding where visibility actually disappears or where existing logs no longer describe the live code path.
Execution: Add the smallest useful instrumentation around decision points, scope contracts, outcomes, failure reasons, and any cross-path lifecycle gaps between summary counters and detailed outcome records; when observability drift already exists, repair stale log names and structured fields so they match the current owner, scope, and lifecycle semantics.
Quality: Keep changes behavior-neutral, use structured high-signal telemetry, avoid secrets, and lock the signals with tests; treat stale terminology after refactors as an observability defect, not harmless wording.
Output: Report which stages are now observable, which fields or metrics to inspect, which stale signals were renamed or re-scoped, and which tests validate the instrumentation.

Overview

Use this skill to make a hard-to-debug path observable with minimal, evidence-driven changes. Prefer small, high-signal instrumentation around decision points, inputs, outcomes, and failure reasons rather than broad logging spam.

When To Use

Use this skill when the user asks to:

add observability to an existing feature or service
expose why a workflow fails, stalls, retries, or exits early
enrich logs around a specific bug, incident, queue job, API call, or state transition
add metrics, traces, or structured fields so operators can isolate root cause faster
improve debugging without redesigning the whole subsystem

Do not use this skill for generic bug fixing when the main request is behavior change rather than instrumentation.

Workflow

1. Trace the real execution path

Read the relevant entrypoints, orchestration layers, and current telemetry before editing.
Identify the exact stages where information disappears: validation, branching, external calls, persistence, retries, settlement, cleanup, or error handling.
When the same business event can flow through multiple execution paths such as harness, replay, batch worker, or production runtime, compare those paths explicitly and find where their observability contract diverges.
Identify the canonical owner of the workflow under today's implementation, such as account, request, job, batch, position projection, or transaction step, before changing any log labels or field names.
Distinguish canonical owners from compatibility projections or legacy mirrors. If the code still stores or exposes a compatibility projection, keep it diagnosable but do not let logs present that projection as the primary truth.
Reuse the project's existing logger, tracing library, metric naming style, and error taxonomy.

2. Choose the smallest useful signals

Add instrumentation only where it helps answer a concrete debugging question. Prefer:

stable request or job identifiers for cross-log correlation
structured fields for branch conditions, entity ids, counts, amounts, status, and reason codes
start/end markers for long multi-step flows
explicit logs for skipped paths and early returns
metrics or counters for outcome classes when aggregates matter
trace spans only when the project already uses tracing or when timing data is necessary
paired detail records or structured child events when an aggregate success counter would otherwise hide which entities actually completed downstream follow-up work

Avoid logging secrets, full payload dumps, or highly volatile text that breaks searchability.
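As a minimal sketch of the signals listed above, the following uses Python's standard `logging` module to emit a stable event name with structured fields and a reason code for a skipped path. The `settle_job` function, the `fields` attribute name, and the event names are hypothetical and chosen only for illustration; a real project should reuse its existing logger and naming style.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so fields stay searchable."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"event": record.getMessage(), "level": record.levelname}
        # `fields` is a hypothetical attribute name passed through `extra=`.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def log_event(logger: logging.Logger, event: str, **fields) -> None:
    """Emit a stable event name plus structured fields instead of prose."""
    logger.info(event, extra={"fields": fields})


def settle_job(logger: logging.Logger, job_id: str, amount: int) -> str:
    """Hypothetical workflow step with an explicit log for the early return."""
    if amount <= 0:
        log_event(
            logger,
            "settle_skipped",
            job_id=job_id,
            reason_code="non_positive_amount",
            amount=amount,
        )
        return "skipped"
    log_event(logger, "settle_completed", job_id=job_id, amount=amount)
    return "completed"
```

Because the event name and `reason_code` are fixed strings, operators can filter on them directly, and `job_id` ties the line to other logs for the same job.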

3. Instrument decision points, not just failures

For each critical stage, make these states observable when relevant:

entered the stage
key preconditions or derived scope
branch selected
external dependency result
persisted side effect or emitted command
final outcome and failure reason

If a failure is already logged, improve its context instead of duplicating another generic error line.
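One way to cover the states listed above without scattering log calls is a small context manager that records stage entry, the selected branch, and the final outcome. This is a sketch under assumptions: `observed_stage`, `precheck`, the logger name, and the field names are hypothetical, and identifier keys passed as keyword arguments must not collide with built-in `LogRecord` attributes such as `name`.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("workflow")  # hypothetical logger name


@contextmanager
def observed_stage(stage: str, **ids):
    """Log stage entry and final outcome; exceptions are re-raised unchanged."""
    state = {"branch": None}  # the caller records which branch it selected
    logger.info("stage_entered", extra={"stage": stage, **ids})
    try:
        yield state
    except Exception as exc:
        logger.error(
            "stage_failed",
            extra={
                "stage": stage,
                "branch": state["branch"],
                "failure_reason": type(exc).__name__,
                **ids,
            },
        )
        raise  # behavior-neutral: the original exception still propagates
    else:
        logger.info(
            "stage_completed",
            extra={"stage": stage, "branch": state["branch"], **ids},
        )


def precheck(job_id: str, balance: int) -> bool:
    """Hypothetical stage that makes its branch decision observable."""
    with observed_stage("precheck", job_id=job_id) as state:
        if balance < 0:
            state["branch"] = "negative_balance"
            raise ValueError("balance must not be negative")
        state["branch"] = "eligible" if balance > 0 else "empty_account"
        return balance > 0
```

The failure log carries the stage, branch, and identifiers, so an existing generic error line can be enriched rather than duplicated.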

3.2 Keep aggregate and detail telemetry in lockstep

When a system reports aggregate counts such as success_count, processed_count, or remediation_success_count, ensure operators can reconcile those counts back to detailed records.

emit or persist one detail record per counted entity when feasible
carry the same identifiers and outcome stage across both aggregate and detailed telemetry
treat "aggregate says success but detail table is empty" as an observability bug, not as an acceptable reporting gap
if multiple runtime modes claim the same business event, keep the critical observability fields aligned across those modes unless the output contract intentionally differs
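The reconciliation idea above can be sketched as a small check that compares an aggregate counter with the per-entity records behind it. `DetailRecord`, `reconcile`, and the field names are hypothetical; in a real system the detail records would come from the project's own table or event stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetailRecord:
    entity_id: str
    stage: str
    outcome: str  # for example "success", "skipped", or "failed"


def reconcile(success_count: int, details: list[DetailRecord], stage: str) -> dict:
    """Compare an aggregate success counter with matching detail records."""
    succeeded = sorted(
        {d.entity_id for d in details if d.stage == stage and d.outcome == "success"}
    )
    return {
        "stage": stage,
        "success_count": success_count,
        "detail_success_count": len(succeeded),
        "detail_entity_ids": succeeded,
        # False covers the "aggregate says success but detail is empty" case.
        "reconciled": success_count == len(succeeded),
    }
```

Emitting this report as structured fields next to the aggregate lets an operator see immediately when the counter and the detail records disagree.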

3.3 Repair terminology drift after refactors

When the codebase has moved from one ownership model or lifecycle model to another, audit existing observability for stale terminology.

rename log messages, event names, metrics, and structured fields that still describe retired concepts such as old owners, outdated scope units, or deprecated lifecycle phases
prefer names that describe the live source of truth, for example account, account_opportunity, projection, admission_health, or projected_step, instead of legacy names that survived only because a field was never revisited
preserve compatibility aliases only when operators or downstream dashboards still require them, and clearly label them as compatibility views rather than canonical truth
when a workflow still legitimately mixes canonical owners and derived projections, name both explicitly so operators can tell which field is authoritative
update or add tests that lock the renamed signals, especially around branch-specific reason codes, progress events, and structured field keys
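A possible shape for the compatibility-alias rule above: emit the canonical field, keep the legacy key only where it is still required, and list the aliased keys explicitly so they are not mistaken for canonical truth. The helper name, the `compat_fields` key, and the example field names (`position_id` as a legacy alias of `account_id`) are assumptions made for illustration.

```python
def with_compat_aliases(fields: dict, aliases: dict) -> dict:
    """Return canonical fields plus labeled legacy aliases.

    `aliases` maps a canonical key to the legacy key that dashboards
    still query. Legacy keys are listed under `compat_fields` so readers
    can tell which keys are compatibility views.
    """
    out = dict(fields)
    legacy_keys = []
    for canonical, legacy in aliases.items():
        if canonical in fields:
            out[legacy] = fields[canonical]
            legacy_keys.append(legacy)
    if legacy_keys:
        out["compat_fields"] = sorted(legacy_keys)
    return out


# Hypothetical usage: `account_id` is canonical, `position_id` is legacy.
event = with_compat_aliases(
    {"account_id": "acct-7", "projected_step": 3},
    {"account_id": "position_id"},
)
```

Once no dashboard depends on the legacy key, the alias entry can be deleted without touching the canonical field.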

3.1 Preserve cross-stage scope contracts

When a workflow derives scope in one stage and consumes it later, make that contract observable end-to-end.

log the derived scope close to where it is computed
carry the same identifiers into downstream stages so operators can diff them directly
add explicit missing_* and extra_* fields when one stage should be a superset or exact match of another
prefer fail-fast diagnostics when a scope mismatch makes downstream errors ambiguous

This is especially useful for pipelines such as discover -> precheck -> execution, where the real bug is often "stage B saw a different dependency set than stage A prepared".
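The `missing_*` and `extra_*` fields described above can be sketched with plain set differences. The function names, the `ScopeMismatch` exception, and the field names are hypothetical; the fail-fast variant is only appropriate where the project already treats a scope mismatch as an error.

```python
def diff_scope(prepared: set[str], observed: set[str]) -> dict:
    """Describe how the scope one stage sees differs from the prepared scope."""
    missing = sorted(prepared - observed)
    extra = sorted(observed - prepared)
    return {
        "prepared_count": len(prepared),
        "observed_count": len(observed),
        "missing_dependencies": missing,
        "extra_dependencies": extra,
        "scope_match": not missing and not extra,
    }


class ScopeMismatch(RuntimeError):
    """Raised when a downstream stage sees a different scope than prepared."""


def require_same_scope(stage: str, prepared: set[str], observed: set[str]) -> dict:
    """Fail fast with a diagnosable message instead of an ambiguous later error."""
    report = diff_scope(prepared, observed)
    if not report["scope_match"]:
        raise ScopeMismatch(f"{stage}: {report}")
    return report
```

Logging the returned report in both the producing and consuming stage lets operators diff the two lines directly.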

4. Keep changes behavior-neutral

Do not silently change business logic while adding observability.
If a tiny safety fix is required to support the instrumentation, isolate it and explain why.
Prefer additive fields over renamed fields unless the old format is actively harmful.

5. Lock the signals with tests

Add or update tests that prove the new observability survives refactors. Focus on:

emitted log fields or reason codes for the important branches
metrics increments for success, skip, and failure paths
regression coverage for the exact opaque scenario that motivated the work
edge paths such as early-return, dependency failure, and partial completion
renamed observability fields or progress-event names after ownership-model or lifecycle refactors

Use existing test helpers for log capture and avoid brittle assertions on timestamps or fully formatted log strings.
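As an illustration of locking signals with tests, the sketch below uses the standard library's `unittest.TestCase.assertLogs` to assert on the event name and structured fields rather than on a formatted string. The `charge` function, its logger name, and its field names are hypothetical stand-ins for the instrumented code; a project with its own log-capture helpers should use those instead.

```python
import logging
import unittest

logger = logging.getLogger("payments")  # hypothetical logger name


def charge(amount: int) -> str:
    """Hypothetical instrumented function with a skip path and a success path."""
    if amount <= 0:
        logger.warning("charge_skipped", extra={"reason_code": "non_positive_amount"})
        return "skipped"
    logger.info("charge_completed", extra={"amount": amount})
    return "completed"


class ChargeObservabilityTest(unittest.TestCase):
    def test_skip_branch_emits_reason_code(self):
        with self.assertLogs("payments", level="INFO") as captured:
            self.assertEqual(charge(0), "skipped")
        record = captured.records[0]
        # Assert on the event name and field, not on timestamps or layout.
        self.assertEqual(record.getMessage(), "charge_skipped")
        self.assertEqual(record.reason_code, "non_positive_amount")

    def test_success_branch_emits_amount(self):
        with self.assertLogs("payments", level="INFO") as captured:
            self.assertEqual(charge(25), "completed")
        record = captured.records[0]
        self.assertEqual(record.getMessage(), "charge_completed")
        self.assertEqual(record.amount, 25)
```

These assertions keep passing if the log formatter changes, but fail if a refactor drops the reason code or renames the event.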

Output Expectations

When finishing the task:

explain which stages are now observable
point to the key log fields, metrics, or spans that operators should inspect
mention any areas that are still unobservable because they fall outside scope
run the most relevant tests for the touched instrumentation

Guardrails

Prefer structured, searchable telemetry over prose-heavy logs.
Minimize volume; high-signal beats high-noise.
Never add secrets, tokens, credentials, or raw personal data to telemetry.
Match existing naming conventions so dashboards and log queries stay coherent.
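To make the secrets guardrail concrete, here is a minimal sketch of a redaction helper applied to structured fields before they are logged. The key list and the `[REDACTED]` marker are assumptions; key-based redaction does not catch secrets embedded in free-form values, so it complements rather than replaces care about what is logged.

```python
# Hypothetical deny-list; a real project should define its own.
SENSITIVE_KEYS = frozenset({"password", "token", "authorization", "api_key"})


def redact(value):
    """Return a copy of `value` with sensitive keys masked, recursing into containers."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


# Hypothetical usage with nested structured fields.
safe_fields = redact(
    {"job_id": "job-1", "token": "abc123", "request": {"Authorization": "Bearer x"}}
)
```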

Add focused observability to an existing system so opaque workflows become diagnosable. Use when users ask to improve observability, add instrumentation, expand logs/metrics/traces, expose failure reasons, or make a business flow easier to debug without changing the product behavior itself.

Stars: 3
Category: development
Updated: Apr 24, 2026

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each entry below.

Trivy, container and dependency vulnerability scanner: Clean (95%)
Semgrep, static code analysis for vulnerabilities: Clean (95%)
mcp-scan (Snyk), Model Context Protocol security validation: Clean (95%)
Snyk (dep), open source security scanning: Skipped (50%)
Socket.dev, supply chain security analysis: Skipped (50%)
VirusTotal, multi-engine malware detection: Skipped (50%)
CrowdStrike, advanced threat intelligence: Skipped (50%)
OSV-Scanner, Open Source Vulnerability database check: Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 24, 2026, 6:16 PM (148.1s, 2 files scanned)

Install manually

# Clone the repo
git clone https://github.com/laitszkin/apollo-toolkit.git

# Copy into the Claude Code skills folder (global), creating it first if needed
mkdir -p ~/.claude/skills
cp -r apollo-toolkit/improve-observability ~/.claude/skills/

Repository: laitszkin/apollo-toolkit (3 stars)
Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT