skills/monitoring-operations/SKILL.md
Use when setting up OCI metrics, alarms, or log collection, or troubleshooting missing data and silent alarms. Covers metric namespace naming, MQL dimension requirements, alarm missing-data handling, Service Connector IAM gaps, and Cloud Guard integration. KEYWORDS: monitoring, alarm, metric, MQL, namespace, log, Service Connector, Log Analytics, Cloud Guard, missing data, oci_computeagent.
NEVER debug "missing metrics" within the first 15 minutes
NEVER use == for alarm thresholds with sparse metrics
# WRONG - alarm never fires when metric has data gaps
MetricName[1m].mean() == 0
# RIGHT - handle missing data explicitly
MetricName[1m]{dataMissing=zero}.mean() > 0
NEVER omit the resourceId dimension in metric queries
# WRONG - no dimension filter, so the query spans every instance in the compartment
CpuUtilization[1m].mean()
# RIGHT - filter by instance OCID
CpuUtilization[1m]{resourceId="<instance-ocid>"}.mean()
Querying without dimensions returns data for ALL resources in the compartment, which is usually not what's intended, and metric queries are rate-limited at 1000 req/min.
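One way to make the dimension filter hard to forget is to build query strings through a small helper that refuses to emit an unfiltered query. The sketch below is illustrative only: `build_mql` is a hypothetical helper, not part of any OCI SDK, and the OCID in the usage line is a placeholder.

```python
def build_mql(metric, interval, statistic="mean", dimensions=None):
    """Build an MQL string of the form Metric[interval]{dim="value"}.statistic().

    Hypothetical helper: raises instead of returning an unfiltered query.
    """
    if not dimensions:
        raise ValueError("refusing to build an unfiltered query; pass at least one dimension")
    # Sort the dimensions so the generated string is deterministic.
    filters = ", ".join(f'{name}="{value}"' for name, value in sorted(dimensions.items()))
    return f"{metric}[{interval}]{{{filters}}}.{statistic}()"


query = build_mql(
    "CpuUtilization",
    "1m",
    dimensions={"resourceId": "ocid1.instance.oc1..example"},  # placeholder OCID
)
print(query)
```

Failing at build time is a design choice: it moves the mistake from a slow, rate-limited query to an immediate error in the calling code.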
NEVER set alarm thresholds without a trigger delay
# BAD - fires on every transient CPU spike (alert fatigue)
CpuUtilization[1m].mean() > 80
# BETTER - fires only on sustained breach
CpuUtilization[5m].mean() > 80
# + set trigger delay: 5 minutes (pendingDuration PT5M; the breach must persist for 5 minutes before the alarm fires)
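The effect of a trigger delay can be illustrated with a toy model. The sketch below assumes one evaluation per minute and treats the delay as a required run of consecutive breaching evaluations; it is a simplification for intuition, not a description of how OCI evaluates alarms internally, and `first_fire_minute` is a hypothetical name.

```python
def first_fire_minute(values, threshold, pending_minutes):
    """Return the index of the minute at which the alarm would fire, or None.

    Toy model: one value per minute; fires once `pending_minutes`
    consecutive values exceed `threshold`.
    """
    streak = 0
    for minute, value in enumerate(values):
        # Any non-breaching minute resets the run of consecutive breaches.
        streak = streak + 1 if value > threshold else 0
        if streak >= pending_minutes:
            return minute
    return None


transient = [40, 95, 42, 41, 90, 43]   # two isolated one-minute spikes
sustained = [85, 88, 91, 86, 90, 87]   # breach lasting six minutes

print(first_fire_minute(transient, 80, 1))  # no delay: fires on the first spike
print(first_fire_minute(transient, 80, 5))  # 5-minute delay: never fires
print(first_fire_minute(sustained, 80, 5))  # 5-minute delay: fires on the fifth breaching minute
```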
NEVER create alarms without notification destinations
# WRONG - alarm fires but nobody is notified
oci monitoring alarm create ... --destinations '[]'
# RIGHT - always link to a notification topic
oci monitoring alarm create ... --destinations '["<notification-topic-ocid>"]'
Cost impact: undetected production outages = $5,000–50,000+/hour.
NEVER ignore Cloud Guard findings
OCI uses service-specific namespaces — using the wrong namespace returns no data with no error.
| Service | Namespace | Key Metrics |
|------------------|------------------------------|------------------------------------------|
| Compute | oci_computeagent | CpuUtilization, MemoryUtilization |
| Autonomous DB | oci_autonomous_database | CpuUtilization, StorageUtilization |
| Load Balancer | oci_lbaas | HttpRequests, UnHealthyBackendServers |
| Object Storage | oci_objectstorage | ObjectCount, BytesUploaded |
Common mistake: using oci_compute instead of oci_computeagent. Metrics in the oci_computeagent namespace are emitted only while Oracle Cloud Agent is running on the instance.
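Because a wrong namespace fails silently, it can help to resolve namespaces from a single lookup that fails loudly instead. The sketch below copies the namespaces from the table above; the dictionary keys and the `namespace_for` helper are hypothetical names chosen for illustration.

```python
# Namespaces taken from the table above; the keys are illustrative labels.
NAMESPACES = {
    "compute": "oci_computeagent",
    "autonomous_db": "oci_autonomous_database",
    "load_balancer": "oci_lbaas",
    "object_storage": "oci_objectstorage",
}


def namespace_for(service):
    """Return the metric namespace for a service label, or raise on unknown labels."""
    try:
        return NAMESPACES[service]
    except KeyError:
        known = ", ".join(sorted(NAMESPACES))
        raise ValueError(f"unknown service {service!r}; known services: {known}") from None


print(namespace_for("compute"))
```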
| Setting | Behavior | Use When |
|---------|----------|----------|
| treatMissingDataAsBreaching | Alarm fires if no data arrives | Critical services (silence = outage) |
| treatMissingDataAsNotBreaching | Alarm silent if no data | Optional or intermittent monitoring |
| {dataMissing=zero} in MQL | Treats gaps as 0 value | Request counters, throughput metrics |
Logs not appearing in Log Analytics?
│
├─ Is logging enabled on the resource?
│ └─ Compute: is Oracle Cloud Agent running? (systemctl status oracle-cloud-agent)
│ └─ Functions: is logging enabled in function configuration?
│
├─ Is Service Connector configured and ACTIVE?
│ └─ Source: Log Group → Target: Log Analytics
│ └─ Check status: oci sch service-connector get --service-connector-id <ocid>
│
├─ IAM policy for Service Connector?
│ └─ "Allow any-user to use log-content in tenancy"
│ └─ "Allow service loganalytics to READ logcontent in tenancy"
│ └─ Missing EITHER policy causes silent failure
│
└─ 10–15 minute ingestion lag?
└─ Wait before concluding logs are missing
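The decision tree above can be walked in order as a checklist, stopping at the first step that has not been confirmed. The sketch below is a minimal illustration: the fact keys and the `next_action` helper are hypothetical, and the caller is assumed to have gathered each answer manually or with the commands shown in the tree.

```python
# Ordered checks mirroring the tree above: (fact key, advice if the fact is not confirmed).
CHECKS = [
    ("logging_enabled", "Enable logging on the resource (agent running, function logging on)"),
    ("connector_active", "Service Connector must be configured and ACTIVE (Log Group -> Log Analytics)"),
    ("iam_policies_present", "Add both IAM policies; missing either one fails silently"),
    ("waited_15_minutes", "Wait out the 10-15 minute ingestion lag before concluding logs are missing"),
]


def next_action(facts):
    """Return advice for the first unconfirmed check, in tree order."""
    for key, advice in CHECKS:
        # A missing key counts as unconfirmed.
        if not facts.get(key, False):
            return advice
    return "All checks pass; the cause lies outside this checklist"


print(next_action({"logging_enabled": True, "connector_active": True}))
```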
Unfiltered queries scan ALL resources in the compartment, which is slow and consumes rate limit budget.
# Expensive: scans all instances
CpuUtilization[1m].mean()
# Optimized: filter to specific instance
CpuUtilization[1m]{resourceId="<instance-ocid>"}.mean()
Rate limit: 1000 metric queries/minute per tenancy. A dashboard with many unfiltered widgets can exhaust this.
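A rough budget estimate shows how quickly dashboards add up. The sketch below assumes each widget issues one query per refresh per open viewer, which is a simplifying assumption rather than documented behavior; the 1000/minute figure is the one quoted above, and the function names are hypothetical.

```python
RATE_LIMIT_PER_MINUTE = 1000  # figure quoted in this document


def queries_per_minute(widgets, refresh_seconds, viewers=1):
    """Estimate the query rate, assuming one query per widget per refresh per viewer."""
    return widgets * viewers * 60 / refresh_seconds


def budget_share(widgets, refresh_seconds, viewers=1):
    """Fraction of the per-minute rate limit consumed by one dashboard."""
    return queries_per_minute(widgets, refresh_seconds, viewers) / RATE_LIMIT_PER_MINUTE


# 20 widgets refreshing every 30 seconds, open in 10 browser sessions.
print(queries_per_minute(20, 30, viewers=10))
print(budget_share(20, 30, viewers=10))
```

Under these assumptions, one moderately busy dashboard already uses a large share of the tenancy-wide budget before any alarms or ad hoc queries are counted.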
Load references/oci-monitoring-reference.md when:
Do NOT load for alarm threshold patterns, namespace gotchas, or log troubleshooting — this file covers those.