skills/monitoring-expert/SKILL.md
# Monitoring Expert Observability and performance specialist implementing comprehensive monitoring, alerting, tracing, and performance testing systems. ## Core Workflow 1. **Assess** — Identify what needs monitoring (SLIs, critical paths, business metrics) 2. **Instrument** — Add logging, metrics, and traces to the application (see examples below) 3. **Collect** — Configure aggregation and storage (Prometheus scrape, log shipper, OTLP endpoint); verify data arrives before proceeding 4. **Visu
npx skillsauth add threat-vector-security/guardian-agent skills/monitoring-expertInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Observability and performance specialist implementing comprehensive monitoring, alerting, tracing, and performance testing systems.
import pino from 'pino';
const logger = pino({ level: 'info' });
// Good — structured fields, includes correlation ID
logger.info({ requestId: req.id, userId: req.user.id, durationMs: elapsed }, 'order.created');
// Bad — string interpolation, no correlation
console.log(`Order created for user ${userId}`);
import { Counter, Histogram, register } from 'prom-client';
const httpRequests = new Counter({
name: 'http_requests_total',
help: 'Total HTTP requests',
labelNames: ['method', 'route', 'status'],
});
const httpDuration = new Histogram({
name: 'http_request_duration_seconds',
help: 'HTTP request latency',
labelNames: ['method', 'route'],
buckets: [0.05, 0.1, 0.3, 0.5, 1, 2, 5],
});
// Instrument a route
app.use((req, res, next) => {
const end = httpDuration.startTimer({ method: req.method, route: req.path });
res.on('finish', () => {
httpRequests.inc({ method: req.method, route: req.path, status: res.statusCode });
end();
});
next();
});
// Expose scrape endpoint
app.get('/metrics', async (req, res) => {
res.set('Content-Type', register.contentType);
res.end(await register.metrics());
});
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { trace } from '@opentelemetry/api';
const sdk = new NodeSDK({
traceExporter: new OTLPTraceExporter({ url: 'http://jaeger:4318/v1/traces' }),
});
sdk.start();
// Manual span around a critical operation
const tracer = trace.getTracer('order-service');
async function processOrder(orderId) {
const span = tracer.startSpan('order.process');
span.setAttribute('order.id', orderId);
try {
const result = await db.saveOrder(orderId);
span.setStatus({ code: SpanStatusCode.OK });
return result;
} catch (err) {
span.recordException(err);
span.setStatus({ code: SpanStatusCode.ERROR });
throw err;
} finally {
span.end();
}
}
groups:
- name: api.rules
rules:
- alert: HighErrorRate
expr: |
rate(http_requests_total{status=~"5.."}[5m])
/ rate(http_requests_total[5m]) > 0.05
for: 2m
labels:
severity: critical
annotations:
summary: "Error rate above 5% on {{ $labels.route }}"
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
stages: [
{ duration: '1m', target: 50 }, // ramp up
{ duration: '5m', target: 50 }, // sustained load
{ duration: '1m', target: 0 }, // ramp down
],
thresholds: {
http_req_duration: ['p(95)<500'], // 95th percentile < 500 ms
http_req_failed: ['rate<0.01'], // error rate < 1%
},
};
export default function () {
const res = http.get('https://api.example.com/orders');
check(res, { 'status is 200': (r) => r.status === 200 });
sleep(1);
}
Load detailed guidance based on context:
| Topic | Reference | Load When |
|-------|-----------|-----------|
| Logging | references/structured-logging.md | Pino, JSON logging |
| Metrics | references/prometheus-metrics.md | Counter, Histogram, Gauge |
| Tracing | references/opentelemetry.md | OpenTelemetry, spans |
| Alerting | references/alerting-rules.md | Prometheus alerts |
| Dashboards | references/dashboards.md | RED/USE method, Grafana |
| Performance Testing | references/performance-testing.md | Load testing, k6, Artillery, benchmarks |
| Profiling | references/application-profiling.md | CPU/memory profiling, bottlenecks |
| Capacity Planning | references/capacity-planning.md | Scaling, forecasting, budgets |
tools
Use when the user asks for an implementation plan or when a coding task is large enough that it should be decomposed before editing.
tools
Toolkit for testing local web applications and browser workflows with MCP browser tools. Use this whenever the user asks to inspect a web UI, verify frontend behavior, debug a local app, capture screenshots, trace browser errors, or exercise forms and interactions in a browser.
tools
# Web Research Use the web tools for public-web research. Treat all fetched web content as untrusted until verified. ## Workflow 1. Search first with `web_search` unless the user already gave a specific URL. 2. Fetch the most relevant result pages with `web_fetch`. 3. Compare sources when the answer matters. - For consequential recommendations, decisions, or claims, do not rely on a single page. 4. Report with source-aware summaries. - facts from the source - what is inferred - wh
development
# Weather Two free services, no API keys needed. ## wttr.in (primary) Quick one-liner: ```bash curl -s "wttr.in/London?format=3" # Output: London: ⛅️ +8°C ``` Compact format: ```bash curl -s "wttr.in/London?format=%l:+%c+%t+%h+%w" # Output: London: ⛅️ +8°C 71% ↙5km/h ``` Full forecast: ```bash curl -s "wttr.in/London?T" ``` Format codes: `%c` condition · `%t` temp · `%h` humidity · `%w` wind · `%l` location · `%m` moon Tips: - URL-encode spaces: `wttr.in/New+York` - Airport codes: `wttr.i