skills/opentelemetry/SKILL.md
OpenTelemetry for distributed tracing, metrics, and logging in production systems. Use when user mentions "opentelemetry", "otel", "distributed tracing", "traces", "spans", "metrics collection", "observability", "jaeger", "prometheus", "grafana", "OTLP", "instrumentation", or setting up application monitoring.
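OpenTelemetry (OTel) is a vendor-neutral observability framework for generating, collecting, and exporting telemetry data. It defines three signals:

- Traces: the path of a request through services, recorded as a tree of spans.
- Metrics: numeric measurements aggregated over time.
- Logs: timestamped records of discrete events.

All three signals share a common context propagation mechanism, so they can be correlated.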
```bash
npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/sdk-metrics \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-grpc @opentelemetry/exporter-metrics-otlp-grpc
```
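Create tracing.ts (it must load before any application code):

```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-grpc';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';

const sdk = new NodeSDK({
  // Exporters read OTEL_EXPORTER_OTLP_* env vars; the gRPC default is localhost:4317
  traceExporter: new OTLPTraceExporter(),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter(),
    exportIntervalMillis: 15000, // push metrics every 15 s
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

// Flush pending telemetry on shutdown. Registering a SIGTERM handler replaces
// Node's default exit-on-SIGTERM, so exit explicitly once shutdown completes.
process.on('SIGTERM', () => {
  sdk.shutdown()
    .catch((err) => console.error('Error shutting down OpenTelemetry SDK', err))
    .finally(() => process.exit(0));
});
```

Compile it to JavaScript (e.g. with tsc), then run: node --require ./tracing.js app.js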
```typescript
import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('my-service', '1.0.0');

async function processOrder(orderId: string) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    try {
      span.setAttribute('order.id', orderId);
      span.addEvent('validation_started');
      span.addEvent('order_processed', { 'order.total': 42.50 });
      span.setStatus({ code: SpanStatusCode.OK });
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: String(err) });
      span.recordException(err as Error);
      throw err;
    } finally {
      span.end();
    }
  });
}
```
```bash
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
opentelemetry-instrument --service_name my-service \
  --exporter_otlp_endpoint http://localhost:4317 python app.py
```
```python
from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.resources import Resource

resource = Resource.create({"service.name": "my-service"})

provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

metric_reader = PeriodicExportingMetricReader(OTLPMetricExporter())
metrics.set_meter_provider(MeterProvider(resource=resource, metric_readers=[metric_reader]))
```
```python
from opentelemetry import trace

tracer = trace.get_tracer("my-service", "1.0.0")

def process_order(order_id: str):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        span.add_event("validation_started")
        span.add_event("order_processed", {"order.total": 42.50})
```
A span represents a unit of work. Key fields: name (operation), kind (CLIENT/SERVER/PRODUCER/CONSUMER/INTERNAL), start_time/end_time, status (OK/ERROR/UNSET), attributes (key-value pairs), events (timestamped entries), links (related spans).
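The span fields listed above map naturally onto a record type. The sketch below is a simplified, hypothetical model (SpanRecord, Event, and Link are illustrative names, not the SDK's classes); real SDK spans also carry the resource, instrumentation scope, and trace state.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Union

AttributeValue = Union[str, bool, int, float]


class SpanKind(Enum):
    INTERNAL = "INTERNAL"
    SERVER = "SERVER"
    CLIENT = "CLIENT"
    PRODUCER = "PRODUCER"
    CONSUMER = "CONSUMER"


class StatusCode(Enum):
    UNSET = "UNSET"  # default: no explicit status set
    OK = "OK"
    ERROR = "ERROR"


@dataclass
class Event:
    name: str
    timestamp_ns: int
    attributes: Dict[str, AttributeValue] = field(default_factory=dict)


@dataclass
class Link:
    trace_id: str  # the linked span may belong to a different trace
    span_id: str


@dataclass
class SpanRecord:
    name: str
    trace_id: str                  # shared by every span in the trace
    span_id: str                   # unique to this span
    parent_span_id: Optional[str]  # None for a root span
    kind: SpanKind = SpanKind.INTERNAL
    start_time_ns: int = 0
    end_time_ns: Optional[int] = None
    status: StatusCode = StatusCode.UNSET
    attributes: Dict[str, AttributeValue] = field(default_factory=dict)
    events: List[Event] = field(default_factory=list)
    links: List[Link] = field(default_factory=list)

    def duration_ns(self) -> Optional[int]:
        """Elapsed time, or None while the span is still open."""
        if self.end_time_ns is None:
            return None
        return self.end_time_ns - self.start_time_ns
```

parent_span_id is what turns individual spans into a trace tree; context propagation exists to carry trace_id and span_id across process boundaries so the next service can fill it in.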
Context propagation passes trace context across process boundaries via the W3C traceparent header: 00-<trace-id>-<span-id>-<trace-flags>. Auto-instrumentation handles this for HTTP. For manual propagation:
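```typescript
import { propagation, context } from '@opentelemetry/api';

// Inject the active context into outgoing headers (adds traceparent, plus any
// other configured propagator headers such as baggage)
const headers: Record<string, string> = {};
propagation.inject(context.active(), headers);

// Extract context from incoming headers (e.g. req.headers), then run the handler
// inside it so spans created there become children of the remote parent.
// handleRequest is a placeholder for your own handler.
const ctx = propagation.extract(context.active(), incomingHeaders);
context.with(ctx, () => handleRequest());
```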
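To see what inject and extract actually move across the wire, here is a minimal stdlib sketch of parsing and building a version-00 traceparent header. It illustrates the W3C format only; it is not the propagator the SDK uses, the function names are hypothetical, and it ignores tracestate.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# version-traceid-parentid-flags, all lowercase hex (W3C Trace Context, version 00 layout)
_TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the remote span context, or None if the header is invalid."""
    m = _TRACEPARENT_RE.match(header.strip())
    if not m:
        return None
    version, trace_id, span_id, flags = m.groups()
    # Version ff is forbidden; all-zero trace and span IDs are invalid.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    # Bit 0 of trace-flags is the "sampled" flag.
    return SpanContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))


def format_traceparent(ctx: SpanContext) -> str:
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def child_of(parent: SpanContext) -> SpanContext:
    """New span in the same trace: keep trace_id and the sampled flag, mint a new span_id."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)
```

A receiving service would call parse_traceparent on the incoming header, start its server span with child_of(ctx), and send format_traceparent(child) on its own outgoing calls.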
| Instrument | Use Case | Example |
|---|---|---|
| Counter | Monotonically increasing count | requests_total |
| UpDownCounter | Value that increases or decreases | active_connections |
| Histogram | Distribution of values | request_duration_ms |
| ObservableGauge | Point-in-time value read via callback at each collection | cpu_usage_percent |
```typescript
import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('my-service');

const requestCounter = meter.createCounter('http.requests', { description: 'Total HTTP requests' });
const requestDuration = meter.createHistogram('http.request.duration', {
  description: 'HTTP request duration',
  unit: 'ms',
});
const activeConns = meter.createUpDownCounter('http.active_connections');

// Observable gauge: the callback runs on each collection cycle.
// getCpuUsage() is your own function returning the current CPU usage.
meter.createObservableGauge('system.cpu.usage').addCallback((result) => {
  result.observe(getCpuUsage(), { 'cpu.core': '0' });
});

requestCounter.add(1, { 'http.method': 'GET', 'http.route': '/users' });
requestDuration.record(145, { 'http.method': 'GET' });
activeConns.add(1);  // on connect
activeConns.add(-1); // on disconnect
```
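A Histogram does not export every recorded value; the SDK aggregates them into bucket counts plus a sum and count per collection interval. A minimal sketch of explicit-bucket aggregation follows. The boundary list is assumed to match the SDK's default explicit buckets (check your SDK version, or set boundaries through a View), and the class is hypothetical.

```python
import bisect
from typing import List, Sequence

# Assumed boundaries, intended to mirror the SDK's default explicit buckets.
DEFAULT_BOUNDARIES = (0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000)


class ExplicitBucketHistogram:
    """Aggregates recorded values into bucket counts plus a running sum and count."""

    def __init__(self, boundaries: Sequence[float] = DEFAULT_BOUNDARIES):
        self.boundaries = list(boundaries)
        # len(boundaries) + 1 buckets: (-inf, b0], (b0, b1], ..., (b_last, +inf)
        self.counts: List[int] = [0] * (len(self.boundaries) + 1)
        self.total = 0.0
        self.count = 0

    def record(self, value: float) -> None:
        # bisect_left puts a value equal to a boundary into the bucket that ends
        # at that boundary, i.e. buckets are upper-inclusive.
        self.counts[bisect.bisect_left(self.boundaries, value)] += 1
        self.total += value
        self.count += 1
```

With these boundaries, the record(145) call from the example lands in the (100, 250] bucket, so bucket resolution near your typical latency matters more than the unit name.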
OTLP is sent over gRPC (port 4317) or HTTP/protobuf (port 4318):

```bash
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc   # or http/protobuf (then use port 4318)

# Signal-specific overrides (full URLs, used as-is; the /v1/... paths apply to http/protobuf):
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4318/v1/traces
OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://localhost:4318/v1/metrics
OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=http://localhost:4318/v1/logs

# Auth headers (comma-separated key=value pairs):
OTEL_EXPORTER_OTLP_HEADERS="x-api-key=abc123,x-team=backend"
```
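How the endpoint variables combine is easy to get wrong. The sketch below resolves the traces URL the way the OTLP exporter specification describes it: a signal-specific variable is used as-is, while the generic endpoint gets /v1/traces appended for http/protobuf and is used unchanged for gRPC. It assumes http/protobuf when no protocol is set, which the spec recommends as the default but individual SDKs may not follow. traces_endpoint and parse_headers are hypothetical helpers.

```python
from typing import Dict, Mapping
from urllib.parse import unquote


def traces_endpoint(env: Mapping[str, str]) -> str:
    specific = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if specific:
        return specific  # signal-specific URL is used exactly as given
    protocol = env.get("OTEL_EXPORTER_OTLP_PROTOCOL", "http/protobuf")
    default = "http://localhost:4317" if protocol == "grpc" else "http://localhost:4318"
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", default)
    if protocol == "grpc":
        return base  # gRPC has no per-signal URL path
    return base.rstrip("/") + "/v1/traces"  # HTTP: append the per-signal path


def parse_headers(raw: str) -> Dict[str, str]:
    """Parse OTEL_EXPORTER_OTLP_HEADERS: comma-separated key=value, values percent-decoded."""
    headers: Dict[str, str] = {}
    for item in raw.split(","):
        if "=" not in item:
            continue  # skip malformed entries rather than failing
        key, value = item.split("=", 1)
        headers[key.strip()] = unquote(value.strip())
    return headers
```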
The Collector receives, processes, and exports telemetry in a pipeline:
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc: { endpoint: 0.0.0.0:4317 }
      http: { endpoint: 0.0.0.0:4318 }

processors:
  batch: { timeout: 5s, send_batch_size: 1024 }
  memory_limiter: { check_interval: 1s, limit_mib: 512 }
  resource:
    attributes:
      - { key: environment, value: production, action: upsert }

exporters:
  otlp/jaeger: { endpoint: jaeger:4317, tls: { insecure: true } }
  prometheus: { endpoint: 0.0.0.0:8889 }
  debug: { verbosity: detailed } # prints telemetry to the collector log; add to a pipeline when troubleshooting

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch] # memory_limiter first, batch last
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheus]
```
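The batch processor's two settings trade latency for efficiency: a batch is sent when it reaches send_batch_size or when timeout expires, whichever comes first. Below is a simplified, hypothetical model of that rule with an injected clock; the real processor runs its own timer and also honors send_batch_max_size, which this sketch leaves out.

```python
from typing import Any, Callable, List, Optional


class SizeOrTimeoutBatcher:
    """Buffer items; flush at send_batch_size items or `timeout` seconds after the first buffered item."""

    def __init__(self, send_batch_size: int, timeout: float, export: Callable[[List[Any]], None]):
        self.send_batch_size = send_batch_size
        self.timeout = timeout
        self.export = export
        self._buffer: List[Any] = []
        self._first_at: Optional[float] = None

    def add(self, item: Any, now: float) -> None:
        if not self._buffer:
            self._first_at = now  # start the timeout window with the first item
        self._buffer.append(item)
        if len(self._buffer) >= self.send_batch_size:
            self.flush()  # size trigger

    def tick(self, now: float) -> None:
        """Call periodically; flushes a partial batch whose timeout has expired."""
        if self._buffer and now - self._first_at >= self.timeout:
            self.flush()  # timeout trigger

    def flush(self) -> None:
        if self._buffer:
            self.export(self._buffer)
            self._buffer = []
            self._first_at = None
```

A larger send_batch_size means fewer, bigger export requests; the timeout bounds how long a span can sit in the collector when traffic is light.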
| Backend | Signal | Notes |
|---|---|---|
| Jaeger | Traces | Open source, native OTLP support |
| Prometheus + Grafana | Metrics | Prometheus scrapes collector; Grafana visualizes |
| Datadog | All | Use Datadog exporter or OTLP endpoint |
| Honeycomb | Traces, Logs | Native OTLP; API key via OTEL_EXPORTER_OTLP_HEADERS |
| Grafana Tempo | Traces | Pairs with Grafana for visualization |
```yaml
# docker-compose.yaml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8889:8889"   # Prometheus metrics
    depends_on: [jaeger]
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment: [COLLECTOR_OTLP_ENABLED=true]
    ports:
      - "16686:16686" # Jaeger UI
      - "14268:14268" # Jaeger collector HTTP
```
Point your app at http://localhost:4317 (gRPC) or http://localhost:4318 (HTTP). Jaeger UI: http://localhost:16686.
| Variable | Purpose | Example |
|---|---|---|
| OTEL_SERVICE_NAME | Identifies the service | order-service |
| OTEL_EXPORTER_OTLP_ENDPOINT | Collector address | http://localhost:4317 |
| OTEL_EXPORTER_OTLP_PROTOCOL | Transport protocol | grpc or http/protobuf |
| OTEL_EXPORTER_OTLP_HEADERS | Auth headers | x-api-key=abc123 |
| OTEL_TRACES_SAMPLER | Sampling strategy | parentbased_traceidratio |
| OTEL_TRACES_SAMPLER_ARG | Sampler argument | 0.1 (10%) |
| OTEL_RESOURCE_ATTRIBUTES | Additional resource attrs | deployment.environment=prod |
| OTEL_LOG_LEVEL | SDK log level | debug |
| OTEL_PROPAGATORS | Context propagation format | tracecontext,baggage |
| Sampler | Behavior |
|---|---|
| always_on | Record every span. Dev only. |
| always_off | Record nothing. Disables tracing. |
| traceidratio | Sample a percentage based on trace ID. Arg: 0.0-1.0. |
| parentbased_always_on | Respect parent decision; sample root spans. |
| parentbased_traceidratio | Respect parent; sample unparented at given ratio. |
For production, parentbased_traceidratio with 0.01-0.1 is a common starting point.
```bash
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.05
```
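Programmatic equivalent:

```typescript
import { TraceIdRatioBasedSampler, ParentBasedSampler } from '@opentelemetry/sdk-trace-base';

// Sample 5% of new traces; child spans follow their parent's decision.
const sampler = new ParentBasedSampler({ root: new TraceIdRatioBasedSampler(0.05) });
// Pass it to the SDK, e.g. new NodeSDK({ sampler, ... })
```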
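Ratio sampling is deterministic: the decision is derived from the trace ID, so every service using the same ratio agrees on the same trace. The sketch below is loosely modelled on the Python SDK's approach of comparing the low 64 bits of the trace ID against ratio * 2^64 (other SDKs hash the ID differently), plus the parent-based wrapper. Class names are hypothetical, and the real ParentBased sampler also lets you configure separate behavior for remote and local parents.

```python
from typing import Optional

TRACE_ID_LIMIT = (1 << 64) - 1


class TraceIdRatioSampler:
    def __init__(self, ratio: float):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("ratio must be in [0.0, 1.0]")
        # Sample when the low 64 bits of the trace ID fall below this bound.
        self.bound = round(ratio * (TRACE_ID_LIMIT + 1))

    def should_sample(self, trace_id: int) -> bool:
        return (trace_id & TRACE_ID_LIMIT) < self.bound


class ParentBasedSampler:
    def __init__(self, root: TraceIdRatioSampler):
        self.root = root

    def should_sample(self, trace_id: int, parent_sampled: Optional[bool]) -> bool:
        if parent_sampled is None:  # no parent: this is a root span
            return self.root.should_sample(trace_id)
        return parent_sampled  # otherwise follow the parent's decision
```

As arithmetic: at a ratio of 0.05, a service starting 2,000 root traces per second (a hypothetical load) keeps about 100 of them per second.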
```typescript
import { SpanStatusCode } from '@opentelemetry/api';

// Inside an active span (`span`); userId, items, cacheKey, and error come from your code.
span.setAttribute('user.id', userId);
span.setAttribute('order.item_count', items.length);
span.setAttribute('feature_flag.dark_mode', true);

span.addEvent('cache_miss', { 'cache.key': cacheKey });
span.addEvent('retry_attempt', { 'attempt.number': 3, 'error.type': 'timeout' });

span.recordException(error); // records an "exception" event with type, message, and stack trace
span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
```
Follow semantic conventions for attribute names: http.request.method, db.system, rpc.service.
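Attribute values must be a string, boolean, integer, float, or a homogeneous array of one of those; SDKs typically drop anything else (including null/None) and log a warning. Below is a hypothetical helper that filters attributes before they are set; it is slightly stricter than the specification requires.

```python
from typing import Any, Dict, Mapping

_PRIMITIVES = (str, bool, int, float)


def _valid_value(value: Any) -> bool:
    if isinstance(value, _PRIMITIVES):
        return True
    if isinstance(value, (list, tuple)):
        # Arrays must be homogeneous: every item shares one primitive type.
        types = {type(v) for v in value}
        return len(types) <= 1 and all(isinstance(v, _PRIMITIVES) for v in value)
    return False  # None, dicts, and other objects are not valid attribute values


def clean_attributes(attrs: Mapping[str, Any]) -> Dict[str, Any]:
    """Drop entries with empty keys or invalid values; normalize tuples to lists."""
    return {
        k: (list(v) if isinstance(v, tuple) else v)
        for k, v in attrs.items()
        if k and _valid_value(v)
    }
```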
Auto-instrumentation covers most HTTP libraries. Add business context via middleware:
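```typescript
import { trace } from '@opentelemetry/api';

app.use((req, res, next) => {
  const span = trace.getActiveSpan();
  if (span) {
    // Attribute values must not be undefined, so only set them when present.
    const requestId = req.headers['x-request-id'];
    if (typeof requestId === 'string') {
      span.setAttribute('http.request.header.x_request_id', requestId);
    }
    if (req.user?.id) {
      span.setAttribute('user.id', req.user.id);
    }
  }
  next();
});
```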
Auto-instrumentation handles pg, mysql2, mongoose, etc. Add business context manually:
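```typescript
import { SpanStatusCode } from '@opentelemetry/api';

// tracer comes from trace.getTracer(...); db is your database client (e.g. a pg Pool)
async function getUser(userId: string) {
  return tracer.startActiveSpan('db.getUser', async (span) => {
    try {
      span.setAttribute('db.system', 'postgresql');
      span.setAttribute('db.operation', 'SELECT');
      span.setAttribute('user.id', userId);
      const result = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
      span.setAttribute('db.result_count', result.rows.length);
      return result.rows[0];
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end(); // end the span on both success and failure
    }
  });
}
```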
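```python
import requests
from opentelemetry import trace
from opentelemetry.trace import StatusCode

tracer = trace.get_tracer("my-service", "1.0.0")

def call_payment_api(payment_url: str, payload: dict, amount: float):
    with tracer.start_as_current_span("call_payment_api") as span:
        span.set_attribute("peer.service", "payment-gateway")
        span.set_attribute("payment.amount", amount)
        span.set_attribute("payment.currency", "USD")
        try:
            # requests has no default timeout; without one, Timeout is never raised.
            response = requests.post(payment_url, json=payload, timeout=5)
            span.set_attribute("http.response.status_code", response.status_code)
            return response
        except requests.exceptions.Timeout:
            span.set_status(StatusCode.ERROR, "Payment API timeout")
            raise
```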
Baggage propagates key-value pairs across service boundaries without adding them to spans. Useful for tenant IDs, feature flags, or routing hints.
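```typescript
import { propagation, context } from '@opentelemetry/api';

// Service A: attach baggage to a new context
const bag = propagation.createBaggage({
  'tenant.id': { value: 'acme-corp' },
  'feature.flag': { value: 'new-checkout' },
});
const ctx = propagation.setBaggage(context.active(), bag);

// Baggage is only propagated for calls made while ctx is active, and only when
// the baggage propagator is enabled (e.g. OTEL_PROPAGATORS=tracecontext,baggage).
// callServiceB is a placeholder for your instrumented outgoing HTTP call.
context.with(ctx, () => callServiceB());

// Service B: read baggage from the active (extracted) context
const currentBaggage = propagation.getBaggage(context.active());
const tenantId = currentBaggage?.getEntry('tenant.id')?.value;
```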
Baggage travels in the W3C baggage HTTP header, so do not put sensitive data in it. Keep entries small: every downstream service receives all baggage.
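On the wire, baggage is a comma-separated list of key=value members with percent-encoded values. A minimal sketch of encoding and decoding the W3C baggage header value follows; encode_baggage and decode_baggage are hypothetical helpers, and the SDK's propagator also enforces size limits and keeps per-entry metadata, which this sketch discards.

```python
from typing import Dict, Mapping
from urllib.parse import quote, unquote


def encode_baggage(entries: Mapping[str, str]) -> str:
    """Serialize entries as a baggage header value (values percent-encoded)."""
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


def decode_baggage(header: str) -> Dict[str, str]:
    """Parse a baggage header value, ignoring per-entry metadata after ';'."""
    result: Dict[str, str] = {}
    for member in header.split(","):
        pair = member.split(";", 1)[0].strip()
        if "=" not in pair:
            continue  # skip malformed members
        key, value = pair.split("=", 1)
        result[key.strip()] = unquote(value.strip())
    return result
```

Measuring len(encode_baggage(entries).encode()) before attaching baggage is a quick way to see what each downstream hop will carry.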