i18n/de/skills/instrument-distributed-tracing/SKILL.md
Instrument applications with OpenTelemetry for distributed tracing, including automatic and manual instrumentation, context propagation, sampling strategies, and integration with Jaeger or Tempo. Use when debugging latency problems in distributed systems, understanding request flow across microservices, correlating traces with logs and metrics for root-cause analysis, measuring end-to-end latency, or migrating from legacy tracing systems to OpenTelemetry.
Implement OpenTelemetry distributed tracing to follow requests across microservices and identify performance bottlenecks.
Complete configuration files and templates are available under Extended Examples.
Deploy Jaeger or Grafana Tempo to receive and store traces.
Option A: Jaeger all-in-one (development/testing):
# docker-compose.yml
version: '3.8'
services:
  jaeger:
    image: jaegertracing/all-in-one:1.51
    ports:
      - "5775:5775/udp"   # Zipkin compact thrift
      - "6831:6831/udp"   # Jaeger compact thrift
      - "6832:6832/udp"   # Jaeger binary thrift
      - "5778:5778"       # Serve configs
      - "16686:16686"     # Jaeger UI
      - "14268:14268"     # Jaeger HTTP thrift
      - "14250:14250"     # Jaeger GRPC
      - "9411:9411"       # Zipkin compatible endpoint
      - "4317:4317"       # OTLP gRPC (needed because COLLECTOR_OTLP_ENABLED=true)
      - "4318:4318"       # OTLP HTTP
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411
      - COLLECTOR_OTLP_ENABLED=true
    restart: unless-stopped
Option B: Grafana Tempo (production, scalable):
# docker-compose.yml
version: '3.8'
services:
  tempo:
    image: grafana/tempo:2.3.0
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
      - tempo-data:/tmp/tempo
    ports:
      - "3200:3200"   # Tempo HTTP
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "9411:9411"   # Zipkin
    restart: unless-stopped
volumes:
  tempo-data:
Tempo configuration (tempo.yaml):
server:
  http_listen_port: 3200
distributor:
  receivers:
    jaeger:
      # ... (see EXAMPLES.md for complete configuration)
For production with S3 storage:
storage:
  trace:
    backend: s3
    s3:
      bucket: tempo-traces
      endpoint: s3.amazonaws.com
      region: us-east-1
    wal:
      path: /tmp/tempo/wal
    pool:
      max_workers: 100
      queue_depth: 10000
Expected: The tracing backend is reachable and ready to receive traces over OTLP; the Jaeger UI or Grafana initially shows "no traces".
On failure:
- netstat -tulpn | grep -E '(4317|16686|3200)'
- docker logs jaeger or docker logs tempo
- curl http://localhost:4318/v1/traces -v
- tempo -config.file=/etc/tempo.yaml -verify-config
Use OpenTelemetry auto-instrumentation for common frameworks to minimize code changes.
Python with Flask:
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
# app.py
from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
# ... (see EXAMPLES.md for complete configuration)
Go with the Gin framework:
go get go.opentelemetry.io/otel
go get go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
go get go.opentelemetry.io/otel/sdk/trace
go get go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin
package main
import (
"context"
"github.com/gin-gonic/gin"
"go.opentelemetry.io/otel"
# ... (see EXAMPLES.md for complete configuration)
Node.js with Express:
npm install @opentelemetry/api \
@opentelemetry/sdk-node \
@opentelemetry/auto-instrumentations-node \
@opentelemetry/exporter-trace-otlp-grpc
// tracing.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');
# ... (see EXAMPLES.md for complete configuration)
Expected: Traces from instrumented services appear in the Jaeger UI or in Grafana; HTTP requests create spans automatically.
On failure:
- OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317
- OTEL_LOG_LEVEL=debug (Python), OTEL_LOG_LEVEL=DEBUG (Node.js)
Create custom spans for business logic, database queries, and external calls.
Python manual spans:
from opentelemetry import trace
tracer = trace.get_tracer(__name__)
def process_order(order_id):
    # Create a span for the entire operation
    # ... (see EXAMPLES.md for complete configuration)
Go manual spans:
import (
"context"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/codes"
"go.opentelemetry.io/otel/trace"
# ... (see EXAMPLES.md for complete configuration)
Best practices for span attributes:
- http.method, http.status_code, db.system, db.statement
- user.id, order.id, product.category
- instance.id, region, availability_zone
- span.RecordError(err) and span.SetStatus(codes.Error, message)
- span.AddEvent("cache_miss")
Expected: Custom spans appear in the trace view, parent-child relationships are correct, attributes are visible in the span details, and errors are highlighted.
On failure:
- Make sure spans are ended (defer span.End() in Go, with blocks in Python)
Ensure that the trace context flows across service boundaries and asynchronous operations.
HTTP header propagation (W3C Trace Context):
# Client side (Python with requests)
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject
tracer = trace.get_tracer(__name__)
# ... (see EXAMPLES.md for complete configuration)
// Server side (Go with Gin)
import (
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/propagation"
)
# ... (see EXAMPLES.md for complete configuration)
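To make the header format concrete, here is a minimal standard-library sketch of what a W3C `traceparent` value carries and how a downstream service continues the same trace. The names `SpanContext`, `format_traceparent`, `parse_traceparent`, and `child_of` are hypothetical helpers for illustration only; in a real service the OpenTelemetry propagator performs this work, and this sketch handles only version `00` of the header.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version "00": 32-hex trace-id, 16-hex parent-id, 2-hex trace-flags.
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    """Hypothetical stand-in, not the OpenTelemetry SpanContext class."""
    trace_id: str
    span_id: str
    sampled: bool


def format_traceparent(ctx: SpanContext) -> str:
    """Serialize a span context into a traceparent header value."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def parse_traceparent(value: str) -> Optional[SpanContext]:
    """Return the parsed context, or None if the header is malformed."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid according to the specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_of(parent: SpanContext) -> SpanContext:
    """Keep the trace ID and sampling flag, generate a new span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    parent = parse_traceparent(incoming)
    print(format_traceparent(child_of(parent)))
```

The trace ID stays constant across hops while each service contributes its own span ID, which is why a consistent trace ID across services is a useful check when debugging propagation.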
Message queue propagation (Kafka):
# Producer
from opentelemetry.propagate import inject
from kafka import KafkaProducer
producer = KafkaProducer(bootstrap_servers=['kafka:9092'])
# ... (see EXAMPLES.md for complete configuration)
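# Consumer
import json
from opentelemetry import trace
from opentelemetry.propagate import extract

tracer = trace.get_tracer(__name__)

def process_message(msg):
    # Extract trace context from Kafka headers (a list of (key, bytes) tuples)
    headers = {k: v.decode('utf-8') for k, v in (msg.headers or []) if v is not None}
    ctx = extract(headers)
    # Continue the trace as a child of the producer's span
    with tracer.start_as_current_span("process_order_event", context=ctx):
        order_id = json.loads(msg.value)['order_id']
        handle_order(order_id)  # application-specific handler, defined elsewhere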
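The producer and consumer have to agree on how the string-valued carrier maps onto Kafka headers, which are pairs of a string key and a bytes value. The following standard-library sketch shows that conversion in both directions; `carrier_to_headers` and `headers_to_carrier` are hypothetical helper names, and the header layout assumed here is a list of `(str, bytes)` tuples.

```python
from typing import Dict, List, Optional, Tuple

KafkaHeaders = List[Tuple[str, Optional[bytes]]]


def carrier_to_headers(carrier: Dict[str, str]) -> KafkaHeaders:
    """Encode a propagation carrier (str -> str) as Kafka headers."""
    return [(key, value.encode("utf-8")) for key, value in carrier.items()]


def headers_to_carrier(headers: Optional[KafkaHeaders]) -> Dict[str, str]:
    """Decode Kafka headers back into a carrier, skipping null values."""
    carrier: Dict[str, str] = {}
    for key, value in headers or []:
        if value is not None:
            carrier[key] = value.decode("utf-8")
    return carrier


if __name__ == "__main__":
    carrier = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
    headers = carrier_to_headers(carrier)
    print(headers_to_carrier(headers) == carrier)
```

On the producer side the carrier would first be filled by `inject(carrier)` and the resulting list passed as the message headers; on the consumer side the decoded dictionary is what gets handed to `extract`.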
Asynchronous operations (Python asyncio):
import asyncio
from opentelemetry import trace, context

tracer = trace.get_tracer(__name__)

async def async_operation():
    # Attach the current context explicitly and keep the token for cleanup
    token = context.attach(context.get_current())
    try:
        with tracer.start_as_current_span("async_database_query"):
            await asyncio.sleep(0.1)  # Simulated async work
            return "result"
    finally:
        # Restore the previously active context
        context.detach(token)
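The default OpenTelemetry context in Python is stored in `contextvars`, so it helps to see how asyncio treats such variables. The sketch below uses a plain `ContextVar` named `current_trace_id` as a hypothetical stand-in for the active span context; it is not OpenTelemetry code, only an illustration of the mechanism under that assumption.

```python
import asyncio
import contextvars
from typing import List, Optional, Tuple

# Hypothetical stand-in for the active trace context.
current_trace_id: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "current_trace_id", default=None
)


async def query_database() -> Optional[str]:
    """Pretend to do async work, then report which trace it ran under."""
    await asyncio.sleep(0)
    return current_trace_id.get()


async def handle_request(trace_id: str) -> Tuple[Optional[str], Optional[str]]:
    current_trace_id.set(trace_id)
    # create_task copies the current context into the new task.
    task = asyncio.create_task(query_database())
    # A directly awaited coroutine runs in the caller's context.
    direct = await query_database()
    return direct, await task


async def main() -> List[Tuple[Optional[str], Optional[str]]]:
    # Two concurrent requests; each task keeps its own value.
    return await asyncio.gather(handle_request("trace-a"), handle_request("trace-b"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because each task works on its own copy of the context, concurrent requests do not overwrite each other's value. Explicit attach and detach calls matter mainly where work leaves this model, for example in thread pools or callbacks scheduled outside the originating task.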
Expected: Traces span multiple services, trace IDs are consistent across service boundaries, and parent-child relationships are preserved.
On failure:
- opentelemetry.propagate.set_global_textmap(TraceContextTextMapPropagator())
- Log the traceparent header value
Implement sampling to reduce trace volume and cost while preserving visibility.
Sampling strategies:
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import (
    ParentBased,
    TraceIdRatioBased,
    StaticSampler,
    Decision,
)
# ... (see EXAMPLES.md for complete configuration)
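The idea behind ratio-based sampling with a parent override can be sketched without the SDK. The code below is an illustration under stated assumptions, not the SDK implementation: it compares the lower 64 bits of the trace ID against a bound derived from the sampling rate (which, as far as I know, is also how the Python SDK's ratio sampler decides), and it reduces the parent-based behaviour to "follow the upstream decision if there is one". The function names are hypothetical.

```python
from typing import Optional

TRACE_ID_LIMIT = (1 << 64) - 1  # only the lower 64 bits take part in the decision


def ratio_bound(rate: float) -> int:
    """Translate a sampling rate in [0, 1] into a threshold on the trace ID."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    return round(rate * (TRACE_ID_LIMIT + 1))


def should_sample(trace_id: int, rate: float, parent_sampled: Optional[bool] = None) -> bool:
    """Decide whether to sample a span.

    parent_sampled is None for a root span; otherwise the upstream
    decision is honoured so that a trace is kept or dropped as a whole.
    """
    if parent_sampled is not None:
        return parent_sampled
    return (trace_id & TRACE_ID_LIMIT) < ratio_bound(rate)


if __name__ == "__main__":
    trace_id = 0x4BF92F3577B34DA6A3CE929D0E0E4736
    print(should_sample(trace_id, 0.1))
    print(should_sample(trace_id, 0.1, parent_sampled=True))
```

Because the decision depends only on the trace ID and the rate, every service configured with the same rate reaches the same verdict for a given trace, and the parent override prevents a downstream service from dropping spans of a trace that an upstream service decided to keep.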
Tail-based sampling with Tempo:
Configure in tempo.yaml:
overrides:
defaults:
metrics_generator:
processors: [service-graphs, span-metrics]
storage:
path: /tmp/tempo/generator/wal
remote_write:
- url: http://prometheus:9090/api/v1/write
send_exemplars: true
# Tail sampling (requires tempo-query)
ingestion_rate_limit_bytes: 5000000
ingestion_burst_size_bytes: 10000000
Use Grafana Tempo's TraceQL for dynamic sampling:
# Sample traces with errors
{ status = error }
# Sample slow traces (>1s)
{ duration > 1s }
# Sample specific services
{ resource.service.name = "checkout-service" }
Expected: Trace volume is reduced to the target percentage, error traces are always sampled, and the sampling decision is visible in the trace metadata.
On failure:
- Check the ingestion limits (ingestion_burst_size_bytes)
- Check the otel_traces_dropped_total metric
Link traces with metrics and logs for unified observability.
Add trace IDs to logs (Python):
import logging
from opentelemetry import trace
# Custom log formatter with trace context
class TraceFormatter(logging.Formatter):
    def format(self, record):
        # ... (see EXAMPLES.md for complete configuration)
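As an illustration of the log-correlation idea, the following standard-library sketch stamps every log record with a trace ID and span ID. It assumes two context variables, `trace_id_var` and `span_id_var`, as hypothetical stand-ins for the IDs that a real service would read from the active OpenTelemetry span; the formatter and logger names are likewise made up for this example.

```python
import contextvars
import io
import logging
from typing import TextIO

# Hypothetical stand-ins for the IDs of the currently active span.
trace_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("trace_id", default="-")
span_id_var: contextvars.ContextVar[str] = contextvars.ContextVar("span_id", default="-")


class TraceContextFormatter(logging.Formatter):
    """Add trace_id and span_id fields to each record before formatting."""

    def format(self, record: logging.LogRecord) -> str:
        record.trace_id = trace_id_var.get()
        record.span_id = span_id_var.get()
        return super().format(record)


def build_logger(stream: TextIO) -> logging.Logger:
    """Create a logger whose output lines carry the trace context."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        TraceContextFormatter(
            "%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"
        )
    )
    logger = logging.getLogger("orders")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger(buffer)
    trace_id_var.set("4bf92f3577b34da6a3ce929d0e0e4736")
    span_id_var.set("00f067aa0ba902b7")
    log.info("order created")
    print(buffer.getvalue(), end="")
```

Emitting the IDs as `key=value` pairs keeps them easy to parse in a log pipeline, so a log line can be linked back to the trace it belongs to.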
Generate metrics from traces (Tempo):
# tempo.yaml
metrics_generator:
  registry:
    external_labels:
      cluster: production
  storage:
    # ... (see EXAMPLES.md for complete configuration)
This generates Prometheus metrics:
- traces_service_graph_request_total - number of requests between services
- traces_spanmetrics_latency - span duration histogram
- traces_spanmetrics_calls_total - span call counter
Query traces from metrics (Grafana):
Add exemplar support to the Prometheus data source in Grafana:
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    jsonData:
      exemplarTraceIdDestinations:
        - name: trace_id
          datasourceName: Tempo
Enable exemplars in the Grafana dashboard:
{
  "fieldConfig": {
    "defaults": {
      "custom": {
        "showExemplars": true
      }
    }
  }
}
Expected: Clicking a metric exemplar opens the trace, logs show trace IDs, traces link to logs, and debugging is unified across all signals.
On failure:
- histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) and on() exemplar
Common pitfalls:
- Failing to pass context to downstream calls breaks traces. Always pass context explicitly.
- Omitting defer span.End() (Go) or with blocks (Python) leaves spans open and causes memory leaks.
- Omitting span.RecordError() loses valuable debugging information. Always record errors in spans.
- Use the ParentBased sampler to honor upstream sampling decisions.
Related skills:
- correlate-observability-signals - Unified debugging with metrics, logs, and traces linked by trace IDs
- setup-prometheus-monitoring - Generate metrics from traces with the Tempo metrics generator
- configure-log-aggregation - Add trace IDs to logs for correlation with distributed traces
- build-grafana-dashboards - Visualize trace-derived metrics and exemplar links in dashboards