plugins/faos-sre/skills/distributed-tracing/SKILL.md
<!-- AUTO-GENERATED by export-plugins.py — DO NOT EDIT --> --- name: distributed-tracing description: Implement distributed tracing with Jaeger and Tempo to track requests across microservices and identify performance bottlenecks. Use when debugging microservices, analyzing request flows, or implementing observability for distributed systems. tags: [devops, distributed] --- # Distributed Tracing Implement distributed tracing with Jaeger and Tempo for request flow visibility across microservice
npx skillsauth add frank-luongt/faos-skills-marketplace plugins/faos-sre/skills/distributed-tracingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Implement distributed tracing with Jaeger and Tempo for request flow visibility across microservices.
Track requests across distributed systems to understand latency, dependencies, and failure points.
Trace (Request ID: abc123)
↓
Span (frontend) [100ms]
↓
Span (api-gateway) [80ms]
├→ Span (auth-service) [10ms]
└→ Span (user-service) [60ms]
└→ Span (database) [40ms]
# Deploy Jaeger Operator
kubectl create namespace observability
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.51.0/jaeger-operator.yaml -n observability
# Deploy Jaeger instance
kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
name: jaeger
namespace: observability
spec:
strategy: production
storage:
type: elasticsearch
options:
es:
server-urls: http://elasticsearch:9200
ingress:
enabled: true
EOF
version: '3.8'
services:
jaeger:
image: jaegertracing/all-in-one:latest
ports:
- "5775:5775/udp"
- "6831:6831/udp"
- "6832:6832/udp"
- "5778:5778"
- "16686:16686" # UI
- "14268:14268" # Collector
- "14250:14250" # gRPC
- "9411:9411" # Zipkin
environment:
- COLLECTOR_ZIPKIN_HOST_PORT=:9411
Reference: See references/jaeger-setup.md
from opentelemetry import trace
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from flask import Flask
# Initialize tracer
resource = Resource(attributes={SERVICE_NAME: "my-service"})
provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(JaegerExporter(
agent_host_name="jaeger",
agent_port=6831,
))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
# Instrument Flask
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)
@app.route('/api/users')
def get_users():
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("get_users") as span:
span.set_attribute("user.count", 100)
# Business logic
users = fetch_users_from_db()
return {"users": users}
def fetch_users_from_db():
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("database_query") as span:
span.set_attribute("db.system", "postgresql")
span.set_attribute("db.statement", "SELECT * FROM users")
# Database query
return query_database()
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { ExpressInstrumentation } = require('@opentelemetry/instrumentation-express');
// Initialize tracer
const provider = new NodeTracerProvider({
resource: { attributes: { 'service.name': 'my-service' } }
});
const exporter = new JaegerExporter({
endpoint: 'http://jaeger:14268/api/traces'
});
provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();
// Instrument libraries
registerInstrumentations({
instrumentations: [
new HttpInstrumentation(),
new ExpressInstrumentation(),
],
});
const express = require('express');
const app = express();
app.get('/api/users', async (req, res) => {
const tracer = trace.getTracer('my-service');
const span = tracer.startSpan('get_users');
try {
const users = await fetchUsers();
span.setAttributes({ 'user.count': users.length });
res.json({ users });
} finally {
span.end();
}
});
package main
import (
"context"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/exporters/jaeger"
"go.opentelemetry.io/otel/sdk/resource"
sdktrace "go.opentelemetry.io/otel/sdk/trace"
semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
)
func initTracer() (*sdktrace.TracerProvider, error) {
exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
jaeger.WithEndpoint("http://jaeger:14268/api/traces"),
))
if err != nil {
return nil, err
}
tp := sdktrace.NewTracerProvider(
sdktrace.WithBatcher(exporter),
sdktrace.WithResource(resource.NewWithAttributes(
semconv.SchemaURL,
semconv.ServiceNameKey.String("my-service"),
)),
)
otel.SetTracerProvider(tp)
return tp, nil
}
func getUsers(ctx context.Context) ([]User, error) {
tracer := otel.Tracer("my-service")
ctx, span := tracer.Start(ctx, "get_users")
defer span.End()
span.SetAttributes(attribute.String("user.filter", "active"))
users, err := fetchUsersFromDB(ctx)
if err != nil {
span.RecordError(err)
return nil, err
}
span.SetAttributes(attribute.Int("user.count", len(users)))
return users, nil
}
Reference: See references/instrumentation.md
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: congo=t61rcWkgMzE
from opentelemetry.propagate import inject
headers = {}
inject(headers) # Injects trace context
response = requests.get('http://downstream-service/api', headers=headers)
const { propagation } = require('@opentelemetry/api');
const headers = {};
propagation.inject(context.active(), headers);
axios.get('http://downstream-service/api', { headers });
apiVersion: v1
kind: ConfigMap
metadata:
name: tempo-config
data:
tempo.yaml: |
server:
http_listen_port: 3200
distributor:
receivers:
jaeger:
protocols:
thrift_http:
grpc:
otlp:
protocols:
http:
grpc:
storage:
trace:
backend: s3
s3:
bucket: tempo-traces
endpoint: s3.amazonaws.com
querier:
frontend_worker:
frontend_address: tempo-query-frontend:9095
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: tempo
spec:
replicas: 1
template:
spec:
containers:
- name: tempo
image: grafana/tempo:latest
args:
- -config.file=/etc/tempo/tempo.yaml
volumeMounts:
- name: config
mountPath: /etc/tempo
volumes:
- name: config
configMap:
name: tempo-config
Reference: See assets/jaeger-config.yaml.template
# Sample 1% of traces
sampler:
type: probabilistic
param: 0.01
# Sample max 100 traces per second
sampler:
type: ratelimiting
param: 100
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased
# Sample based on trace ID (deterministic)
sampler = ParentBased(root=TraceIdRatioBased(0.01))
Jaeger Query:
service=my-service
duration > 1s
Jaeger Query:
service=my-service
error=true
tags.http.status_code >= 500
Jaeger automatically generates service dependency graphs showing:
import logging
from opentelemetry import trace
logger = logging.getLogger(__name__)
def process_request():
span = trace.get_current_span()
trace_id = span.get_span_context().trace_id
logger.info(
"Processing request",
extra={"trace_id": format(trace_id, '032x')}
)
No traces appearing:
High latency overhead:
references/jaeger-setup.md - Jaeger installationreferences/instrumentation.md - Instrumentation patternsassets/jaeger-config.yaml.template - Jaeger configurationprometheus-configuration - For metricsgrafana-dashboards - For visualizationslo-implementation - For latency SLOsdevelopment
<!-- AUTO-GENERATED by export-skills.py — DO NOT EDIT --> --- name: databricks-mlflow-evaluation --- # MLflow 3 GenAI Evaluation ## Before Writing Any Code 1. **Read GOTCHAS.md** - 15+ common mistakes that cause failures 2. **Read CRITICAL-interfaces.md** - Exact API signatures and data schemas ## End-to-End Workflows Follow these workflows based on your goal. Each step indicates which reference files to read. ### Workflow 1: First-Time Evaluation Setup For users new to MLflow GenAI evalu
development
<!-- AUTO-GENERATED by export-skills.py — DO NOT EDIT --> --- name: databricks-lakebase-provisioned --- # Lakebase Provisioned Patterns and best practices for using Lakebase Provisioned (Databricks managed PostgreSQL) for OLTP workloads. ## When to Use Use this skill when: - Building applications that need a PostgreSQL database for transactional workloads - Adding persistent state to Databricks Apps - Implementing reverse ETL from Delta Lake to an operational database - Storing chat/agent m
tools
<!-- AUTO-GENERATED by export-skills.py — DO NOT EDIT --> --- name: databricks-jobs --- # Databricks Lakeflow Jobs ## Overview Databricks Jobs orchestrate data workflows with multi-task DAGs, flexible triggers, and comprehensive monitoring. Jobs support diverse task types and can be managed via Python SDK, CLI, or Asset Bundles. ## Reference Files | Use Case | Reference File | | ----------------------
development
<!-- AUTO-GENERATED by export-skills.py — DO NOT EDIT --> --- name: databricks-genie --- # Databricks Genie Create and query Databricks Genie Spaces - natural language interfaces for SQL-based data exploration. ## Overview Genie Spaces allow users to ask natural language questions about structured data in Unity Catalog. The system translates questions into SQL queries, executes them on a SQL warehouse, and presents results conversationally. ## When to Use This Skill Use this skill when: -