skills/tempo/SKILL.md
Guide for implementing Grafana Tempo - a high-scale distributed tracing backend for OpenTelemetry traces. Use when configuring Tempo deployments, setting up storage backends (S3, Azure Blob, GCS), writing TraceQL queries, deploying via Helm, understanding trace structure, or troubleshooting Tempo issues on Kubernetes.
Comprehensive guide for Grafana Tempo - the cost-effective, high-scale distributed tracing backend designed for OpenTelemetry.
Tempo is a high-scale distributed tracing backend that:
- Supports multi-tenancy via the X-Scope-OrgID header

| Component | Purpose |
|-----------|---------|
| Distributor | Entry point for trace data, routes to ingesters via consistent hash ring |
| Ingester | Buffers traces in memory, creates Parquet blocks, flushes to storage |
| Query Frontend | Query orchestration, shards blockID space, coordinates queriers |
| Querier | Locates traces in ingesters or storage using bloom filters |
| Compactor | Compresses blocks, deduplicates data, manages retention |
| Metrics Generator | Optional: derives metrics from traces |
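The Querier row mentions bloom filters. As a rough illustration of why they help, the sketch below shows a toy bloom filter answering "might this block contain this trace ID?" so that blocks that definitely do not contain it can be skipped. This is a generic sketch and not Tempo's implementation; `BloomFilter`, `blocks_to_scan`, the bit-array size, and the hashing scheme are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def blocks_to_scan(filters: dict[str, BloomFilter], trace_id: str) -> list[str]:
    """Return the (hypothetical) block IDs whose filter may contain trace_id."""
    return [block for block, bf in filters.items() if bf.might_contain(trace_id)]
```

Only the blocks returned by `blocks_to_scan` would need to be opened, which is the general reason a per-block filter keeps trace-by-ID lookups cheap on object storage.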
Write Path:
Applications → Collector → Distributor → Ingester → Object Storage
                                ↓
                      Consistent Hash Ring
                      (routes by traceID)

Read Path:
Query Request → Query Frontend → Queriers → Ingesters (recent data)
                       ↓             ↓
                Block Sharding   Object Storage (historical data)
                       ↓             ↓
          Parallel Querier Work  Bloom Filters + Indexes
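The write path above routes spans by trace ID through a consistent hash ring. The following is a minimal, generic sketch of that idea, assuming a 32-bit token space and SHA-256 based tokens; it is not Tempo's actual ring code, and `HashRing`, `owner`, and the ingester names are hypothetical.

```python
import bisect
import hashlib


def _hash32(key: str) -> int:
    """Map a string to a 32-bit token (illustrative; not Tempo's hash function)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class HashRing:
    """Toy consistent hash ring: each ingester owns several tokens on a circle."""

    def __init__(self, ingesters: list[str], tokens_per_ingester: int = 64) -> None:
        self._ring = sorted(
            (_hash32(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._tokens = [token for token, _ in self._ring]

    def owner(self, trace_id: str) -> str:
        """Return the ingester owning the first token at or after the trace's hash."""
        idx = bisect.bisect_left(self._tokens, _hash32(trace_id))
        return self._ring[idx % len(self._ring)][1]  # wrap around the circle


ring = HashRing(["ingester-0", "ingester-1", "ingester-2"])
# All spans sharing a trace ID map to the same ingester.
assert ring.owner("4bf92f3577b34da6") == ring.owner("4bf92f3577b34da6")
```

The property that matters is stability: when an ingester leaves the ring, only the trace IDs it owned move to another ingester, and every other trace keeps its owner.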
- Monolithic mode (-target=all)
- Scalable monolithic mode (-target=scalable-single-binary)

# Using tempo-distributed Helm chart
distributor:
  replicas: 3
ingester:
  replicas: 3
querier:
  replicas: 2
queryFrontend:
  replicas: 2
compactor:
  replicas: 1
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install tempo grafana/tempo-distributed \
--namespace monitoring \
--values values.yaml
# Storage configuration
storage:
  trace:
    backend: azure  # or s3, gcs
    azure:
      container_name: tempo-traces
      storage_account_name: mystorageaccount
      use_federated_token: true  # Workload Identity

# Distributor
distributor:
  replicas: 3
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 4Gi

# Ingester
ingester:
  replicas: 3
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
    limits:
      memory: 8Gi  # Spikes to 8GB periodically
  persistence:
    enabled: true
    size: 20Gi

# Querier
querier:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      memory: 4Gi

# Query Frontend
queryFrontend:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      memory: 2Gi

# Compactor (single block: YAML does not allow a duplicate top-level key)
compactor:
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 6Gi
  # Block retention
  compaction:
    block_retention: 336h  # 14 days

# Gateway for external access
gateway:
  enabled: true
  replicas: 1

# Metrics Generator (optional)
metricsGenerator:
  enabled: false
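The `block_retention: 336h` value in the values above is 14 days expressed in hours. A minimal sketch of the conversion, assuming the setting is parsed as a Go-style duration (which has no day unit); `retention_hours` is a hypothetical helper name, not part of Tempo or Helm.

```python
def retention_hours(days: int) -> str:
    """Format a retention period given in days as an hours-based duration string."""
    if days <= 0:
        raise ValueError("retention must be a positive number of days")
    return f"{days * 24}h"


print(retention_hours(14))  # 336h
print(retention_hours(30))  # 720h
```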
storage:
  trace:
    backend: azure
    azure:
      container_name: tempo-traces
      storage_account_name: <storage-account-name>
      # Choose exactly one authentication option.
      # Option 1: Workload Identity (Recommended)
      use_federated_token: true
      # Option 2: User-Assigned Managed Identity
      # use_managed_identity: true
      # user_assigned_id: <identity-client-id>
      # Option 3: Account Key (Dev only)
      # storage_account_key: <account-key>
      endpoint_suffix: blob.core.windows.net
      hedge_requests_at: 400ms
      hedge_requests_up_to: 2

storage:
  trace:
    backend: s3
    s3:
      bucket: my-tempo-bucket
      region: us-east-1
      endpoint: s3.us-east-1.amazonaws.com
      # Use IAM roles or access keys
      access_key: <access-key>
      secret_key: <secret-key>

storage:
  trace:
    backend: gcs
    gcs:
      bucket_name: my-tempo-bucket
      # Uses Workload Identity or service account

# Simplest query - all spans
{ }
# Filter by service
{ resource.service.name = "frontend" }
# Filter by operation
{ span:name = "GET /api/orders" }
# Filter by status
{ span:status = error }
# Filter by duration
{ span:duration > 500ms }
# Multiple conditions
{ resource.service.name = "api" && span:status = error }
# Direct parent-child relationship
{ resource.service.name = "frontend" } > { resource.service.name = "api" }
# Ancestor-descendant relationship
{ span:name = "GET /api/products" } >> { span.db.system = "postgresql" }
# Sibling relationship
{ span:name = "span-a" } ~ { span:name = "span-b" }
# Count spans
{ } | count() > 10
# Average duration
{ } | avg(span:duration) > 20ms
# Max duration
{ span:status = error } | max(span:duration)
# Rate of errors
{ span:status = error } | rate()
# Count over time
{ span:name = "GET /:endpoint" } | count_over_time()
# Percentile latency
{ span:name = "GET /:endpoint" } | quantile_over_time(span:duration, .99)
# Group by service
{ span:status = error } | rate() by(resource.service.name)
# Top 10 by error rate
{ span:status = error } | rate() by(resource.service.name) | topk(10)
| Field | Description |
|-------|-------------|
| span:name | Operation name |
| span:duration | Elapsed time (e.g., "10ms", "1.5s") |
| span:status | ok, error, or unset |
| span:kind | server, client, producer, consumer, internal |
| trace:duration | Total trace duration |
| trace:rootName | Root span name |
| trace:rootService | Root span service |
| Scope | Example | Description |
|-------|---------|-------------|
| span. | span.http.method | Span-level attributes |
| resource. | resource.service.name | Resource attributes |
| event. | event.exception.message | Event attributes |
| link. | link.traceID | Link attributes |
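The scoped attributes and intrinsic fields in the tables above combine into a single filter with `&&`. The sketch below assembles such a filter string programmatically; `attr_equals` and `traceql_filter` are hypothetical helpers, and the backslash escaping of double quotes is an assumption about TraceQL string literals that is worth checking against the TraceQL documentation.

```python
def attr_equals(scope: str, name: str, value: str) -> str:
    """Build a scoped string comparison such as resource.service.name = "api"."""
    # Assumption: quotes and backslashes inside string literals are backslash-escaped.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{scope}.{name} = "{escaped}"'


def traceql_filter(*conditions: str) -> str:
    """Join conditions with && inside one span-filter expression."""
    if not conditions:
        return "{ }"  # matches all spans
    return "{ " + " && ".join(conditions) + " }"


query = traceql_filter(
    attr_equals("resource", "service.name", "api"),
    "span:status = error",
)
print(query)  # { resource.service.name = "api" && span:status = error }
```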
| Protocol | Port | Endpoint |
|----------|------|----------|
| OTLP gRPC | 4317 | /v1/traces |
| OTLP HTTP | 4318 | /v1/traces |
| Jaeger gRPC | 14250 | - |
| Jaeger Thrift HTTP | 14268 | /api/traces |
| Jaeger Thrift Compact | 6831 | UDP |
| Jaeger Thrift Binary | 6832 | UDP |
| Zipkin | 9411 | /api/v2/spans |
# Enable multi-tenancy
multitenancy_enabled: true
# All requests must include X-Scope-OrgID header
# Example:
# curl -H "X-Scope-OrgID: tenant-1" http://tempo:3200/api/traces/<traceID>
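The same request as the curl example can be built from a client with the Python standard library. This is a minimal sketch: `build_trace_request` is a hypothetical helper, and the host `tempo:3200` is taken from the curl example rather than from any particular deployment.

```python
import urllib.request


def build_trace_request(
    base_url: str, trace_id: str, tenant: str
) -> urllib.request.Request:
    """Prepare a trace-by-ID request that carries the tenant header."""
    url = f"{base_url.rstrip('/')}/api/traces/{trace_id}"
    return urllib.request.Request(url, headers={"X-Scope-OrgID": tenant})


req = build_trace_request("http://tempo:3200", "<traceID>", "tenant-1")
print(req.full_url)        # http://tempo:3200/api/traces/<traceID>
print(req.header_items())  # includes the tenant header
```

Passing `req` to `urllib.request.urlopen` would send it. With multi-tenancy enabled, a request that omits the header is expected to be rejected, as the comment above states.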
1. Enable Workload Identity on AKS:
az aks update \
--name <aks-cluster> \
--resource-group <rg> \
--enable-oidc-issuer \
--enable-workload-identity
2. Create User-Assigned Managed Identity:
az identity create \
--name tempo-identity \
--resource-group <rg>
IDENTITY_CLIENT_ID=$(az identity show --name tempo-identity --resource-group <rg> --query clientId -o tsv)
3. Assign Storage Permission:
az role assignment create \
--role "Storage Blob Data Contributor" \
--assignee-object-id <principal-id> \
--scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>
4. Create Federated Credential:
az identity federated-credential create \
--name tempo-federated \
--identity-name tempo-identity \
--resource-group <rg> \
--issuer <aks-oidc-issuer-url> \
--subject system:serviceaccount:monitoring:tempo \
--audiences api://AzureADTokenExchange
5. Configure Helm Values:
serviceAccount:
  annotations:
    azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>
podLabels:
  azure.workload.identity/use: "true"
storage:
  trace:
    azure:
      use_federated_token: true
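The `--subject` value in step 4 has to match the namespace and service account that the Tempo pods actually run under; a mismatch there is one plausible cause of authorization failures. A small sketch of how the subject string is composed, with `federated_subject` as a hypothetical helper name:

```python
def federated_subject(namespace: str, service_account: str) -> str:
    """Compose the subject for a Kubernetes service account federated credential."""
    if not namespace or not service_account:
        raise ValueError("namespace and service account are both required")
    return f"system:serviceaccount:{namespace}:{service_account}"


print(federated_subject("monitoring", "tempo"))
# system:serviceaccount:monitoring:tempo
```

The service account name used here (`tempo`) comes from step 4; it is worth confirming the name the Helm release actually creates, for example with `kubectl get serviceaccount -n monitoring`.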
1. Container Not Found (Azure)
az storage container create --name tempo-traces --account-name <storage>
2. Authorization Failure (Azure)
# Verify RBAC assignment
az role assignment list --scope <storage-scope>
# Assign if missing
az role assignment create \
--role "Storage Blob Data Contributor" \
--assignee-object-id <principal-id> \
--scope <storage-scope>
3. Ingester OOM
ingester:
  resources:
    limits:
      memory: 16Gi  # Increase from 8Gi
4. Query Timeout
querier:
  query_timeout: 5m
  max_concurrent_queries: 20
# Check pod status
kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo
# Check distributor logs
kubectl logs -n monitoring -l app.kubernetes.io/component=distributor --tail=100
# Check ingester logs
kubectl logs -n monitoring -l app.kubernetes.io/component=ingester --tail=100
# Verify readiness
kubectl exec -it <tempo-pod> -n monitoring -- wget -qO- http://localhost:3200/ready
# Check ring status (run the port-forward in one terminal and curl in another)
kubectl port-forward svc/tempo-distributor 3200:3200 -n monitoring
curl http://localhost:3200/distributor/ring
# Get trace by ID
GET /api/traces/<traceID>
# Search traces (TraceQL)
GET /api/search?q={resource.service.name="api"}
# Search tags
GET /api/search/tags
GET /api/search/tag/<tag>/values
GET /ready
GET /metrics
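The TraceQL search endpoint above takes the query in the `q` parameter, and braces, quotes, and `=` need URL-encoding when the request is built by hand. A minimal sketch using only the standard library; `search_url` is a hypothetical helper, and the `limit` value is an arbitrary choice.

```python
from urllib.parse import urlencode


def search_url(base_url: str, traceql: str, limit: int = 20) -> str:
    """Build a /api/search URL with a percent-encoded TraceQL query."""
    query = urlencode({"q": traceql, "limit": limit})
    return f"{base_url.rstrip('/')}/api/search?{query}"


print(search_url("http://localhost:3200", '{resource.service.name="api"}'))
# http://localhost:3200/api/search?q=%7Bresource.service.name%3D%22api%22%7D&limit=20
```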
For detailed configuration by topic:
{} expressions are filter expressions, not query strings; semantics differ from PromQL/LogQL, so copy-pasting query patterns fails subtly.
discarded_spans_total.