Grafana Tempo Skill

Comprehensive guide for Grafana Tempo - the cost-effective, high-scale distributed tracing backend designed for OpenTelemetry.

What is Tempo?

Tempo is a high-scale distributed tracing backend with the following characteristics:

Trace-ID lookup model - No indexing of every attribute, keeps ingestion fast and storage costs low
OpenTelemetry native - First-class support for OTLP protocol
Object storage backed - Stores traces in affordable S3, GCS, or Azure Blob Storage
TraceQL query language - Powerful query language inspired by PromQL and LogQL
Apache Parquet format - 5-10x less data pulled per query vs legacy formats
Multi-tenant by default - Built-in tenant isolation via X-Scope-OrgID header

Architecture Overview

Core Components

| Component | Purpose |
|-----------|---------|
| Distributor | Entry point for trace data, routes to ingesters via consistent hash ring |
| Ingester | Buffers traces in memory, creates Parquet blocks, flushes to storage |
| Query Frontend | Query orchestration, shards blockID space, coordinates queriers |
| Querier | Locates traces in ingesters or storage using bloom filters |
| Compactor | Compresses blocks, deduplicates data, manages retention |
| Metrics Generator | Optional: derives metrics from traces |

Data Flow

Write Path:

Applications → Collector → Distributor → Ingester → Object Storage
                                  ↓
                           Consistent Hash Ring
                           (routes by traceID)
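
The routing step in the write path can be illustrated with a toy consistent hash ring. This is a minimal sketch, not Tempo's implementation: the `HashRing` class, the token count, and the use of MD5 are assumptions chosen for illustration, and replication and health checks are left out.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring: each ingester owns several tokens on a 32-bit ring."""

    def __init__(self, ingesters, tokens_per_ingester=64):
        self._ring = []  # sorted list of (token, ingester)
        for name in ingesters:
            for i in range(tokens_per_ingester):
                self._ring.append((self._hash(f"{name}-{i}"), name))
        self._ring.sort()
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _hash(key):
        # First 4 bytes of MD5 as an unsigned 32-bit token (illustrative choice).
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

    def owner(self, trace_id):
        # Walk clockwise to the first token >= hash(trace_id), wrapping around.
        idx = bisect.bisect_left(self._tokens, self._hash(trace_id))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["ingester-0", "ingester-1", "ingester-2"])
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
# Every lookup for the same trace ID lands on the same ingester.
assert ring.owner(trace_id) == ring.owner(trace_id)
print(ring.owner(trace_id))
```

Because the owner depends only on the trace ID, all spans of one trace end up on the same ingester, which is what lets the ingester assemble complete traces before flushing.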

Read Path:

Query Request → Query Frontend → Queriers → Ingesters (recent data)
                      ↓                            ↓
                 Block Sharding          Object Storage (historical data)
                      ↓                            ↓
              Parallel Querier Work      Bloom Filters + Indexes
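
The "Block Sharding" step in the read path can be sketched as splitting the block ID space into contiguous ranges that queriers scan in parallel. The function names and the assumption that block IDs are plain 128-bit UUIDs are illustrative only; this is not Tempo's actual sharding code.

```python
import uuid

MAX_ID = (1 << 128) - 1  # block IDs are treated as 128-bit UUIDs in this sketch


def shard_ranges(num_shards):
    """Split the 128-bit ID space into num_shards contiguous (start, end) ranges."""
    step = (MAX_ID + 1) // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = MAX_ID if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for(block_id, ranges):
    """Return the index of the shard whose range contains block_id."""
    value = uuid.UUID(block_id).int
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("block ID outside the ID space")


ranges = shard_ranges(4)
print(shard_for("80000000-0000-0000-0000-000000000000", ranges))
```

Each querier job then only needs to consult the bloom filters of blocks that fall inside its own range.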

Deployment Modes

1. Monolithic Mode (`-target=all`)

All components run in a single process
Best for: local testing, small-scale deployments
Components cannot be scaled independently
Scale vertically (more CPU and memory); for horizontal scaling, use one of the modes below

2. Scalable Monolithic (`-target=scalable-single-binary`)

All components in one process with horizontal scaling
Each instance runs all components
Good for development with scaling needs

3. Microservices Mode (Distributed) - Recommended for Production

# Using tempo-distributed Helm chart
distributor:
  replicas: 3

ingester:
  replicas: 3

querier:
  replicas: 2

queryFrontend:
  replicas: 2

compactor:
  replicas: 1

Helm Deployment

Add Repository

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Install Distributed Tempo

helm install tempo grafana/tempo-distributed \
  --namespace monitoring \
  --values values.yaml

Production Values Example

# Storage configuration
storage:
  trace:
    backend: azure  # or s3, gcs
    azure:
      container_name: tempo-traces
      storage_account_name: mystorageaccount
      use_federated_token: true  # Workload Identity

# Distributor
distributor:
  replicas: 3
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 4Gi

# Ingester
ingester:
  replicas: 3
  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
    limits:
      memory: 8Gi  # Spikes to 8GB periodically
  persistence:
    enabled: true
    size: 20Gi

# Querier
querier:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      memory: 4Gi

# Query Frontend
queryFrontend:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      memory: 2Gi

# Compactor
compactor:
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 6Gi
  # Block retention (key path may differ by chart version, e.g. under compactor.config)
  compaction:
    block_retention: 336h  # 14 days

# Gateway for external access
gateway:
  enabled: true
  replicas: 1

# Metrics Generator (optional)
metricsGenerator:
  enabled: false

Storage Configuration

Azure Blob Storage (Recommended for Azure)

storage:
  trace:
    backend: azure
    azure:
      container_name: tempo-traces
      storage_account_name: <storage-account-name>
      # Choose exactly one authentication option.
      # Option 1: Workload Identity (Recommended)
      use_federated_token: true
      # Option 2: User-Assigned Managed Identity
      # use_managed_identity: true
      # user_assigned_id: <identity-client-id>
      # Option 3: Account Key (Dev only)
      # storage_account_key: <account-key>
      endpoint_suffix: blob.core.windows.net
      hedge_requests_at: 400ms
      hedge_requests_up_to: 2

AWS S3

storage:
  trace:
    backend: s3
    s3:
      bucket: my-tempo-bucket
      region: us-east-1
      endpoint: s3.us-east-1.amazonaws.com
      # Use IAM roles or access keys; with IAM roles, omit the two keys below
      access_key: <access-key>
      secret_key: <secret-key>

Google Cloud Storage

storage:
  trace:
    backend: gcs
    gcs:
      bucket_name: my-tempo-bucket
      # Uses Workload Identity or service account

TraceQL Query Language

Basic Queries

# Simplest query - all spans
{ }

# Filter by service
{ resource.service.name = "frontend" }

# Filter by operation
{ span:name = "GET /api/orders" }

# Filter by status
{ span:status = error }

# Filter by duration
{ span:duration > 500ms }

# Multiple conditions
{ resource.service.name = "api" && span:status = error }

Structural Operators

# Direct parent-child relationship
{ resource.service.name = "frontend" } > { resource.service.name = "api" }

# Ancestor-descendant relationship
{ span:name = "GET /api/products" } >> { span.db.system = "postgresql" }

# Sibling relationship
{ span:name = "span-a" } ~ { span:name = "span-b" }
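
To make the three structural operators concrete, here is a minimal Python sketch that evaluates them over a tiny in-memory trace. The `Span` type and the helper functions are hypothetical and only mirror the semantics described above; they are not how Tempo evaluates TraceQL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str


def children(spans, lhs, rhs):
    """Spans matching rhs whose direct parent matches lhs (like `lhs > rhs`)."""
    by_id = {s.span_id: s for s in spans}
    return [s for s in spans
            if rhs(s) and s.parent_id in by_id and lhs(by_id[s.parent_id])]


def descendants(spans, lhs, rhs):
    """Spans matching rhs that have any ancestor matching lhs (like `lhs >> rhs`)."""
    by_id = {s.span_id: s for s in spans}
    result = []
    for s in spans:
        if not rhs(s):
            continue
        parent = by_id.get(s.parent_id)
        while parent is not None:  # walk up towards the root
            if lhs(parent):
                result.append(s)
                break
            parent = by_id.get(parent.parent_id)
    return result


def siblings(spans, lhs, rhs):
    """Spans matching rhs that share a parent with another span matching lhs (like `lhs ~ rhs`)."""
    return [s for s in spans
            if rhs(s) and s.parent_id is not None and any(
                o is not s and o.parent_id == s.parent_id and lhs(o) for o in spans)]


trace = [
    Span("1", None, "GET /api/products"),
    Span("2", "1", "auth"),
    Span("3", "1", "query-db"),
    Span("4", "3", "SELECT"),
]


def is_root(s):
    return s.name == "GET /api/products"


print([s.name for s in children(trace, is_root, lambda s: True)])              # ['auth', 'query-db']
print([s.name for s in descendants(trace, is_root, lambda s: s.name == "SELECT")])  # ['SELECT']
```

In each case the result is the set of spans matching the right-hand side, which is why the operand order matters.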

Aggregation Functions

# Count spans
{ } | count() > 10

# Average duration
{ } | avg(span:duration) > 20ms

# Max duration above a threshold (aggregates are compared against a value)
{ span:status = error } | max(span:duration) > 1s

Metrics Functions

# Rate of errors
{ span:status = error } | rate()

# Count over time
{ span:name = "GET /:endpoint" } | count_over_time()

# Percentile latency
{ span:name = "GET /:endpoint" } | quantile_over_time(span:duration, .99)

# Group by service
{ span:status = error } | rate() by(resource.service.name)

# Top 10 by error rate
{ span:status = error } | rate() by(resource.service.name) | topk(10)
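
As a rough mental model, `rate()` reports matching spans per second over time. The sketch below buckets span start times (in seconds) into fixed steps and divides each count by the step length. The function name and the bucketing scheme are assumptions for illustration; Tempo's own step alignment and sampling may differ.

```python
from collections import Counter


def rate(span_start_times, step_seconds):
    """Spans per second for each step-sized bucket, keyed by bucket start time."""
    counts = Counter(int(t // step_seconds) * step_seconds for t in span_start_times)
    return {bucket: count / step_seconds for bucket, count in sorted(counts.items())}


# Six error spans: four start in the first minute, two in the second.
series = rate([0, 5, 12, 59, 61, 119], step_seconds=60)
print(series)
```

Grouping with `by(resource.service.name)` would correspond to keeping one such series per service.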

Trace Structure

Intrinsic Fields (colon separator)

| Field | Description |
|-------|-------------|
| span:name | Operation name |
| span:duration | Elapsed time (e.g., "10ms", "1.5s") |
| span:status | ok, error, or unset |
| span:kind | server, client, producer, consumer, internal |
| trace:duration | Total trace duration |
| trace:rootName | Root span name |
| trace:rootService | Root span service |

Attribute Scopes (period separator)

| Scope | Example | Description |
|-------|---------|-------------|
| span. | span.http.method | Span-level attributes |
| resource. | resource.service.name | Resource attributes |
| event. | event.exception.message | Event attributes |
| link. | link.traceID | Link attributes |
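
The colon-versus-period convention can be checked mechanically. The helper below is a hypothetical sketch (not part of any Tempo tooling) that classifies a field by whichever separator appears first.

```python
def classify_field(field):
    """Split a TraceQL field into (kind, scope, name) based on its first separator."""
    colon = field.find(":")
    dot = field.find(".")
    if colon != -1 and (dot == -1 or colon < dot):
        scope, name = field.split(":", 1)
        return ("intrinsic", scope, name)
    if dot != -1:
        scope, name = field.split(".", 1)
        return ("attribute", scope, name)
    return ("unscoped", "", field)


print(classify_field("span:duration"))          # ('intrinsic', 'span', 'duration')
print(classify_field("resource.service.name"))  # ('attribute', 'resource', 'service.name')
```

Only the first separator decides the kind, so attribute names that themselves contain periods (such as `service.name`) are kept intact.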

Receiver Endpoints

| Protocol | Port | Endpoint |
|----------|------|----------|
| OTLP gRPC | 4317 | - |
| OTLP HTTP | 4318 | /v1/traces |
| Jaeger gRPC | 14250 | - |
| Jaeger Thrift HTTP | 14268 | /api/traces |
| Jaeger Thrift Compact | 6831 | UDP |
| Jaeger Thrift Binary | 6832 | UDP |
| Zipkin | 9411 | /api/v2/spans |
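
For a quick smoke test of the OTLP HTTP receiver, a single span can be posted as OTLP/JSON. The sketch below only builds the request and does not send it; the host name `tempo-distributor` and the helper names are assumptions, so adjust them to your deployment.

```python
import json
import secrets
import time
import urllib.request


def build_otlp_payload(service_name, span_name, duration_ms=50):
    """Return a minimal OTLP/JSON trace export body containing one span."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "spans": [{
                    "traceId": secrets.token_hex(16),  # 32 hex characters
                    "spanId": secrets.token_hex(8),    # 16 hex characters
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }


def build_request(base_url, payload):
    """Prepare (but do not send) a POST to the OTLP HTTP endpoint from the table."""
    return urllib.request.Request(
        f"{base_url}/v1/traces",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = build_otlp_payload("frontend", "GET /api/orders")
request = build_request("http://tempo-distributor:4318", payload)  # hypothetical host
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
print(request.full_url)
```

In practice an OpenTelemetry SDK or Collector exporter does this for you; the hand-built payload is only useful for checking that the port and path are reachable.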

Multi-Tenancy

# Enable multi-tenancy
multitenancy_enabled: true

# All requests must include X-Scope-OrgID header
# Example:
# curl -H "X-Scope-OrgID: tenant-1" http://tempo:3200/api/traces/<traceID>
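
The same request can be prepared from Python. This is a small sketch using only the standard library; `tenant_request` is a hypothetical helper, and the base URL is taken from the curl example above.

```python
import urllib.request


def tenant_request(base_url, trace_id, tenant):
    """Prepare a trace lookup that carries the tenant in X-Scope-OrgID."""
    if not tenant:
        raise ValueError("tenant must be set when multi-tenancy is enabled")
    return urllib.request.Request(
        f"{base_url}/api/traces/{trace_id}",
        headers={"X-Scope-OrgID": tenant},
    )


request = tenant_request("http://tempo:3200", "4bf92f3577b34da6a3ce929d0e0e4736", "tenant-1")
# urllib normalizes header capitalization; HTTP header names are case-insensitive.
print(request.header_items())
```

Failing fast on an empty tenant keeps a misconfigured client from silently querying the wrong tenant.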

Azure Identity Configuration

Workload Identity Federation (Recommended)

1. Enable Workload Identity on AKS:

az aks update \
  --name <aks-cluster> \
  --resource-group <rg> \
  --enable-oidc-issuer \
  --enable-workload-identity

2. Create User-Assigned Managed Identity:

az identity create \
  --name tempo-identity \
  --resource-group <rg>

IDENTITY_CLIENT_ID=$(az identity show --name tempo-identity --resource-group <rg> --query clientId -o tsv)

3. Assign Storage Permission:

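PRINCIPAL_ID=$(az identity show --name tempo-identity --resource-group <rg> --query principalId -o tsv)

az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id "$PRINCIPAL_ID" \
  --assignee-principal-type ServicePrincipal \
  --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>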

4. Create Federated Credential:

az identity federated-credential create \
  --name tempo-federated \
  --identity-name tempo-identity \
  --resource-group <rg> \
  --issuer <aks-oidc-issuer-url> \
  --subject system:serviceaccount:monitoring:tempo \
  --audiences api://AzureADTokenExchange

5. Configure Helm Values:

serviceAccount:
  annotations:
    azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>

podLabels:
  azure.workload.identity/use: "true"

storage:
  trace:
    azure:
      use_federated_token: true
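
A frequent cause of token exchange failures is a mismatch between the `--subject` in step 4 and the namespace or service account the pods actually use. The sketch below shows how the subject string is composed; the helper names are hypothetical.

```python
def federated_subject(namespace, service_account):
    """Build the subject string expected by the federated credential."""
    return f"system:serviceaccount:{namespace}:{service_account}"


def subject_matches(subject, namespace, service_account):
    """True when the credential subject matches the pod's namespace and service account."""
    return subject == federated_subject(namespace, service_account)


print(federated_subject("monitoring", "tempo"))  # system:serviceaccount:monitoring:tempo
```

If the Helm release creates a service account with a different name, the subject has to change with it.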

Troubleshooting

Common Issues

1. Container Not Found (Azure)

az storage container create --name tempo-traces --account-name <storage>

2. Authorization Failure (Azure)

# Verify RBAC assignment
az role assignment list --scope <storage-scope>

# Assign if missing
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id <principal-id> \
  --scope <storage-scope>

3. Ingester OOM

ingester:
  resources:
    limits:
      memory: 16Gi  # Increase from 8Gi

4. Query Timeout

querier:
  max_concurrent_queries: 20
  search:
    query_timeout: 5m  # recent Tempo versions scope this under search (and trace_by_id)

Diagnostic Commands

# Check pod status
kubectl get pods -n monitoring -l app.kubernetes.io/name=tempo

# Check distributor logs
kubectl logs -n monitoring -l app.kubernetes.io/component=distributor --tail=100

# Check ingester logs
kubectl logs -n monitoring -l app.kubernetes.io/component=ingester --tail=100

# Verify readiness
kubectl exec -it <tempo-pod> -n monitoring -- wget -qO- http://localhost:3200/ready

# Check ring status (port-forward blocks, so run it in a separate terminal)
kubectl port-forward svc/tempo-distributor 3200:3200 -n monitoring
curl http://localhost:3200/distributor/ring

API Reference

Trace Retrieval

# Get trace by ID
GET /api/traces/<traceID>

# Search traces (TraceQL; URL-encode the q parameter in real requests)
GET /api/search?q={resource.service.name="api"}

# Search tags
GET /api/search/tags
GET /api/search/tag/<tag>/values
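
TraceQL queries contain braces, quotes, and equals signs, so the `q` parameter should be URL-encoded before it is sent. A minimal sketch with the standard library follows; `search_url` is a hypothetical helper and the base URL is an assumption.

```python
from urllib.parse import urlencode


def search_url(base_url, traceql, limit=20):
    """Build a search URL with the TraceQL query safely encoded."""
    query = urlencode({"q": traceql, "limit": limit})
    return f"{base_url}/api/search?{query}"


url = search_url("http://localhost:3200", '{resource.service.name="api"}')
print(url)
# http://localhost:3200/api/search?q=%7Bresource.service.name%3D%22api%22%7D&limit=20
```

With curl, `--get --data-urlencode 'q=...'` achieves the same encoding.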

Health

GET /ready
GET /metrics

Reference Documentation

For detailed configuration by topic:

Storage Configuration: Object stores, retention, caching
TraceQL Reference: Query syntax and examples
Configuration Reference: Full configuration manifest

External Resources

Official Tempo Documentation
Tempo Helm Chart
TraceQL Documentation
Tempo GitHub Repository

Gotchas

Retention is per-tenant; the global retention setting is only a fallback, so a misconfigured per-tenant override silently takes precedence.
Tail sampling at the end of an OTel Collector pipeline does not release traces from the in-flight buffer, so under load the collector can run out of memory while holding traces it is about to drop.
TraceQL {} expressions are filter expressions, not query strings; the semantics differ from PromQL and LogQL, so copy-pasted query patterns fail subtly.
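Multi-tenancy via X-Scope-OrgID: with multi-tenancy disabled, every request is assigned to a single default tenant (similar to the "fake" tenant in Loki); with it enabled, requests without the header are rejected rather than assigned to a fallback tenant.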
Span size cap (default 1 MB): spans larger than the cap are silently truncated mid-attribute; to debug, check the ingester metrics for discarded_spans_total.
Searching by service.name vs resource.service.name: the result depends on whether your OTel SDK puts the attribute on the span or on the resource; the two look similar but are indexed differently.
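
The retention gotcha comes down to a simple lookup order. The sketch below models it with a hypothetical `effective_retention` helper and made-up tenant names; it illustrates the precedence only and is not Tempo's overrides code.

```python
from datetime import timedelta


def effective_retention(tenant, global_retention, overrides):
    """Per-tenant override wins; the global value is only a fallback."""
    return overrides.get(tenant, global_retention)


global_retention = timedelta(hours=336)          # 14 days, as in the values example
overrides = {"tenant-a": timedelta(hours=24)}    # hypothetical per-tenant override

print(effective_retention("tenant-a", global_retention, overrides))  # 1 day, 0:00:00
print(effective_retention("tenant-b", global_retention, overrides))  # 14 days, 0:00:00
```

A typo in an override (for example `24h` where `240h` was intended) therefore shortens retention for that tenant without any warning from the global setting.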
