Grafana Mimir Skill

Comprehensive guide for Grafana Mimir - the horizontally scalable, highly available, multi-tenant time series database for long-term Prometheus metrics storage.

What is Mimir?

Mimir is an open-source, horizontally scalable, highly available, multi-tenant long-term storage solution for Prometheus and OpenTelemetry metrics that:

Overcomes Prometheus limitations - Scalability and long-term retention
Multi-tenant by default - Built-in tenant isolation via X-Scope-OrgID header
Stores data in object storage - S3, GCS, Azure Blob Storage, or Swift
100% Prometheus compatible - PromQL queries, remote write protocol
Part of LGTM+ Stack - Logs, Grafana, Traces, Metrics unified observability

Architecture Overview

Core Components

| Component | Purpose | |-----------|---------| | Distributor | Validates requests, routes incoming metrics to ingesters via hash ring | | Ingester | Stores time-series data in memory, flushes to object storage | | Querier | Executes PromQL queries from ingesters and store-gateways | | Query Frontend | Caches query results, optimizes and splits queries | | Query Scheduler | Manages per-tenant query queues for fairness | | Store-Gateway | Provides access to historical metric blocks in object storage | | Compactor | Consolidates and optimizes stored metric data blocks | | Ruler | Evaluates recording and alerting rules (optional) | | Alertmanager | Handles alert routing and deduplication (optional) |

Data Flow

Write Path:

Prometheus/OTel → Distributor → Ingester → Object Storage
                       ↓
                 Hash Ring
                 (routes by series)

Read Path:

Query → Query Frontend → Query Scheduler → Querier
                                              ↓
                                    Ingesters (recent)
                                              ↓
                                    Store-Gateway (historical)

Deployment Modes

1. Monolithic Mode (`-target=all`)

All components in single process
Best for: Development, testing, small-scale (~1M series)
Horizontally scalable by deploying multiple instances
Not recommended for large-scale (all components scale together)

2. Microservices Mode (Distributed) - Recommended for Production

# Using mimir-distributed Helm chart
distributor:
  replicas: 3

ingester:
  replicas: 3
  zoneAwareReplication:
    enabled: true

querier:
  replicas: 3

queryFrontend:
  replicas: 2

queryScheduler:
  replicas: 2

storeGateway:
  replicas: 3

compactor:
  replicas: 1

Helm Deployment

Add Repository

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Install Distributed Mimir

helm install mimir grafana/mimir-distributed \
  --namespace monitoring \
  --values values.yaml

Pre-Built Values Files

| File | Purpose | |------|---------| | values.yaml | Non-production testing with MinIO | | small.yaml | ~1 million series (single replicas, not HA) | | large.yaml | Production (~10 million series) |

Production Values Example

# Deployment mode
mimir:
  structuredConfig:
    multitenancy_enabled: true

# Storage configuration
mimir:
  structuredConfig:
    common:
      storage:
        backend: azure  # or s3, gcs
        azure:
          account_name: ${AZURE_STORAGE_ACCOUNT}
          account_key: ${AZURE_STORAGE_KEY}
          endpoint_suffix: blob.core.windows.net

    blocks_storage:
      azure:
        container_name: mimir-blocks

    alertmanager_storage:
      azure:
        container_name: mimir-alertmanager

    ruler_storage:
      azure:
        container_name: mimir-ruler

# Distributor
distributor:
  replicas: 3
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      memory: 4Gi

# Ingester
ingester:
  replicas: 3
  zoneAwareReplication:
    enabled: true
  persistentVolume:
    enabled: true
    size: 50Gi
  resources:
    requests:
      cpu: 2
      memory: 8Gi
    limits:
      memory: 16Gi

# Querier
querier:
  replicas: 3
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      memory: 8Gi

# Query Frontend
query_frontend:
  replicas: 2
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      memory: 2Gi

# Query Scheduler
query_scheduler:
  replicas: 2

# Store Gateway
store_gateway:
  replicas: 3
  persistentVolume:
    enabled: true
    size: 20Gi
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 8Gi

# Compactor
compactor:
  replicas: 1
  persistentVolume:
    enabled: true
    size: 50Gi
  resources:
    requests:
      cpu: 1
      memory: 4Gi
    limits:
      memory: 8Gi

# Gateway for external access
gateway:
  enabledNonEnterprise: true
  replicas: 2

# Monitoring
metaMonitoring:
  serviceMonitor:
    enabled: true

Storage Configuration

Critical Requirements

Must create buckets manually - Mimir doesn't create them
Separate buckets required - blocks_storage, alertmanager_storage, ruler_storage cannot share the same bucket+prefix
Azure: Hierarchical namespace must be disabled

Azure Blob Storage

mimir:
  structuredConfig:
    common:
      storage:
        backend: azure
        azure:
          account_name: <storage-account-name>
          # Option 1: Account Key (via environment variable)
          account_key: ${AZURE_STORAGE_KEY}
          # Option 2: User-Assigned Managed Identity
          # user_assigned_id: <identity-client-id>
          endpoint_suffix: blob.core.windows.net

    blocks_storage:
      azure:
        container_name: mimir-blocks

    alertmanager_storage:
      azure:
        container_name: mimir-alertmanager

    ruler_storage:
      azure:
        container_name: mimir-ruler

AWS S3

mimir:
  structuredConfig:
    common:
      storage:
        backend: s3
        s3:
          endpoint: s3.us-east-1.amazonaws.com
          region: us-east-1
          access_key_id: ${AWS_ACCESS_KEY_ID}
          secret_access_key: ${AWS_SECRET_ACCESS_KEY}

    blocks_storage:
      s3:
        bucket_name: mimir-blocks

    alertmanager_storage:
      s3:
        bucket_name: mimir-alertmanager

    ruler_storage:
      s3:
        bucket_name: mimir-ruler

Google Cloud Storage

mimir:
  structuredConfig:
    common:
      storage:
        backend: gcs
        gcs:
          service_account: ${GCS_SERVICE_ACCOUNT_JSON}

    blocks_storage:
      gcs:
        bucket_name: mimir-blocks

    alertmanager_storage:
      gcs:
        bucket_name: mimir-alertmanager

    ruler_storage:
      gcs:
        bucket_name: mimir-ruler

Limits Configuration

mimir:
  structuredConfig:
    limits:
      # Ingestion limits
      ingestion_rate: 25000                    # Samples/sec per tenant
      ingestion_burst_size: 50000              # Burst size
      max_series_per_metric: 10000
      max_series_per_user: 1000000
      max_global_series_per_user: 1000000
      max_label_names_per_series: 30
      max_label_name_length: 1024
      max_label_value_length: 2048

      # Query limits
      max_fetched_series_per_query: 100000
      max_fetched_chunks_per_query: 2000000
      max_query_lookback: 0                    # No limit
      max_query_parallelism: 32

      # Retention
      compactor_blocks_retention_period: 365d  # 1 year

      # Out-of-order samples
      out_of_order_time_window: 5m

Per-Tenant Overrides (Runtime Configuration)

# runtime-config.yaml
overrides:
  tenant1:
    ingestion_rate: 50000
    max_series_per_user: 2000000
    compactor_blocks_retention_period: 730d    # 2 years
  tenant2:
    ingestion_rate: 75000
    max_global_series_per_user: 5000000

Enable runtime configuration:

mimir:
  structuredConfig:
    runtime_config:
      file: /etc/mimir/runtime-config.yaml
      period: 10s

High Availability Configuration

HA Tracker for Prometheus Deduplication

mimir:
  structuredConfig:
    distributor:
      ha_tracker:
        enable_ha_tracker: true
        kvstore:
          store: memberlist
        cluster_label: cluster
        replica_label: __replica__

    memberlist:
      join_members:
        - mimir-gossip-ring.monitoring.svc.cluster.local:7946

Prometheus Configuration:

global:
  external_labels:
    cluster: prom-team1
    __replica__: replica1

remote_write:
  - url: http://mimir-gateway:8080/api/v1/push
    headers:
      X-Scope-OrgID: my-tenant

Zone-Aware Replication

ingester:
  zoneAwareReplication:
    enabled: true
    zones:
      - name: zone-a
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1a
      - name: zone-b
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1b
      - name: zone-c
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1c

store_gateway:
  zoneAwareReplication:
    enabled: true

Shuffle Sharding

Limits tenant data to a subset of instances for fault isolation:

mimir:
  structuredConfig:
    limits:
      # Write path
      ingestion_tenant_shard_size: 3

      # Read path
      max_queriers_per_tenant: 5
      store_gateway_tenant_shard_size: 3

OpenTelemetry Integration

OTLP Metrics Ingestion

OpenTelemetry Collector Config:

exporters:
  otlphttp:
    endpoint: http://mimir-gateway:8080/otlp
    headers:
      X-Scope-OrgID: "my-tenant"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]

Exponential Histograms (Experimental)

// Go SDK configuration
Aggregation: metric.AggregationBase2ExponentialHistogram{
    MaxSize:  160,      // Maximum buckets
    MaxScale: 20,       // Scale factor
}

Key Benefits:

Explicit min/max values (no estimation needed)
Better accuracy for extreme percentiles
Native OTLP format preservation

Multi-Tenancy

mimir:
  structuredConfig:
    multitenancy_enabled: true
    no_auth_tenant: anonymous    # Used when multitenancy disabled

Query with tenant header:

curl -H "X-Scope-OrgID: tenant-a" \
  "http://mimir:8080/prometheus/api/v1/query?query=up"

Tenant ID Constraints:

Max 150 characters
Allowed: alphanumeric, ! - _ . * ' ( )
Prohibited: . or .. alone, __mimir_cluster, slashes

API Reference

Ingestion Endpoints

# Prometheus remote write
POST /api/v1/push

# OTLP metrics
POST /otlp/v1/metrics

# InfluxDB line protocol
POST /api/v1/push/influx/write

Query Endpoints

# Instant query
GET,POST /prometheus/api/v1/query?query=<promql>&time=<timestamp>

# Range query
GET,POST /prometheus/api/v1/query_range?query=<promql>&start=<start>&end=<end>&step=<step>

# Labels
GET,POST /prometheus/api/v1/labels
GET /prometheus/api/v1/label/{name}/values

# Series
GET,POST /prometheus/api/v1/series

# Exemplars
GET,POST /prometheus/api/v1/query_exemplars

# Cardinality
GET,POST /prometheus/api/v1/cardinality/label_names
GET,POST /prometheus/api/v1/cardinality/active_series

Administrative Endpoints

# Flush ingester data
GET,POST /ingester/flush

# Prepare shutdown
GET,POST,DELETE /ingester/prepare-shutdown

# Ring status
GET /ingester/ring
GET /distributor/ring
GET /store-gateway/ring
GET /compactor/ring

# Tenant stats
GET /distributor/all_user_stats
GET /api/v1/user_stats
GET /api/v1/user_limits

Health & Config

GET /ready
GET /metrics
GET /config
GET /config?mode=diff
GET /runtime_config

Azure Identity Configuration

User-Assigned Managed Identity

1. Create Identity:

az identity create \
  --name mimir-identity \
  --resource-group <rg>

IDENTITY_CLIENT_ID=$(az identity show --name mimir-identity --resource-group <rg> --query clientId -o tsv)
IDENTITY_PRINCIPAL_ID=$(az identity show --name mimir-identity --resource-group <rg> --query principalId -o tsv)

2. Assign to Node Pool:

az vmss identity assign \
  --resource-group <aks-node-rg> \
  --name <vmss-name> \
  --identities /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/mimir-identity

3. Grant Storage Permission:

az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id $IDENTITY_PRINCIPAL_ID \
  --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>

4. Configure Mimir:

mimir:
  structuredConfig:
    common:
      storage:
        azure:
          user_assigned_id: <IDENTITY_CLIENT_ID>

Workload Identity Federation

1. Create Federated Credential:

az identity federated-credential create \
  --name mimir-federated \
  --identity-name mimir-identity \
  --resource-group <rg> \
  --issuer <aks-oidc-issuer-url> \
  --subject system:serviceaccount:monitoring:mimir \
  --audiences api://AzureADTokenExchange

2. Configure Helm Values:

serviceAccount:
  annotations:
    azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>

podLabels:
  azure.workload.identity/use: "true"

Troubleshooting

Common Issues

1. Container Not Found (Azure)

# Create required containers
az storage container create --name mimir-blocks --account-name <storage>
az storage container create --name mimir-alertmanager --account-name <storage>
az storage container create --name mimir-ruler --account-name <storage>

2. Authorization Failure (Azure)

# Verify RBAC assignment
az role assignment list --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>

# Assign if missing
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id <principal-id> \
  --scope <storage-scope>

# Restart pod to refresh token
kubectl delete pod -n monitoring <ingester-pod>

3. Ingester OOM

ingester:
  resources:
    limits:
      memory: 16Gi  # Increase memory

4. Query Timeout

mimir:
  structuredConfig:
    querier:
      timeout: 5m
      max_concurrent: 20

5. High Cardinality

mimir:
  structuredConfig:
    limits:
      max_series_per_user: 5000000
      max_series_per_metric: 50000

Diagnostic Commands

# Check pod status
kubectl get pods -n monitoring -l app.kubernetes.io/name=mimir

# Check ingester logs
kubectl logs -n monitoring -l app.kubernetes.io/component=ingester --tail=100

# Check distributor logs
kubectl logs -n monitoring -l app.kubernetes.io/component=distributor --tail=100

# Verify readiness
kubectl exec -it <mimir-pod> -n monitoring -- wget -qO- http://localhost:8080/ready

# Check ring status
kubectl port-forward svc/mimir-distributor 8080:8080 -n monitoring
curl http://localhost:8080/distributor/ring

# Check configuration
kubectl exec -it <mimir-pod> -n monitoring -- cat /etc/mimir/mimir.yaml

# Validate configuration before deployment
mimir -modules -config.file <path-to-config-file>

Key Metrics to Monitor

# Ingestion rate per tenant
sum by (user) (rate(cortex_distributor_received_samples_total[5m]))

# Series count per tenant
sum by (user) (cortex_ingester_memory_series)

# Query latency
histogram_quantile(0.99, sum by (le) (rate(cortex_request_duration_seconds_bucket{route=~"/api/prom/api/v1/query.*"}[5m])))

# Compactor status
cortex_compactor_runs_completed_total
cortex_compactor_runs_failed_total

# Store-gateway block sync
cortex_bucket_store_blocks_loaded

Circuit Breakers (Ingester)

mimir:
  structuredConfig:
    ingester:
      push_circuit_breaker:
        enabled: true
        request_timeout: 2s
        failure_threshold_percentage: 10
        cooldown_period: 10s
      read_circuit_breaker:
        enabled: true
        request_timeout: 30s

States:

Closed - Normal operation
Open - Stops forwarding to failing instances
Half-open - Limited trial requests after cooldown

External Resources

Official Mimir Documentation
Mimir Helm Chart
Configuration Reference
HTTP API Reference
Mimir GitHub Repository

Gotchas

Multi-tenancy via X-Scope-OrgID: missing header writes to tenant "fake" silently — same trap as Loki.
Retention is per-tenant; global retention env var is fallback only — a misconfigured tenant silently overrides.
Ingester replication factor < 3 loses data on a single failure — default in some Helm charts is 1; don't ship it.
Query frontend cache TTL bumps invalidate ALL cached results — bumping query_range_results_cache_ttl doesn't extend existing entries.
Multi-tenant storage paths share an S3 bucket prefix — accidentally allowing tenant-1 to read tenant-2's path is a config mistake the chart doesn't prevent.
distributor.ingestion-rate-limit-strategy: global requires a working memberlist — silent failure mode: per-distributor limits without coordination, allowing 10x the intended traffic.

Grafana Mimir Skill

Comprehensive guide for Grafana Mimir - the horizontally scalable, highly available, multi-tenant time series database for long-term Prometheus metrics storage.

What is Mimir?

Mimir is an open-source, horizontally scalable, highly available, multi-tenant long-term storage solution for Prometheus and OpenTelemetry metrics that:

Overcomes Prometheus limitations - Scalability and long-term retention
Multi-tenant by default - Built-in tenant isolation via X-Scope-OrgID header
Stores data in object storage - S3, GCS, Azure Blob Storage, or Swift
100% Prometheus compatible - PromQL queries, remote write protocol
Part of LGTM+ Stack - Logs, Grafana, Traces, Metrics unified observability

Architecture Overview

Core Components

Data Flow

Write Path:

Prometheus/OTel → Distributor → Ingester → Object Storage
                       ↓
                 Hash Ring
                 (routes by series)

Read Path:

Query → Query Frontend → Query Scheduler → Querier
                                              ↓
                                    Ingesters (recent)
                                              ↓
                                    Store-Gateway (historical)

Deployment Modes

1. Monolithic Mode (`-target=all`)

All components in single process
Best for: Development, testing, small-scale (~1M series)
Horizontally scalable by deploying multiple instances
Not recommended for large-scale (all components scale together)

2. Microservices Mode (Distributed) - Recommended for Production

# Using mimir-distributed Helm chart
distributor:
  replicas: 3

ingester:
  replicas: 3
  zoneAwareReplication:
    enabled: true

querier:
  replicas: 3

queryFrontend:
  replicas: 2

queryScheduler:
  replicas: 2

storeGateway:
  replicas: 3

compactor:
  replicas: 1

Helm Deployment

Add Repository

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

Install Distributed Mimir

helm install mimir grafana/mimir-distributed \
  --namespace monitoring \
  --values values.yaml

Pre-Built Values Files

Production Values Example

# Deployment mode
mimir:
  structuredConfig:
    multitenancy_enabled: true

# Storage configuration
mimir:
  structuredConfig:
    common:
      storage:
        backend: azure  # or s3, gcs
        azure:
          account_name: ${AZURE_STORAGE_ACCOUNT}
          account_key: ${AZURE_STORAGE_KEY}
          endpoint_suffix: blob.core.windows.net

    blocks_storage:
      azure:
        container_name: mimir-blocks

    alertmanager_storage:
      azure:
        container_name: mimir-alertmanager

    ruler_storage:
      azure:
        container_name: mimir-ruler

# Distributor
distributor:
  replicas: 3
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      memory: 4Gi

# Ingester
ingester:
  replicas: 3
  zoneAwareReplication:
    enabled: true
  persistentVolume:
    enabled: true
    size: 50Gi
  resources:
    requests:
      cpu: 2
      memory: 8Gi
    limits:
      memory: 16Gi

# Querier
querier:
  replicas: 3
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      memory: 8Gi

# Query Frontend
query_frontend:
  replicas: 2
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      memory: 2Gi

# Query Scheduler
query_scheduler:
  replicas: 2

# Store Gateway
store_gateway:
  replicas: 3
  persistentVolume:
    enabled: true
    size: 20Gi
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 8Gi

# Compactor
compactor:
  replicas: 1
  persistentVolume:
    enabled: true
    size: 50Gi
  resources:
    requests:
      cpu: 1
      memory: 4Gi
    limits:
      memory: 8Gi

# Gateway for external access
gateway:
  enabledNonEnterprise: true
  replicas: 2

# Monitoring
metaMonitoring:
  serviceMonitor:
    enabled: true

Storage Configuration

Critical Requirements

Must create buckets manually - Mimir doesn't create them
Separate buckets required - blocks_storage, alertmanager_storage, ruler_storage cannot share the same bucket+prefix
Azure: Hierarchical namespace must be disabled

Azure Blob Storage

mimir:
  structuredConfig:
    common:
      storage:
        backend: azure
        azure:
          account_name: <storage-account-name>
          # Option 1: Account Key (via environment variable)
          account_key: ${AZURE_STORAGE_KEY}
          # Option 2: User-Assigned Managed Identity
          # user_assigned_id: <identity-client-id>
          endpoint_suffix: blob.core.windows.net

    blocks_storage:
      azure:
        container_name: mimir-blocks

    alertmanager_storage:
      azure:
        container_name: mimir-alertmanager

    ruler_storage:
      azure:
        container_name: mimir-ruler

AWS S3

mimir:
  structuredConfig:
    common:
      storage:
        backend: s3
        s3:
          endpoint: s3.us-east-1.amazonaws.com
          region: us-east-1
          access_key_id: ${AWS_ACCESS_KEY_ID}
          secret_access_key: ${AWS_SECRET_ACCESS_KEY}

    blocks_storage:
      s3:
        bucket_name: mimir-blocks

    alertmanager_storage:
      s3:
        bucket_name: mimir-alertmanager

    ruler_storage:
      s3:
        bucket_name: mimir-ruler

Google Cloud Storage

mimir:
  structuredConfig:
    common:
      storage:
        backend: gcs
        gcs:
          service_account: ${GCS_SERVICE_ACCOUNT_JSON}

    blocks_storage:
      gcs:
        bucket_name: mimir-blocks

    alertmanager_storage:
      gcs:
        bucket_name: mimir-alertmanager

    ruler_storage:
      gcs:
        bucket_name: mimir-ruler

Limits Configuration

mimir:
  structuredConfig:
    limits:
      # Ingestion limits
      ingestion_rate: 25000                    # Samples/sec per tenant
      ingestion_burst_size: 50000              # Burst size
      max_series_per_metric: 10000
      max_series_per_user: 1000000
      max_global_series_per_user: 1000000
      max_label_names_per_series: 30
      max_label_name_length: 1024
      max_label_value_length: 2048

      # Query limits
      max_fetched_series_per_query: 100000
      max_fetched_chunks_per_query: 2000000
      max_query_lookback: 0                    # No limit
      max_query_parallelism: 32

      # Retention
      compactor_blocks_retention_period: 365d  # 1 year

      # Out-of-order samples
      out_of_order_time_window: 5m

Per-Tenant Overrides (Runtime Configuration)

# runtime-config.yaml
overrides:
  tenant1:
    ingestion_rate: 50000
    max_series_per_user: 2000000
    compactor_blocks_retention_period: 730d    # 2 years
  tenant2:
    ingestion_rate: 75000
    max_global_series_per_user: 5000000

Enable runtime configuration:

mimir:
  structuredConfig:
    runtime_config:
      file: /etc/mimir/runtime-config.yaml
      period: 10s

High Availability Configuration

HA Tracker for Prometheus Deduplication

mimir:
  structuredConfig:
    distributor:
      ha_tracker:
        enable_ha_tracker: true
        kvstore:
          store: memberlist
        cluster_label: cluster
        replica_label: __replica__

    memberlist:
      join_members:
        - mimir-gossip-ring.monitoring.svc.cluster.local:7946

Prometheus Configuration:

global:
  external_labels:
    cluster: prom-team1
    __replica__: replica1

remote_write:
  - url: http://mimir-gateway:8080/api/v1/push
    headers:
      X-Scope-OrgID: my-tenant

Zone-Aware Replication

ingester:
  zoneAwareReplication:
    enabled: true
    zones:
      - name: zone-a
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1a
      - name: zone-b
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1b
      - name: zone-c
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1c

store_gateway:
  zoneAwareReplication:
    enabled: true

Shuffle Sharding

Limits tenant data to a subset of instances for fault isolation:

mimir:
  structuredConfig:
    limits:
      # Write path
      ingestion_tenant_shard_size: 3

      # Read path
      max_queriers_per_tenant: 5
      store_gateway_tenant_shard_size: 3

OpenTelemetry Integration

OTLP Metrics Ingestion

OpenTelemetry Collector Config:

exporters:
  otlphttp:
    endpoint: http://mimir-gateway:8080/otlp
    headers:
      X-Scope-OrgID: "my-tenant"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]

Exponential Histograms (Experimental)

// Go SDK configuration
Aggregation: metric.AggregationBase2ExponentialHistogram{
    MaxSize:  160,      // Maximum buckets
    MaxScale: 20,       // Scale factor
}

Key Benefits:

Explicit min/max values (no estimation needed)
Better accuracy for extreme percentiles
Native OTLP format preservation

Multi-Tenancy

mimir:
  structuredConfig:
    multitenancy_enabled: true
    no_auth_tenant: anonymous    # Used when multitenancy disabled

Query with tenant header:

curl -H "X-Scope-OrgID: tenant-a" \
  "http://mimir:8080/prometheus/api/v1/query?query=up"

Tenant ID Constraints:

Max 150 characters
Allowed: alphanumeric, ! - _ . * ' ( )
Prohibited: . or .. alone, __mimir_cluster, slashes

API Reference

Ingestion Endpoints

# Prometheus remote write
POST /api/v1/push

# OTLP metrics
POST /otlp/v1/metrics

# InfluxDB line protocol
POST /api/v1/push/influx/write

Query Endpoints

# Instant query
GET,POST /prometheus/api/v1/query?query=<promql>&time=<timestamp>

# Range query
GET,POST /prometheus/api/v1/query_range?query=<promql>&start=<start>&end=<end>&step=<step>

# Labels
GET,POST /prometheus/api/v1/labels
GET /prometheus/api/v1/label/{name}/values

# Series
GET,POST /prometheus/api/v1/series

# Exemplars
GET,POST /prometheus/api/v1/query_exemplars

# Cardinality
GET,POST /prometheus/api/v1/cardinality/label_names
GET,POST /prometheus/api/v1/cardinality/active_series

Administrative Endpoints

# Flush ingester data
GET,POST /ingester/flush

# Prepare shutdown
GET,POST,DELETE /ingester/prepare-shutdown

# Ring status
GET /ingester/ring
GET /distributor/ring
GET /store-gateway/ring
GET /compactor/ring

# Tenant stats
GET /distributor/all_user_stats
GET /api/v1/user_stats
GET /api/v1/user_limits

Health & Config

GET /ready
GET /metrics
GET /config
GET /config?mode=diff
GET /runtime_config

Azure Identity Configuration

User-Assigned Managed Identity

1. Create Identity:

az identity create \
  --name mimir-identity \
  --resource-group <rg>

IDENTITY_CLIENT_ID=$(az identity show --name mimir-identity --resource-group <rg> --query clientId -o tsv)
IDENTITY_PRINCIPAL_ID=$(az identity show --name mimir-identity --resource-group <rg> --query principalId -o tsv)

2. Assign to Node Pool:

az vmss identity assign \
  --resource-group <aks-node-rg> \
  --name <vmss-name> \
  --identities /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/mimir-identity

3. Grant Storage Permission:

az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id $IDENTITY_PRINCIPAL_ID \
  --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>

4. Configure Mimir:

mimir:
  structuredConfig:
    common:
      storage:
        azure:
          user_assigned_id: <IDENTITY_CLIENT_ID>

Workload Identity Federation

1. Create Federated Credential:

az identity federated-credential create \
  --name mimir-federated \
  --identity-name mimir-identity \
  --resource-group <rg> \
  --issuer <aks-oidc-issuer-url> \
  --subject system:serviceaccount:monitoring:mimir \
  --audiences api://AzureADTokenExchange

2. Configure Helm Values:

serviceAccount:
  annotations:
    azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>

podLabels:
  azure.workload.identity/use: "true"

Troubleshooting

Common Issues

1. Container Not Found (Azure)

# Create required containers
az storage container create --name mimir-blocks --account-name <storage>
az storage container create --name mimir-alertmanager --account-name <storage>
az storage container create --name mimir-ruler --account-name <storage>

2. Authorization Failure (Azure)

# Verify RBAC assignment
az role assignment list --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>

# Assign if missing
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id <principal-id> \
  --scope <storage-scope>

# Restart pod to refresh token
kubectl delete pod -n monitoring <ingester-pod>

3. Ingester OOM

ingester:
  resources:
    limits:
      memory: 16Gi  # Increase memory

4. Query Timeout

mimir:
  structuredConfig:
    querier:
      timeout: 5m
      max_concurrent: 20

5. High Cardinality

mimir:
  structuredConfig:
    limits:
      max_series_per_user: 5000000
      max_series_per_metric: 50000

Diagnostic Commands

# Check pod status
kubectl get pods -n monitoring -l app.kubernetes.io/name=mimir

# Check ingester logs
kubectl logs -n monitoring -l app.kubernetes.io/component=ingester --tail=100

# Check distributor logs
kubectl logs -n monitoring -l app.kubernetes.io/component=distributor --tail=100

# Verify readiness
kubectl exec -it <mimir-pod> -n monitoring -- wget -qO- http://localhost:8080/ready

# Check ring status
kubectl port-forward svc/mimir-distributor 8080:8080 -n monitoring
curl http://localhost:8080/distributor/ring

# Check configuration
kubectl exec -it <mimir-pod> -n monitoring -- cat /etc/mimir/mimir.yaml

# Validate configuration before deployment
mimir -modules -config.file <path-to-config-file>

Key Metrics to Monitor

# Ingestion rate per tenant
sum by (user) (rate(cortex_distributor_received_samples_total[5m]))

# Series count per tenant
sum by (user) (cortex_ingester_memory_series)

# Query latency
histogram_quantile(0.99, sum by (le) (rate(cortex_request_duration_seconds_bucket{route=~"/api/prom/api/v1/query.*"}[5m])))

# Compactor status
cortex_compactor_runs_completed_total
cortex_compactor_runs_failed_total

# Store-gateway block sync
cortex_bucket_store_blocks_loaded

Circuit Breakers (Ingester)

mimir:
  structuredConfig:
    ingester:
      push_circuit_breaker:
        enabled: true
        request_timeout: 2s
        failure_threshold_percentage: 10
        cooldown_period: 10s
      read_circuit_breaker:
        enabled: true
        request_timeout: 30s

States:

Closed - Normal operation
Open - Stops forwarding to failing instances
Half-open - Limited trial requests after cooldown

External Resources

Official Mimir Documentation
Mimir Helm Chart
Configuration Reference
HTTP API Reference
Mimir GitHub Repository

Gotchas

Multi-tenancy via X-Scope-OrgID: missing header writes to tenant "fake" silently — same trap as Loki.
Retention is per-tenant; global retention env var is fallback only — a misconfigured tenant silently overrides.
Ingester replication factor < 3 loses data on a single failure — default in some Helm charts is 1; don't ship it.
Query frontend cache TTL bumps invalidate ALL cached results — bumping query_range_results_cache_ttl doesn't extend existing entries.
Multi-tenant storage paths share an S3 bucket prefix — accidentally allowing tenant-1 to read tenant-2's path is a config mistake the chart doesn't prevent.
distributor.ingestion-rate-limit-strategy: global requires a working memberlist — silent failure mode: per-distributor limits without coordination, allowing 10x the intended traffic.

Adoption

julianobarbosa/mimir

$ install --global

Security Scan Results

SKILL.md

Grafana Mimir Skill

What is Mimir?

Architecture Overview

Core Components

Data Flow

Deployment Modes

1. Monolithic Mode (-target=all)

2. Microservices Mode (Distributed) - Recommended for Production

Helm Deployment

Add Repository

Install Distributed Mimir

Pre-Built Values Files

Production Values Example

Storage Configuration

Critical Requirements

Azure Blob Storage

AWS S3

Google Cloud Storage

Limits Configuration

Per-Tenant Overrides (Runtime Configuration)

High Availability Configuration

HA Tracker for Prometheus Deduplication

Zone-Aware Replication

Shuffle Sharding

OpenTelemetry Integration

OTLP Metrics Ingestion

Exponential Histograms (Experimental)

Multi-Tenancy

API Reference

Ingestion Endpoints

Query Endpoints

Administrative Endpoints

Health & Config

Azure Identity Configuration

User-Assigned Managed Identity

Workload Identity Federation

Troubleshooting

Common Issues

Diagnostic Commands

Key Metrics to Monitor

Circuit Breakers (Ingester)

External Resources

Gotchas

Related Skills

julianobarbosa/your-skill-name

julianobarbosa/zsh-path

julianobarbosa/zabbix-api

julianobarbosa/yt-music

julianobarbosa/mimir

$ install --global

Security Scan Results

SKILL.md

Grafana Mimir Skill

What is Mimir?

Architecture Overview

Core Components

Data Flow

Deployment Modes

1. Monolithic Mode (-target=all)

2. Microservices Mode (Distributed) - Recommended for Production

Helm Deployment

Add Repository

Install Distributed Mimir

Pre-Built Values Files

Production Values Example

Storage Configuration

Critical Requirements

Azure Blob Storage

AWS S3

Google Cloud Storage

Limits Configuration

Per-Tenant Overrides (Runtime Configuration)

High Availability Configuration

HA Tracker for Prometheus Deduplication

Zone-Aware Replication

1. Monolithic Mode (`-target=all`)

1. Monolithic Mode (`-target=all`)