julianobarbosa/holmesgpt

Name: holmesgpt
Author: julianobarbosa

skills/holmesgpt/SKILL.md

npx skillsauth add julianobarbosa/claude-code-skills holmesgpt

HolmesGPT Skill

AI-powered troubleshooting for Kubernetes and cloud-native environments.

Overview

HolmesGPT is a CNCF Sandbox project that connects AI models with live observability data to investigate infrastructure problems, find root causes, and suggest remediations. It runs with read-only access and respects existing RBAC permissions, so it can investigate production clusters without being able to change them.

Quick Reference

| Topic | Reference |
|-------|-----------|
| Installation | references/installation.md |
| Configuration | references/configuration.md |
| Data Sources | references/data-sources.md |
| Commands | references/commands.md |
| Troubleshooting | references/troubleshooting.md |
| HTTP API | references/http-api.md |
| Integrations | references/integrations.md |

Key Features

Root Cause Analysis: Investigates alerts and cluster issues
Multi-Source Integration: 30+ toolsets (K8s, Prometheus, Grafana)
Alert Integration: AlertManager, PagerDuty, OpsGenie, Jira, Slack
Interactive Mode: Troubleshooting with /run, /show, /clear
Custom Toolsets: Extend with proprietary tools via YAML configuration
CI/CD Integration: Automated deployment failure investigation

Installation Quick Start

CLI (Homebrew)

brew tap robusta-dev/homebrew-holmesgpt
brew install holmesgpt
export ANTHROPIC_API_KEY="your-key"  # or OPENAI_API_KEY
holmes ask "what pods are unhealthy?"

Kubernetes (Helm)

helm repo add robusta https://robusta-charts.storage.googleapis.com
helm repo update
helm install holmesgpt robusta/holmes -f values.yaml
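The Helm values example further down reads ANTHROPIC_API_KEY from a Secret named holmesgpt-secrets (key anthropic-api-key), so that Secret has to exist in the release namespace before helm install runs. A minimal Bash sketch, assuming the key is already exported in your shell and the namespace already exists; the function name and the holmes namespace in the usage line are hypothetical:

```shell
# Hypothetical helper: create or update the Secret referenced in values.yaml.
# Assumes kubectl points at the target cluster and the namespace already exists.
create_holmes_secret() {
  local namespace="$1"
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "export ANTHROPIC_API_KEY first" >&2
    return 1
  fi
  # Render the Secret client-side, then apply it, so re-runs update it in place.
  kubectl create secret generic holmesgpt-secrets \
    --namespace "$namespace" \
    --from-literal=anthropic-api-key="$ANTHROPIC_API_KEY" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Usage (hypothetical namespace):
#   create_holmes_secret holmes && helm install holmesgpt robusta/holmes -n holmes -f values.yaml
```

Piping a client-side dry run into kubectl apply makes the step idempotent: re-running it rotates the key instead of failing because the Secret already exists.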

Docker

docker run -it --net=host \
  -e OPENAI_API_KEY="your-key" \
  -v ~/.kube/config:/root/.kube/config \
  us-central1-docker.pkg.dev/genuine-flight-317411/devel/holmes \
  ask "what pods are crashing?"

Essential Commands

# Basic investigation
holmes ask "what pods are unhealthy and why?"
holmes ask "why is my deployment failing?"

# Interactive mode
holmes ask "investigate issue" --interactive

# Alert investigation
holmes investigate alertmanager --alertmanager-url http://localhost:9093
holmes investigate pagerduty --pagerduty-api-key <KEY> --update

# With file context
holmes ask "summarize the key points" -f ./logs.txt

# CI/CD integration
holmes ask "why did deployment fail?" --destination slack --slack-token <TOKEN>
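As noted under Gotchas, --alertmanager-url http://localhost:9093 only works from a workstation (typically through a port-forward), not from inside the cluster. A minimal Bash sketch that picks the URL based on where the command runs. It assumes that KUBERNETES_SERVICE_HOST being set (Kubernetes injects it into containers) means you are in-cluster, and that AlertManager is exposed under the kube-prometheus-stack service name used in the Gotchas; the function name is hypothetical:

```shell
# Hypothetical helper: pick an AlertManager URL reachable from where holmes runs.
alertmanager_url() {
  if [ -n "${KUBERNETES_SERVICE_HOST:-}" ]; then
    # Inside a pod: in-cluster service DNS (kube-prometheus-stack naming assumed).
    echo "http://kube-prometheus-stack-alertmanager.monitoring:9093"
  else
    # On a workstation: assumes a port-forward such as
    #   kubectl port-forward -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093
    echo "http://localhost:9093"
  fi
}

# Usage:
#   holmes investigate alertmanager --alertmanager-url "$(alertmanager_url)"
```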

Supported AI Providers

| Provider | Environment Variable | Models |
|----------|---------------------|--------|
| Anthropic | ANTHROPIC_API_KEY | Sonnet 4, Opus 4.5 |
| OpenAI | OPENAI_API_KEY | GPT-4.1, GPT-4o |
| Azure OpenAI | AZURE_API_KEY | GPT-4.1 |
| AWS Bedrock | AWS credentials | Claude 3.5 Sonnet |
| Google Gemini | GEMINI_API_KEY | Gemini 1.5 Pro |
| Vertex AI | VERTEXAI_PROJECT | Gemini 1.5 Pro |
| Ollama | Local install | Llama 3.1, Mistral |
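Models are referenced by LiteLLM-style names, such as anthropic/claude-sonnet-4-20250514 in the Helm values below. A Bash sketch that chooses a model string from whichever API key is exported, for passing to holmes ask via --model. The function name and preference order are hypothetical, and the OpenAI and Gemini model strings are assumptions based on LiteLLM naming; check them against the HolmesGPT docs before relying on them:

```shell
# Hypothetical helper: map the first exported API key to a model string.
pick_holmes_model() {
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "anthropic/claude-sonnet-4-20250514"   # same string as the Helm values example
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "gpt-4.1"                               # assumed LiteLLM name for GPT-4.1
  elif [ -n "${GEMINI_API_KEY:-}" ]; then
    echo "gemini/gemini-1.5-pro"                 # assumed LiteLLM name for Gemini 1.5 Pro
  else
    echo "no supported API key exported" >&2
    return 1
  fi
}

# Usage:
#   holmes ask "what pods are unhealthy and why?" --model="$(pick_holmes_model)"
```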

Basic Helm Values Structure

# values.yaml for Kubernetes deployment
image:
  repository: robustadev/holmes
  tag: latest

env:
  - name: ANTHROPIC_API_KEY
    valueFrom:
      secretKeyRef:
        name: holmesgpt-secrets
        key: anthropic-api-key

# Model configuration
modelList:
  sonnet:
    api_key: "{{ env.ANTHROPIC_API_KEY }}"
    model: anthropic/claude-sonnet-4-20250514
    temperature: 0

# Toolsets to enable
toolsets:
  kubernetes/core:
    enabled: true
  kubernetes/logs:
    enabled: true
  prometheus/metrics:
    enabled: true

# Resources
resources:
  requests:
    memory: "1024Mi"
    cpu: "100m"
  limits:
    memory: "1024Mi"

# RBAC (read-only by default)
createServiceAccount: true
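Two of the Gotchas below concern the modelList block: Claude models need the anthropic/ prefix, and temperature should be pinned to 0. A rough preflight sketch that checks both in values.yaml before helm install. It is line-based rather than a real YAML parser and assumes the flat layout shown above; the function name is hypothetical:

```shell
# Hypothetical preflight check for values.yaml (line-based, not a YAML parser).
lint_holmes_values() {
  local file="$1" status=0
  # Claude models need the anthropic/ prefix (model: anthropic/claude-...).
  if grep -Eq '^[[:space:]]*model:[[:space:]]*"?claude' "$file"; then
    echo "$file: Claude model without anthropic/ prefix" >&2
    status=1
  fi
  # temperature must be present and exactly 0 (or 0.0) for reproducible runs.
  if ! grep -Eq '^[[:space:]]*temperature:' "$file"; then
    echo "$file: no temperature pinned in modelList" >&2
    status=1
  elif grep -E '^[[:space:]]*temperature:' "$file" \
       | grep -Evq 'temperature:[[:space:]]*0(\.0+)?[[:space:]]*(#.*)?$'; then
    echo "$file: temperature other than 0 found" >&2
    status=1
  fi
  return "$status"
}

# Usage:
#   lint_holmes_values values.yaml && helm install holmesgpt robusta/holmes -f values.yaml
```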

Interactive Mode Commands

| Command | Description |
|---------|-------------|
| /clear | Reset context when changing topics |
| /run | Execute custom commands and share output with AI |
| /show | Display complete tool outputs |
| /context | Review accumulated investigation information |

Custom Toolset Example

# custom-toolset.yaml
toolsets:
  my-custom-tool:
    description: "Custom diagnostic tool"
    tools:
      - name: check_service_health
        description: "Check health of a specific service"
        command: |
          curl -s http://{{ service_name }}.{{ namespace }}.svc.cluster.local/health
        parameters:
          - name: service_name
            description: "Name of the service"
          - name: namespace
            description: "Kubernetes namespace"

Use with: holmes ask "check health" -t custom-toolset.yaml
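The {{ service_name }} and {{ namespace }} placeholders in command are meant to be filled from the tool's parameters when the model calls it. To preview what a given call would run (for example, to try it by hand from a pod in the cluster), a Bash sketch that performs the same substitution for these two simple placeholders. It only approximates HolmesGPT's own template rendering; the helper name and sample values are hypothetical:

```shell
# Hypothetical preview helper for the check_service_health tool above.
render_health_command() {
  local service_name="$1" namespace="$2" cmd
  # Template copied from the tool definition.
  local template='curl -s http://{{ service_name }}.{{ namespace }}.svc.cluster.local/health'
  local p_service='{{ service_name }}' p_namespace='{{ namespace }}'
  # Quoting "$p_..." makes bash match the placeholder literally, not as a glob.
  cmd=${template//"$p_service"/$service_name}
  cmd=${cmd//"$p_namespace"/$namespace}
  printf '%s\n' "$cmd"
}

# Usage:
#   render_health_command payment-service payments
```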

Kubernetes Annotations for Integration

# Add to Services/Deployments for HolmesGPT context
metadata:
  annotations:
    holmesgpt.dev/runbook: |
      This service handles payment processing.
      Common issues: database connectivity, API rate limits.
      Check: kubectl logs -l app=payment-service
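To attach a runbook to an existing Deployment without editing its manifest, kubectl annotate can set the same annotation key. A Bash sketch, assuming a Deployment named payment-service in a payments namespace (both hypothetical, as is the wrapper name):

```shell
# Hypothetical wrapper around `kubectl annotate` for the runbook annotation above.
annotate_runbook() {
  local namespace="$1" deployment="$2" runbook="$3"
  # --overwrite replaces an existing runbook instead of failing.
  kubectl annotate deployment "$deployment" \
    --namespace "$namespace" \
    --overwrite \
    "holmesgpt.dev/runbook=$runbook"
}

# Usage (multi-line text via Bash $'...' quoting):
#   annotate_runbook payments payment-service $'This service handles payment processing.\nCommon issues: database connectivity, API rate limits.'
```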

Environment Variables Reference

| Variable | Description | Default |
|----------|-------------|---------|
| HOLMES_CONFIG_PATH | Config file path | ~/.holmes/config.yaml |
| HOLMES_LOG_LEVEL | Log verbosity | INFO |
| PROMETHEUS_URL | Prometheus server URL | - |
| GITHUB_TOKEN | GitHub API token | - |
| DATADOG_API_KEY | DataDog API key | - |
| CONFLUENCE_BASE_URL | Confluence URL | - |
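When an integration seems to be ignored, it helps to confirm which of these variables are actually exported in the current shell. A Bash sketch using indirect expansion (${!var}) that prints only set/not set, never the values; the function name is hypothetical and the variable list is copied from the table above:

```shell
# Hypothetical helper: report which HolmesGPT-related variables are set (values not printed).
report_holmes_env() {
  local var
  for var in HOLMES_CONFIG_PATH HOLMES_LOG_LEVEL PROMETHEUS_URL \
             GITHUB_TOKEN DATADOG_API_KEY CONFLUENCE_BASE_URL; do
    # ${!var:-} expands the variable whose name is stored in $var.
    if [ -n "${!var:-}" ]; then
      echo "$var: set"
    else
      echo "$var: not set"
    fi
  done
}
```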

Best Practices

Use Specific Queries: Include namespace, deployment name, symptoms
Start with Claude Sonnet 4.0/4.5: Best accuracy for complex investigations
Enable Relevant Toolsets: Only enable what you need to reduce noise
Use Interactive Mode: For complex multi-step investigations
Set Up Runbooks: Provide context for known alert types
CI/CD Integration: Automate deployment failure analysis

Security Considerations

HolmesGPT uses read-only access (get, list, watch only)
Respects existing RBAC permissions
Never modifies, creates, or deletes resources
API keys stored in Kubernetes Secrets
Data not used for model training
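The read-only claims can be checked against your own cluster with kubectl auth can-i, impersonating the HolmesGPT service account; can-i exits 0 only when the answer is yes. A Bash sketch; the function name is hypothetical, and you pass in the namespace and service account name the chart actually created (look them up with kubectl get serviceaccounts):

```shell
# Hypothetical check: the HolmesGPT service account should read but not write.
verify_holmes_readonly() {
  local sa="system:serviceaccount:$1:$2" status=0 check
  # Reads that investigations rely on should be allowed.
  if ! kubectl auth can-i list pods --all-namespaces --as "$sa" >/dev/null; then
    echo "missing read access: list pods" >&2
    status=1
  fi
  # These should all be denied cluster-wide.
  for check in "create deployments" "delete pods" "patch deployments" "get secrets"; do
    # $check is intentionally unquoted so it splits into verb and resource.
    if kubectl auth can-i $check --all-namespaces --as "$sa" >/dev/null; then
      echo "unexpected cluster-wide permission: $check" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (hypothetical names):
#   verify_holmes_readonly holmes holmesgpt-sa
```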

Official Resources

Documentation: https://holmesgpt.dev/
GitHub: https://github.com/robusta-dev/holmesgpt
Helm Chart: https://github.com/robusta-dev/holmesgpt/tree/master/helm/holmes
Slack Community: Cloud Native Slack

Gotchas

Read-only RBAC means HolmesGPT can't see Secrets by default: Investigations involving misconfigured Secret references hit "no permission to read", even though the agent flags the Secret as a possible cause. Either grant get on secrets in the specific namespace involved, or accept the blind spot; don't broaden access cluster-wide.
Toolset enablement is cumulative and noisy at scale: Enabling all 30+ toolsets makes the LLM scan irrelevant data and dilutes accuracy. Enable only the toolsets matching your stack — every extra one costs tokens and adds noise to root-cause analysis.
Model temperature MUST be 0 for reproducible investigations: Default Helm values sometimes ship with temperature > 0, so the same alert can yield different root causes across runs. Pin temperature: 0 in modelList; otherwise results drift between runs and operators stop trusting them.
AlertManager URL must be reachable from the HolmesGPT pod, not the CLI: holmes investigate alertmanager --alertmanager-url http://localhost:9093 works from a laptop but fails inside the cluster — use the in-cluster service DNS (http://kube-prometheus-stack-alertmanager.monitoring:9093).
/clear doesn't reset toolset context, only conversation history: Cached tool outputs from prior investigation persist within the session. Long interactive sessions accumulate stale Prometheus data that contaminates new questions. Restart the CLI between unrelated incidents.
Anthropic model names in modelList need the anthropic/ prefix: model: claude-sonnet-4-20250514 fails with a provider-not-found condition that LiteLLM reports only as "model not found", without naming the missing prefix. The correct form is model: anthropic/claude-sonnet-4-20250514.
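For the first gotcha, read access to Secrets in a single namespace can be granted with kubectl create role and kubectl create rolebinding, which keeps the grant namespace-scoped instead of cluster-wide. A Bash sketch; the role name, target namespace, and service account are hypothetical:

```shell
# Hypothetical helper: let the HolmesGPT service account read Secrets in ONE namespace.
grant_secret_read() {
  local target_ns="$1" sa_ns="$2" sa_name="$3"
  # A namespaced Role (not a ClusterRole) limits the grant to target_ns.
  kubectl create role holmes-secret-reader \
    --namespace "$target_ns" --verb=get --resource=secrets &&
  # Bind it to the service account (format: <namespace>:<name>).
  kubectl create rolebinding holmes-secret-reader \
    --namespace "$target_ns" --role=holmes-secret-reader \
    --serviceaccount="$sa_ns:$sa_name"
}

# Usage (hypothetical names):
#   grant_secret_read payments holmes holmesgpt-sa
```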

Guide for implementing HolmesGPT - an AI agent for troubleshooting cloud-native environments. Use when investigating Kubernetes issues, analyzing alerts from Prometheus/AlertManager/PagerDuty, performing root cause analysis, configuring HolmesGPT installations (CLI/Helm/Docker), setting up AI providers (OpenAI/Anthropic/Azure), creating custom toolsets, or integrating with observability platforms (Grafana, Loki, Tempo, DataDog).

73 stars

tools

Updated May 21, 2026

Install this skill globally with the single npx skillsauth command shown above; it works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean; the other 6 were skipped.

| Scanner | Purpose | Status | Score |
|---------|---------|--------|-------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 21, 2026, 7:25 AM (120.5 s, 8 files scanned)

Install manually

# Clone the repo
git clone https://github.com/julianobarbosa/claude-code-skills.git

# Copy into Claude Code skills folder (global); create it first if it is missing
mkdir -p ~/.claude/skills
cp -r claude-code-skills/skills/holmesgpt ~/.claude/skills/

Repository: julianobarbosa/claude-code-skills

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT