workspace/skills/prometheus-monitoring/SKILL.md
Prometheus monitoring — PromQL instant/range queries, metric discovery, metadata, scrape target health, system health checks (6 tools). Use when querying Prometheus metrics, checking scrape targets, investigating alert thresholds, or analyzing network device utilization trends.
npx skillsauth add automateyournetwork/netclaw prometheus-monitoring
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
| Property | Value |
|----------|-------|
| Source | pab1it0/prometheus-mcp-server |
| Transport | stdio (default), SSE, or HTTP |
| Language | Python 3.10+ |
| Tools | 6 (query, range query, list metrics, metadata, targets, health check) |
| Auth | Basic auth (username/password), bearer token, or unauthenticated |
| Install | pip3 install prometheus-mcp-server (PyPI) |
| Run | prometheus-mcp-server (stdio) |
# stdio mode (default — used by NetClaw)
PROMETHEUS_URL=http://prometheus:9090 prometheus-mcp-server
# HTTP transport mode
PROMETHEUS_MCP_SERVER_TRANSPORT=http PROMETHEUS_URL=http://prometheus:9090 prometheus-mcp-server
# With basic auth
PROMETHEUS_URL=http://prometheus:9090 PROMETHEUS_USERNAME=admin PROMETHEUS_PASSWORD=secret prometheus-mcp-server
# With bearer token (Grafana Cloud, Thanos, etc.)
PROMETHEUS_URL=https://prom.example.com PROMETHEUS_TOKEN=your_bearer_token prometheus-mcp-server
| Variable | Required | Example | Description |
|----------|----------|---------|-------------|
| PROMETHEUS_URL | Yes | http://prometheus:9090 | Prometheus server endpoint |
| PROMETHEUS_USERNAME | No | admin | Basic auth username |
| PROMETHEUS_PASSWORD | No | changeme | Basic auth password |
| PROMETHEUS_TOKEN | No | eyJhbG... | Bearer token (Grafana Cloud, Thanos, Cortex) |
| PROMETHEUS_URL_SSL_VERIFY | No | false | Set to false to disable SSL certificate verification |
| PROMETHEUS_REQUEST_TIMEOUT | No | 30 | Request timeout in seconds (default: 30) |
| PROMETHEUS_DISABLE_LINKS | No | true | Disable Prometheus UI links in responses (saves context) |
| ORG_ID | No | 1 | Multi-tenant organization ID (Cortex/Mimir) |
| PROMETHEUS_CUSTOM_HEADERS | No | {"X-Custom":"val"} | Additional HTTP headers as JSON |
| PROMETHEUS_MCP_SERVER_TRANSPORT | No | stdio | Transport: stdio (default), http, or sse |
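To make the authentication variables concrete, here is a minimal Python sketch of how they could translate into HTTP headers when calling Prometheus directly. The `build_headers` helper is hypothetical and is not taken from prometheus-mcp-server. The precedence (bearer token over basic auth) and the use of `X-Scope-OrgID` for `ORG_ID` (the tenant header used by Cortex and Mimir) are assumptions about how such a client might behave.

```python
import base64
import json


def build_headers(env: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: derive request headers from the variables in the table.

    Assumption: a bearer token takes precedence over basic-auth credentials.
    """
    headers: dict[str, str] = {}
    token = env.get("PROMETHEUS_TOKEN")
    user = env.get("PROMETHEUS_USERNAME")
    password = env.get("PROMETHEUS_PASSWORD")

    if token:
        headers["Authorization"] = f"Bearer {token}"
    elif user and password:
        # HTTP Basic auth is base64("user:password").
        raw = f"{user}:{password}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")

    if env.get("ORG_ID"):
        # Assumption: ORG_ID is sent as the Cortex/Mimir tenant header.
        headers["X-Scope-OrgID"] = env["ORG_ID"]

    # PROMETHEUS_CUSTOM_HEADERS is a JSON object encoded as a string.
    headers.update(json.loads(env.get("PROMETHEUS_CUSTOM_HEADERS", "{}")))
    return headers
```

Passing `os.environ` (or a plain dict while testing) shows which headers a given configuration would produce before the server is started.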
| Tool | Parameters | What It Does |
|------|-----------|-------------|
| execute_query | query, timeout? | Execute instant PromQL query at current time |
| execute_range_query | query, start, end, step, timeout? | Execute PromQL range query over time interval |
| list_metrics | page?, page_size? | Browse available metric names with pagination |
| get_metric_metadata | metric?, limit? | Retrieve metric type, help text, and unit info |
| get_targets | none | View scrape target details (up/down, labels, last scrape) |
| health_check | none | Check Prometheus server availability and readiness |
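Each tool corresponds roughly to one Prometheus HTTP API call. The sketch below shows a plausible mapping. The endpoint paths are real Prometheus API paths, but which path each tool actually uses is an assumption, and `build_url` is a hypothetical helper for illustration only.

```python
from urllib.parse import urlencode

# Assumed mapping from tool name to Prometheus HTTP API path.
ENDPOINTS = {
    "execute_query": "/api/v1/query",
    "execute_range_query": "/api/v1/query_range",
    "list_metrics": "/api/v1/label/__name__/values",
    "get_metric_metadata": "/api/v1/metadata",
    "get_targets": "/api/v1/targets",
    "health_check": "/-/healthy",
}


def build_url(base_url: str, tool: str, **params: str) -> str:
    """Build the request URL a tool call would plausibly translate to."""
    url = base_url.rstrip("/") + ENDPOINTS[tool]
    if params:
        # urlencode percent-escapes braces, quotes, and '=' inside PromQL.
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    print(build_url("http://prometheus:9090", "execute_query", query="up{job='snmp'}"))
```

Reproducing a tool call with `curl` against the same URL is a quick way to tell whether a problem lies in Prometheus itself or in the MCP layer.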
When checking Prometheus for network device metrics:
1. health_check: verify Prometheus is reachable
2. list_metrics: find available SNMP/device metrics
3. get_metric_metadata(metric="ifHCInOctets"): check type and description
4. execute_query(query="up{job='snmp'}"): check which targets are up
5. execute_range_query: trend analysis over time:
   - rate(ifHCInOctets{instance="router1"}[5m]) * 8
   - device_cpu_utilization{device="core-rtr-01"}
   - increase(ifInErrors{device=~".*"}[1h])
   - bgp_peer_state{peer="10.1.1.2"}
6. get_targets: verify SNMP exporters and device scrape health

health_check()
list_metrics(page=1, page_size=50)
execute_query(query="rate(ifHCInOctets{device='core-rtr-01'}[5m]) * 8")
execute_range_query(query="rate(ifHCOutOctets{device='core-rtr-01'}[5m]) * 8", start="2024-01-01T00:00:00Z", end="2024-01-01T01:00:00Z", step="60s")
get_targets()
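The `rate(ifHCInOctets[5m]) * 8` pattern used above converts an octet counter into bits per second. As a rough illustration of that arithmetic, the sketch below computes the same quantity from raw counter samples. It is simplified: a decrease is treated as a counter reset, as `rate()` does, but the window-boundary extrapolation Prometheus applies is ignored. The function names and sample values are hypothetical.

```python
def bits_per_second(samples: list[tuple[float, float]]) -> float:
    """Average bit rate from (unix_timestamp, octet_counter) pairs in time order."""
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop in the counter is treated as a reset to zero, so the
        # whole current value counts as new traffic.
        increase += cur if cur < prev else cur - prev
    elapsed = samples[-1][0] - samples[0][0]
    return increase * 8 / elapsed  # octets -> bits, per second


def utilization_percent(bps: float, link_speed_bps: float) -> float:
    """Share of link capacity in use, as a percentage."""
    return 100 * bps / link_speed_bps


if __name__ == "__main__":
    samples = [(0, 1_000), (60, 751_000), (120, 1_501_000)]
    bps = bits_per_second(samples)
    print(bps, utilization_percent(bps, 1_000_000))
```

Dividing the query result by the interface speed in the same way gives a utilization percentage that is easier to compare across links of different capacity.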
When investigating whether metrics are crossing alert thresholds:
1. list_metrics: find the metric name
2. get_metric_metadata: understand metric type (counter, gauge, histogram)
3. execute_query: get current metric value
4. execute_range_query: check trend over past 1h/6h/24h
5. get_targets: check if specific exporters are down

When analyzing capacity trends for network infrastructure:

1. list_metrics: find bandwidth/utilization metrics
2. execute_range_query with max_over_time():
   max_over_time(rate(ifHCInOctets{device="core-rtr-01",ifName="Gi0/0"}[5m])[7d:1h]) * 8
3. execute_range_query with quantile_over_time():
   quantile_over_time(0.95, rate(ifHCInOctets{device="core-rtr-01"}[5m])[30d:1h]) * 8

| Skill | Integration |
|-------|-------------|
| grafana-observability | Grafana dashboards visualize Prometheus data; use Prometheus skill for direct PromQL when Grafana isn't available or for ad-hoc queries |
| pyats-health-check | Cross-reference pyATS device health with Prometheus time-series metrics |
| pyats-routing | Correlate OSPF/BGP state changes with Prometheus metric timelines |
| gait-session-tracking | Record all Prometheus queries and findings in GAIT audit trail |
| te-network-monitoring | Pair ThousandEyes path data with Prometheus infrastructure metrics |
| sdwan-ops | Correlate SD-WAN vManage alarms with Prometheus device metrics |
| servicenow-change-workflow | Reference Prometheus metrics as evidence in change requests |
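For the capacity-trend queries, it can help to see why the 95th percentile from quantile_over_time() usually sits below the peak from max_over_time(). The sketch below computes a quantile by linear interpolation between neighboring ranks, which is assumed here to approximate what Prometheus does. The `quantile` helper and the sample values are hypothetical.

```python
import math


def quantile(q: float, values: list[float]) -> float:
    """Quantile with linear interpolation between the two nearest ranks."""
    if not values:
        return math.nan
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(len(ordered) - 1, lower + 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


if __name__ == "__main__":
    # Hypothetical hourly bit rates (Mbit/s) with one short burst.
    hourly_mbps = [10.0, 20.0, 30.0, 40.0, 50.0]
    print("peak:", max(hourly_mbps))
    print("p95:", quantile(0.95, hourly_mbps))
```

Sizing links against the 95th percentile rather than the peak keeps a single short burst from driving an upgrade decision.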
- list_metrics supports page and page_size to avoid large responses.
- Wide execute_range_query time ranges return large result sets.
- Set PROMETHEUS_DISABLE_LINKS=true to reduce response size.
- Run health_check before running queries to confirm Prometheus is reachable.

- Check PROMETHEUS_URL, PROMETHEUS_USERNAME/PROMETHEUS_PASSWORD, or PROMETHEUS_TOKEN in ~/.openclaw/.env. Verify Prometheus allows the configured auth method.
- Verify PROMETHEUS_URL is reachable. Use health_check to diagnose connectivity.
- Use list_metrics and get_metric_metadata to discover valid metric names before querying.
- Use get_targets to verify scrape targets are up and the expected labels exist.
- Increase PROMETHEUS_REQUEST_TIMEOUT for slow queries or large result sets.
- Set PROMETHEUS_URL_SSL_VERIFY=false for self-signed certificates (development only).
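Because range queries return one point per step, the result size can be estimated before calling execute_range_query. The sketch below counts the points a start/end/step combination would produce and suggests a step that stays under a limit. Prometheus rejects range queries that would exceed 11,000 points per series. The helper names are hypothetical, and the timestamps are assumed to be ISO 8601 UTC strings as in the example call earlier in this document.

```python
import math
from datetime import datetime

MAX_POINTS = 11_000  # Prometheus per-series limit for range queries


def _span_seconds(start: str, end: str) -> float:
    # Replace a trailing "Z" so fromisoformat also works before Python 3.11.
    parse = lambda s: datetime.fromisoformat(s.replace("Z", "+00:00"))
    return (parse(end) - parse(start)).total_seconds()


def point_count(start: str, end: str, step_seconds: int) -> int:
    """Number of samples per series a range query would return."""
    return int(_span_seconds(start, end) // step_seconds) + 1


def min_step(start: str, end: str, max_points: int = MAX_POINTS) -> int:
    """Smallest whole-second step that keeps the result under max_points."""
    return max(1, math.ceil(_span_seconds(start, end) / (max_points - 1)))


if __name__ == "__main__":
    start, end = "2024-01-01T00:00:00Z", "2024-01-31T00:00:00Z"
    print(point_count(start, end, 60))  # far above the limit for 30 days
    print(min_step(start, end))
```

The same estimate is useful for keeping responses small enough to fit comfortably in the agent's context, even when the query is well under the Prometheus limit.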