claude/skills/datadog/SKILL.md
Query logs, metrics, monitors, and dashboards from Datadog. Search logs, check alert status, and investigate incidents.
npx skillsauth add tbroadley/dotfiles datadog
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill provides access to Datadog for monitoring, logging, and alerting. Use the pup CLI (DataDog/pup) as the primary tool. Fall back to the API directly only for features pup doesn't cover yet.
pup is installed via install.sh. Authenticate with OAuth2 or API keys:
# OAuth2 (preferred — browser-based, auto-refreshing tokens)
export DD_SITE="us3.datadoghq.com"
pup auth login
# Or API keys (fallback)
export DD_API_KEY="your-api-key"
export DD_APP_KEY="your-application-key"
export DD_SITE="us3.datadoghq.com"
Verify: pup auth status or pup test
For features pup doesn't cover, use curl with API keys:
export DD_API_KEY="your-api-key"
export DD_APP_KEY="your-application-key"
export DD_SITE="us3.datadoghq.com"
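Before falling back to curl, it can help to confirm that all three variables are actually exported. The following is a minimal sketch, not part of pup; `dd_missing_env` is a hypothetical helper name chosen for illustration.

```shell
# Hypothetical helper (not part of pup): list required variables that are unset.
dd_missing_env() {
  missing=""
  for name in DD_API_KEY DD_APP_KEY DD_SITE; do
    # printenv only sees exported variables, which is what curl subshells need.
    if [ -z "$(printenv "$name")" ]; then
      missing="$missing $name"
    fi
  done
  # Drop the leading space; prints an empty line when everything is set.
  echo "${missing# }"
}
```

Calling `dd_missing_env` with nothing exported prints all three names; an empty result means the curl examples have what they need.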
Use this skill when the user:
Output formats: pup <command> -o json|table|yaml (default: json). Use -y to skip confirmation on destructive ops.
pup logs search --query="service:my-service status:error" --from="1h"
pup logs list --query="status:error" --from="30m"
pup logs aggregate --query="service:api" --from="1d"
pup monitors list
pup monitors get 12345678
pup monitors search --query="status:Alert"
pup metrics query --query="avg:system.cpu.user{*}" --from="1h"
pup metrics search --query="avg:system.cpu.user{*}" --from="1h"
pup metrics list --filter="system.*"
pup metrics get METRIC_NAME
pup dashboards list
pup dashboards get abc-123-def
pup dashboards url abc-123-def
pup incidents list
pup incidents get abc-123-def
pup incidents attachments abc-123-def
pup events list --from="1d"
pup events search --query="source:my-service" --from="1h"
pup slos list
pup slos get abc-123
pup traces search --query="service:api" --from="1h"
pup apm services
pup apm dependencies --service=api
pup apm flow-map --service=api
pup infrastructure hosts list
pup tags list
pup tags get HOSTNAME
pup security rules list
pup security signals list --from="1d"
pup security findings search --query="status:critical"
pup on-call teams # On-call team management
pup cases search --query="status:open" # Case management
pup error-tracking issues search # Error tracking
pup service-catalog list # Service catalog
pup audit-logs search --from="1d" # Audit logs
pup usage summary # Usage metering
pup cost projected # Cost management
pup synthetics tests list # Synthetic tests
pup downtime list # Scheduled downtimes
pup cicd pipelines list # CI/CD visibility
pup covers ~45% of Datadog APIs. For features it doesn't support (profiling, containers, processes, session replay, DORA metrics, etc.), fall back to the API directly.
Base URL: https://api.$(printenv DD_SITE)/api/v1 or v2
# Example: API endpoint not covered by pup
curl -s "https://api.$(printenv DD_SITE)/api/v2/ENDPOINT" \
-H "DD-API-KEY: $(printenv DD_API_KEY)" \
-H "DD-APPLICATION-KEY: $(printenv DD_APP_KEY)"
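The curl pattern above can be wrapped in a small function so the headers are not retyped for every call. This is a sketch under assumptions: `dd_api_url` and `dd_api_get` are hypothetical names, and only GET requests are handled.

```shell
# Hypothetical helper: build the full URL for a Datadog API path.
dd_api_url() {
  # $1 = API version (v1 or v2), $2 = endpoint path without a leading slash
  echo "https://api.$(printenv DD_SITE)/api/$1/$2"
}

# Hypothetical wrapper: GET an endpoint with both auth headers attached.
dd_api_get() {
  curl -s "$(dd_api_url "$1" "$2")" \
    -H "DD-API-KEY: $(printenv DD_API_KEY)" \
    -H "DD-APPLICATION-KEY: $(printenv DD_APP_KEY)"
}
```

For example, `dd_api_get v1 validate` should hit Datadog's API key validation endpoint, which makes a quick credential check before trying an endpoint pup does not cover.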
The log query syntax below works in both pup logs search --query= and direct API calls:
| Operator | Example | Description |
|----------|---------|-------------|
| AND | service:api status:error | Both conditions (implicit) |
| OR | status:error OR status:warn | Either condition |
| NOT | -status:debug | Exclude matches |
| Wildcard | service:api-* | Pattern matching |
| Range | @duration:>1000 | Numeric comparisons |
| Exists | @http.url:* | Field exists |
Common filters: service:name, status:error|warn|info|debug, @http.status_code:500, host:hostname, env:production
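The operators in the table can be combined in one query. The sketch below assumes parentheses are used to group the OR clause, and it prints the pup command instead of running it so it works without pup installed. The filter values are illustrative only.

```shell
# Quote the whole query: unquoted, the shell would glob * and treat > as a redirect.
query='service:api-* env:production (status:error OR status:warn) @duration:>1000'

# Print the resulting command for inspection rather than executing it.
printf 'pup logs search --query="%s" --from="1h"\n' "$query"
```

Space-separated terms are ANDed implicitly, so this matches error or warn logs from production services whose name starts with api- and whose duration exceeds 1000.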
Hawk scan jobs emit structured log entries containing @scan_metrics and @scan_progress fields. These are log-based (not Datadog custom metrics) — query them via pup logs search.
# Get scan metrics for a specific job (use --limit 1000 --sort asc for chronological history)
pup logs search \
--query="inspect_ai_job_id:<JOB_ID> @scan_metrics.tasks_scanning:*" \
--from="7d" --limit 1000 --sort asc -o json
# Note: default limit is 10. Use --limit 1000 to get more entries.
# Logs are emitted every ~6 seconds by inspect_scout.
# To cover the full timeline, query both --sort asc (oldest 1000) and --sort desc (newest 1000).
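The limits above imply a coverage window: at roughly one entry every 6 seconds, 1000 entries span about 100 minutes, so the asc and desc queries together cover about 200 minutes. Longer jobs leave a gap in the middle. A rough sketch of that arithmetic follows; the function names are hypothetical and the 6-second interval is approximate.

```shell
# Hypothetical helper: seconds of history spanned by N log entries,
# assuming one entry about every 6 seconds (override with a second argument).
scan_log_coverage_seconds() {
  entries=$1
  interval=${2:-6}
  echo $(( entries * interval ))
}

# Succeeds when the asc + desc windows (1000 entries each) span the whole job.
scan_fully_covered() {
  job_seconds=$1
  [ "$job_seconds" -le "$(scan_log_coverage_seconds 2000)" ]
}
```

For instance, `scan_fully_covered 7200` succeeds for a two-hour job, while a four-hour job (14400 seconds) exceeds the 12000-second window and would need additional queries.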
@scan_metrics.* (from inspect_scout.ScanMetrics):
| Field | Description |
|-------|-------------|
| @scan_metrics.task_count | Total async worker tasks |
| @scan_metrics.tasks_idle | Workers doing nothing |
| @scan_metrics.tasks_parsing | Workers parsing transcript files |
| @scan_metrics.tasks_scanning | Workers running scanners (making API calls) |
| @scan_metrics.buffered_scanner_jobs | Jobs buffered waiting to be scanned |
| @scan_metrics.completed_scans | Total completed scan operations |
| @scan_metrics.memory_usage | Process memory (RSS/USS) in bytes |
| @scan_metrics.process_count | Number of OS processes |
| @scan_metrics.batch_pending | Pending batch API requests |
| @scan_metrics.batch_failures | Failed batch API requests |
| @scan_metrics.batch_oldest_created | Timestamp of oldest pending batch |
@scan_progress.*:
| Field | Description |
|-------|-------------|
| @scan_progress.completed | Number of completed scans |
| @scan_progress.total | Total scans expected |
| @scan_progress.percent | Completion percentage |

Tags for filtering:
- inspect_ai_job_id:<JOB_ID> (the scan run ID)
- inspect_ai_job_type:scan
- inspect_ai_created_by:<user>
- kube_job:<JOB_ID>
- service:runner

Standard Datadog container metrics are also available, tagged by kube_job:<JOB_ID>:
pup metrics query --query="avg:container.pid.thread_count{kube_job:<JOB_ID>}" --from="24h"
pup metrics query --query="avg:container.cpu.usage{kube_job:<JOB_ID>}" --from="24h"
pup metrics query --query="avg:container.memory.usage{kube_job:<JOB_ID>}" --from="24h"
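The three queries differ only in the metric name, so they can be generated from one job ID. This is a small sketch with a hypothetical function name; it prints the commands rather than running them, so the output can be reviewed or piped to sh.

```shell
# Hypothetical helper: print one pup command per container metric for a job.
scan_container_queries() {
  job_id=$1
  for metric in container.pid.thread_count container.cpu.usage container.memory.usage; do
    printf 'pup metrics query --query="avg:%s{kube_job:%s}" --from="24h"\n' "$metric" "$job_id"
  done
}
```

Running `scan_container_queries <JOB_ID>` prints the same three commands shown above with the job ID filled in.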