.claude/skills/observability-alert-manager/SKILL.md
Configure Grafana alerts for Claude Code anomalies and thresholds. Use when setting up monitoring alerts for sessions, errors, context usage, or subagents.
npx skillsauth add adaptationio/skrillz observability-alert-manager
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Configure and manage Grafana alerts for Claude Code monitoring using enhanced telemetry.
Primary: {job="claude_code_enhanced"} in Loki
create-alert: Define new alert rule. Parameters: name, query (LogQL), threshold, duration, severity, notification.
list-alerts: Show all configured alerts and their status.
test-alert: Simulate alert conditions.
delete-alert: Remove alert rule.
Long Session Duration: Session >1 hour
{job="claude_code_enhanced", event_type="session_end"} | json | duration_seconds > 3600
High Turn Count: Session >50 turns
{job="claude_code_enhanced", event_type="session_end"} | json | turn_count > 50
Session Error Spike: >5 errors in session
{job="claude_code_enhanced", event_type="session_end"} | json | error_count > 5
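The session queries above are log filters: they return matching log lines rather than a number. Alert rules generally evaluate a numeric result, so a filter like these usually has to be wrapped in a metric query first. The sketch below shows one way to do that with `count_over_time`. The helper name `as_alert_query` and the default 5m range are assumptions for illustration, not part of this skill.

```shell
# Hypothetical helper: wrap a LogQL log query in count_over_time so that it
# yields a number an alert rule can compare against a threshold.
# $1 = log query, $2 = range window (assumed default: 5m)
as_alert_query() {
  printf 'count_over_time(%s [%s]) > 0\n' "$1" "${2:-5m}"
}

# Fires when at least one session longer than one hour ended in the last 5 minutes.
as_alert_query '{job="claude_code_enhanced", event_type="session_end"} | json | duration_seconds > 3600'
```

The same wrapper should apply to the turn-count and error-count filters; only the field name and threshold inside the log query change.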
High Error Rate: >5 errors/hour
count_over_time({job="claude_code_enhanced", event_type="tool_result", status="error"} [1h]) > 5
Specific Tool Failures: Bash errors
count_over_time({job="claude_code_enhanced", event_type="tool_result", status="error", tool="Bash"} [1h]) > 3
High Context Usage: >80% context window
{job="claude_code_enhanced", event_type="context_utilization"} | json | context_percentage > 80
Auto Compaction Triggered: Context full
{job="claude_code_enhanced", event_type="context_compact", trigger="auto"}
Excessive Subagents: Session spawned >10 subagents
{job="claude_code_enhanced", event_type="session_end"} | json | subagents_spawned > 10
Telemetry Staleness: No data >10min
absent_over_time({job="claude_code_enhanced"} [10m])
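The staleness rule fires when no log line with the `claude_code_enhanced` job label arrives within the 10-minute window. The same condition can be sketched locally as a comparison of timestamps, which may help when testing the threshold outside Grafana. The helper `is_stale` and the sample epoch values are hypothetical.

```shell
# Hypothetical helper: succeed when the newest event is older than the allowed age.
# $1 = epoch seconds of the last event, $2 = current epoch seconds,
# $3 = maximum age in seconds (default 600, matching the 10m window above)
is_stale() {
  [ $(( $2 - $1 )) -gt "${3:-600}" ]
}

# 900 seconds have passed since the last event, which exceeds the 600-second window.
if is_stale 1700000000 1700000900; then
  echo "stale"
else
  echo "fresh"
fi
```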
Unusual Activity Spike: >100 tool calls/hour
count_over_time({job="claude_code_enhanced", event_type="tool_call"} [1h]) > 100
Debugging Prompt Spike: >10 debugging prompts/hour
count_over_time({job="claude_code_enhanced", event_type="user_prompt", pattern="debugging"} [1h]) > 10
create-alert \
--name "High Error Rate" \
--query 'count_over_time({job="claude_code_enhanced", event_type="tool_result", status="error"} [1h]) > 5' \
--severity warning \
--notification slack
create-alert \
--name "High Context Usage" \
--query '{job="claude_code_enhanced", event_type="context_utilization"} | json | context_percentage > 80' \
--severity info \
--notification email
create-alert \
--name "Long Session Warning" \
--query '{job="claude_code_enhanced", event_type="session_end"} | json | duration_seconds > 3600' \
--severity info \
--notification dashboard
curl -X POST http://localhost:3000/api/ruler/grafana/api/v1/rules/claude-code \
-H "Content-Type: application/json" \
-u admin:admin \
-d '{
"name": "claude-code-alerts",
"rules": [
{
"alert": "HighErrorRate",
"expr": "count_over_time({job=\"claude_code_enhanced\", event_type=\"tool_result\", status=\"error\"} [1h]) > 5",
"for": "5m",
"labels": {"severity": "warning"},
"annotations": {"summary": "High error rate detected"}
}
]
}'
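The payload above escapes every double quote in the LogQL expression by hand, which is easy to get wrong once queries grow. A minimal sketch of generating one rule object with the escaping done by a function follows. The helpers `json_escape` and `build_rule_json` are hypothetical, the fixed `"for": "5m"` mirrors the example above, and only backslashes and double quotes are escaped, so this is not a general JSON encoder.

```shell
# Hypothetical helper: escape backslashes and double quotes for use inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: print one rule object in the same shape as the payload above.
# $1 = alert name, $2 = LogQL expression, $3 = severity, $4 = summary
build_rule_json() {
  printf '{"alert": "%s", "expr": "%s", "for": "5m", "labels": {"severity": "%s"}, "annotations": {"summary": "%s"}}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")" "$(json_escape "$4")"
}

build_rule_json "HighErrorRate" \
  'count_over_time({job="claude_code_enhanced", event_type="tool_result", status="error"} [1h]) > 5' \
  "warning" \
  "High error rate detected"
```

The output can be placed in the `rules` array of the request body, so the query is written once in plain LogQL and never escaped manually.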
| Level | Use Case |
|-------|----------|
| critical | Immediate action required |
| warning | Needs attention soon |
| info | Informational, no action needed |
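Because only three severity levels are defined, a wrapper script could reject anything else before a rule is created. The following is a sketch under that assumption. The function `is_valid_severity` is hypothetical and is not one of the scripts shipped with this skill.

```shell
# Hypothetical helper: succeed only for the severity levels listed in the table above.
is_valid_severity() {
  case "$1" in
    critical|warning|info) return 0 ;;
    *) return 1 ;;
  esac
}

# "urgent" is not a defined level, so it is reported as unknown.
for level in critical warning info urgent; do
  if is_valid_severity "$level"; then
    echo "$level: ok"
  else
    echo "$level: unknown severity"
  fi
done
```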
scripts/create-alert.sh - Create new alert
scripts/list-alerts.sh - List all alerts
scripts/test-alerts.sh - Test alert conditions
scripts/import-alert-templates.sh - Import all pre-built templates