skills/api-observability-planner/SKILL.md
Architects which metrics to collect, how logs should be formatted, and how distributed tracing should be implemented across boundaries.
npx skillsauth add fatih-developer/fth-skills api-observability-planner
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill ensures that when an API goes down, the team knows exactly why before the users even notice. It shifts telemetry from "just log the errors" to a structured observability pipeline.
Core assumption: If you can't measure it, you can't manage it. Blind APIs cause prolonged outages.
Define exactly what your framework will emit:
- Logs as structured JSON rather than free-text strings (`"User 123 failed login"` vs `{"event": "login_failed", "user_id": 123, "reason": "bad_password"}`).
- Traces that follow a standard propagation format (e.g., W3C Trace Context). Ensure `trace_id` and `span_id` propagate across microservices and database calls.

Define what constitutes "Healthy" and when pagers should go off.
- `/healthz` shouldn't just return 200 OK. It should verify DB connection, Redis reachability, and critical downstreams.

Required Outputs (Must write BOTH to docs/api-report/):
Markdown report (docs/api-report/api-observability-report.md)

### 🔭 API Observability Blueprint
**Instrumentation Strategy:** OpenTelemetry (OTel)
**Log Format:** Structured JSON
#### 📊 Core Metrics (RED Method)
1. **Rate:** Tracked via Prometheus `http_requests_total`.
2. **Errors:** Alert on HTTP 500-599. (4xx responses are client problems: track them, but don't wake up on-call.)
3. **Duration:** Tracked via `http_request_duration_seconds` (Buckets: 50ms, 100ms, 500ms, 1s, 5s).
#### 🚨 Alert Configuration (PagerDuty / Slack)
- **High Severity:** Order Creation 5xx Rate > 1% over 5m.
- **Low Severity:** Database free disk space < 20%.
#### 🆔 Tracing Propagation
Inject `traceparent` and `tracestate` headers into all outgoing upstream HTTP/gRPC requests.
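
The propagation rule above can be illustrated with a small sketch. It assumes headers arrive as a plain dict with lowercase keys, and the helper names `parse_traceparent` and `outgoing_headers` are hypothetical. In practice an OpenTelemetry SDK propagator would normally handle this, so treat the code as an illustration of the `traceparent` layout (`version-traceid-parentid-flags`) rather than a replacement for the SDK.

```python
import re
import secrets

# W3C Trace Context layout: version "00", 32-hex trace-id, 16-hex parent-id, 2-hex flags.
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header):
    """Return (trace_id, parent_span_id, flags), or None if the header is missing or invalid."""
    match = _TRACEPARENT_RE.match(header.strip()) if header else None
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid per the spec.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return trace_id, span_id, flags


def outgoing_headers(incoming_headers):
    """Build headers for an outbound call, continuing the incoming trace if there is one.

    Assumption: `incoming_headers` is a dict whose keys are already lowercased.
    """
    parsed = parse_traceparent(incoming_headers.get("traceparent"))
    if parsed:
        trace_id, _parent_span, flags = parsed
    else:
        # No valid context: start a new trace and mark it sampled.
        trace_id, flags = secrets.token_hex(16), "01"
    span_id = secrets.token_hex(8)  # fresh span ID for this outbound request
    headers = {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}
    if parsed and "tracestate" in incoming_headers:
        # Vendor state is passed through unchanged in this sketch.
        headers["tracestate"] = incoming_headers["tracestate"]
    return headers
```

The trace ID is preserved across the hop while the span ID changes, which is what lets a tracing backend stitch the calls into one tree.
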
JSON output (docs/api-report/api-observability-output.json)

{
  "skill": "api-observability-planner",
  "framework": "OpenTelemetry",
  "metrics": {
    "latency_thresholds_ms": {"p95": 200, "p99": 500}
  },
  "alerts": [
    {"name": "High 5xx Rate", "condition": "error_rate > 1%", "duration": "5m", "severity": "High"}
  ]
}
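
As a rough illustration of how a consumer might read this JSON, the sketch below evaluates the `error_rate > 1%` condition against a window of HTTP status codes. It assumes the caller has already collected the status codes for the alert's `duration` window, and it only understands conditions of the form `<metric> > <N>%`. The function names are hypothetical and not part of the skill.

```python
import json

PLAN_JSON = """
{
  "skill": "api-observability-planner",
  "framework": "OpenTelemetry",
  "metrics": {"latency_thresholds_ms": {"p95": 200, "p99": 500}},
  "alerts": [
    {"name": "High 5xx Rate", "condition": "error_rate > 1%", "duration": "5m", "severity": "High"}
  ]
}
"""


def error_rate(status_codes):
    """Fraction of responses in the window that are 5xx (4xx is deliberately excluded)."""
    if not status_codes:
        return 0.0
    server_errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return server_errors / len(status_codes)


def parse_percent_threshold(condition):
    """Turn a condition such as 'error_rate > 1%' into a fraction (0.01)."""
    _metric, _operator, raw = condition.split()
    return float(raw.rstrip("%")) / 100.0


def firing_alerts(plan, status_codes):
    """Return the names of alerts whose threshold is exceeded for this window."""
    rate = error_rate(status_codes)
    return [
        alert["name"]
        for alert in plan["alerts"]
        if rate > parse_percent_threshold(alert["condition"])
    ]


plan = json.loads(PLAN_JSON)
```

For example, `firing_alerts(plan, [200] * 980 + [500] * 20)` sees a 2% server error rate and reports the alert, while a window containing only 4xx responses reports nothing.
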
Sensitive fields such as `passwords`, `tokens`, `credit_cards`, and `emails` must be masked or scrubbed before being written to stdout or log aggregators.
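
One way to combine structured JSON logging with scrubbing is sketched below using Python's standard `logging` module. The `fields` key passed through `extra`, the `[REDACTED]` placeholder, and the exact key list are assumptions made for illustration. Key-based masking also does not catch secrets embedded in free-text messages, which is another reason to keep messages as short event names.

```python
import json
import logging

# Assumed key names; extend to match the fields your services actually log.
SENSITIVE_KEYS = {
    "password", "passwords", "token", "tokens",
    "credit_card", "credit_cards", "email", "emails",
}


def scrub(value):
    """Recursively mask values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else scrub(inner)
            for key, inner in value.items()
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, scrubbing structured fields first."""

    def format(self, record):
        payload = {"level": record.levelname, "event": record.getMessage()}
        # Structured context is expected under extra={"fields": {...}}.
        payload.update(scrub(getattr(record, "fields", {})))
        return json.dumps(payload, sort_keys=True)


handler = logging.StreamHandler()  # writes to stderr by default
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("api")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
```

A call such as `logger.info("login_failed", extra={"fields": {"user_id": 123, "password": "hunter2"}})` then emits a single JSON line in which `password` is replaced by `[REDACTED]` and `user_id` is kept.
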