plugins/backend-toolkit/skills/performance-profiling/SKILL.md
Find and fix backend bottlenecks — connection pooling, p95/p99 latency, load testing with SLO-aligned thresholds (k6), and CPU profiling (flame graphs). Use when latency is high, throughput plateaus, or before scaling traffic. Not for DB query specifics (use query-optimization) or read-load shedding (use caching-strategy).
Locate the actual backend bottleneck with data (not intuition), fix it, and prove the fix with load tests gated on tail latency — so the system scales predictably.
Universal — measure-first profiling, p95/p99 over averages, connection pooling, and SLO-gated load testing are backend-perf principles; the profiler/load tool differs.
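To illustrate why tail percentiles matter more than averages, here is a minimal sketch that summarizes a made-up latency sample. The sample values and the nearest-rank percentile method are assumptions chosen for illustration. Real load tools compute percentiles for you and may use a different interpolation.

```javascript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

function mean(samples) {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Hypothetical sample: 94 fast requests (40 ms) and 6 slow ones (2000 ms).
const latencies = [...Array(94).fill(40), ...Array(6).fill(2000)];

const summary = {
  mean: mean(latencies),          // 157.6
  p50: percentile(latencies, 50), // 40
  p95: percentile(latencies, 95), // 2000
  p99: percentile(latencies, 99), // 2000
};
console.log(summary);
```

The mean of 157.6 ms would pass a 300 ms budget, while p95 and p99 show that more than one request in twenty takes two seconds.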
1. Measure before optimizing: find the real bottleneck
- […] (→ `observability-setup`) to see WHERE time goes: DB? external call? CPU? lock contention?
- […] (→ `query-optimization`), not app CPU; confirm before profiling app code

2. Right-size connection pooling

3. Load test with SLO-aligned thresholds
- […] `p(95)` / `p(99)` latency + error-rate thresholds IN the test

4. CPU profiling when app code is the bottleneck
- […] (→ `background-jobs`) or optimize the hot function

4b. Memory profiling: leaks and unbounded growth
- […] (`node --heapsnapshot-signal`)

4c. Event-loop lag as a first-class signal (Node-specific, conceptually general)
- […] `eventLoopUtilization()` / `monitorEventLoopDelay` and alert at threshold

5. Address tail latency causes
- […] (→ `query-optimization`), lock contention (→ `transaction-management`)

6. Validate (validation loop)
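The event-loop signal from step 4c can be sketched with Node's built-in `perf_hooks` module. This is a minimal illustration, not a production monitor: the factory name `createEventLoopSampler`, the 100 ms threshold, and the shape of the returned object are assumptions. How a breach is reported (log, metric, or alert) is left to the caller.

```javascript
const { monitorEventLoopDelay, performance } = require('perf_hooks');

// Hypothetical helper: samples event-loop utilization and p99 loop delay.
function createEventLoopSampler(maxP99DelayMs = 100) {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  let last = performance.eventLoopUtilization();

  return {
    // Call periodically (for example from an unref'd setInterval).
    sample() {
      // Passing the previous reading returns the delta since that reading.
      const delta = performance.eventLoopUtilization(last);
      last = performance.eventLoopUtilization();
      const p99DelayMs = histogram.percentile(99) / 1e6; // histogram is in nanoseconds
      histogram.reset();
      return {
        utilization: delta.utilization,
        p99DelayMs,
        breached: p99DelayMs > maxP99DelayMs,
      };
    },
    stop() {
      histogram.disable();
    },
  };
}
```

A caller might run `sample()` every few seconds and emit the result to whatever metrics pipeline is already in place, alerting when `breached` is true for several consecutive samples.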
| ❌ Anti-pattern | ✅ Correct |
|---|---|
| Optimizing by guessing | Profile/measure first; fix the proven bottleneck |
| Tuning on average latency | Gate on p95/p99 tail latency |
| Unbounded / "bigger" connection pool | Right-sized pool + transaction-mode pooler |
| CPU-heavy work on the request thread (blocks event loop) | Offload to a worker / background job |
| Load test = flat hammer | Realistic ramp/sustained/spike scenarios |
| "Slow over time" treated as CPU when it's a leak | Heap snapshot diff between two timepoints; track RSS trend |
| Event-loop lag unmonitored (silent stalls) | Track eventLoopUtilization / GC pause % with alert thresholds |
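The "offload to a worker" row can be sketched with Node's `worker_threads` module. This assumes a Node backend. The Fibonacci workload and the `fibInWorker` name are placeholders for whatever CPU-heavy function the profiler actually flags, and the worker source is inlined only to keep the example self-contained.

```javascript
const { Worker } = require('worker_threads');

// Inline worker source for the sketch; a real service would usually load a file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // Deliberately CPU-bound placeholder: naive recursive Fibonacci.
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  parentPort.postMessage(fib(workerData));
`;

// Runs the computation off the main thread so the event loop stays responsive.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

fibInWorker(20).then((value) => console.log('fib(20) =', value));
```

Starting a worker per request has its own cost, so a small reusable worker pool or a background job queue is often the better fit for sustained load.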
| Tier | Examples | Action SLA |
|---|---|---|
| Critical | Connection pool exhaustion causing outages; p99 > SLO by multiples; event loop blocked by sync CPU work | Block release; fix immediately |
| Major | No load test before a traffic event; pool size guessed; tail latency unmeasured | Fix this sprint |
| Minor | Load test not in CI; minor GC tuning opportunity | Schedule within 2 sprints |
- `docs/perf-profile-YYYY-MM-DD.md`: bottleneck, fix, before/after p95/p99
- `perf(backend): right-size connection pool` / `perf(backend): offload <work> to worker`
- `thresholds: { http_req_duration: ['p(95)<300','p(99)<800'] }`; CI fails on breach (exit 99)

- `node --prof` / `0x` for flame graphs (note: clinic.js is no longer actively maintained; use `node --prof` / `0x` / k6)
- `py-spy` / `cProfile` flame graphs; gunicorn/uvicorn worker tuning; pgbouncer
- `pprof` (built-in, excellent); goroutine/heap profiles; `database/sql` `SetMaxOpenConns`

- `query-optimization`: DB query time is usually the top bottleneck
- `caching-strategy`: cache after optimizing, to shed read load
- `observability-setup`: metrics tell you WHERE to profile

- […] `p(95)`/`p(99)` thresholds in the load test so a breach fails CI automatically; tail latency, not averages, defines API health.
- […] `node --prof` / `0x` / k6 as the live toolchain.
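A k6 script along the following lines shows how the latency thresholds quoted above can gate a run. It is a sketch under assumptions: the `BASE_URL` environment variable, the `/health` path, the stage durations, the virtual-user counts, and the 1% error-rate budget are all hypothetical and should be replaced with values derived from the service's SLO. The script is meant to be executed with `k6 run`, not with Node.

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  // Ramp up, hold, then ramp down, rather than a flat hammer.
  stages: [
    { duration: '1m', target: 50 },  // ramp
    { duration: '3m', target: 50 },  // sustained
    { duration: '30s', target: 0 },  // ramp down
  ],
  thresholds: {
    // Latency thresholds taken from the text above.
    http_req_duration: ['p(95)<300', 'p(99)<800'],
    // Assumed error-rate budget; not specified in the source.
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  // BASE_URL and /health are placeholders for the endpoint under test.
  const res = http.get(`${__ENV.BASE_URL}/health`);
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```

When any threshold is breached, k6 exits with a non-zero code, so a CI step that runs the script fails without extra scripting.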