skills/authoring-log-alerts/SKILL.md
Author useful, low-noise log alerts on services in a PostHog project. Use when the user asks to set up alerts for their logs, suggest alerts they should add, or evaluate whether a service is worth monitoring. Covers service triage, baseline characterisation, threshold drafting, back-testing via simulate, and shipping with a notification destination.
npx skillsauth add posthog/ai-plugin authoring-log-alertsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Authoring an alert is a measurement problem, not a guessing problem. You are not trying to be exhaustive — you are trying to land thresholds that fire 0–3 times per week on real production patterns, on services that matter.
posthog:logs-alerts-events-list to inspect
fire/resolve cadence and posthog:logs-alerts-partial-update to adjust).posthog:query-logs, don't author an alert mid-incident.| Tool | Job | Where it fits |
| --------------------------------------------------------------------- | ----------------------------------------------------------------------------- | ------------------ |
| posthog:logs-services | Top-25 services in window with log_count, error_count, error_rate, sparkline. | Step 1 — triage. |
| posthog:logs-attributes-list / posthog:logs-attribute-values-list | Discover keys/values for narrower filters. | Step 2, optional. |
| posthog:logs-count-ranges | Adaptive time-bucketed counts for a filter. | Step 3 — baseline. |
| posthog:logs-alerts-simulate-create | Replay a draft config against -7d history with full state machine. | Step 4 — validate. |
| posthog:logs-alerts-create | Persist the alert. | Step 5 — ship. |
| posthog:logs-alerts-destinations-create | Wire the alert to Slack or webhook. | Step 5 — ship. |
Do not call posthog:query-logs during authoring. You need distributions, not rows. Reserve posthog:query-logs for
the very end if the user asks "show me a sample of what would have fired" — limit: 10 is plenty.
Call posthog:logs-services for the last 24h with no filters. The response is capped at 25 services and includes a
sparkline, so it is small and bounded.
A service is a candidate when both are true:
log_count is non-trivial (≥ ~1k in 24h — quieter services produce too little signal to alert on).error_rate is non-zero, or the user has named the service explicitly.Skip services with high volume but error_rate == 0 unless the user wants a volume-shape alert (e.g. "warn me
if api-gateway suddenly stops producing logs"). Volume-floor alerts use threshold_operator: below and need
different reasoning — see references/volume-floor-alerts.md.
If the user names a service, treat it as a candidate even without error signal.
If a service has many error sub-types, an alert on "all errors" is usually too broad. Use
posthog:logs-attributes-list (try attribute_type: log) and posthog:logs-attribute-values-list to find a discriminator —
common ones are http.status_code, error.type, k8s.container.name. Add the narrowing filter to your draft.
Keep it simple: one severity filter + one or two attribute filters is plenty. Multi-clause filters are harder to reason about and rarely improve precision.
Call posthog:logs-count-ranges with the candidate's filters, dateRange: { date_from: "-7d" }, and
targetBuckets: 24 (one bucket ≈ 7h). The response gives you bucket counts.
Do not eyeball the percentiles or scale the threshold to the alert window manually. Pipe the count-ranges response into the helper script:
echo '<count-ranges JSON>' | python3 scripts/baseline_stats.py --window-minutes 5
The script returns:
{
"n_buckets": 12,
"bucket_minutes": 420.0,
"alert_window_minutes": 5,
"stats": { "p50": 12.0, "p95": 71.25, "p99": 126.25, "max": 140 },
"suggested_threshold_count": 5,
"rationale": "max(p99=126.25, median*3=36.0, floor=5) scaled from 420m bucket to 5m window",
"health": []
}
Use suggested_threshold_count as your starting threshold. Read health:
| health flag | What it means | What to do |
| ----------------------- | -------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
| sparse:N_of_M_buckets | Too few non-empty buckets for a 7d baseline. | Widen filter, extend to -30d, or skip. |
| empty | All buckets are zero. | Skip — no signal. |
| spiky | max is 10×+ p95. | Count-threshold alerts work well. Proceed. |
| flat | p95 ≈ p50. | Be cautious — either no incidents in lookback, or the metric is too smooth. Try a longer lookback or skip. |
| [] (empty) | Healthy distribution. | Proceed. |
Pick a starter draft from these defaults — see references/threshold-defaults.md for the reasoning:
| Setting | Default | Notes |
| --------------------- | ------------------------------------------- | --------------------------------------------------------------------- |
| threshold_count | suggested_threshold_count from the script | Already scaled to the alert window. |
| threshold_operator | above | Use below only for volume-floor alerts. |
| window_minutes | 5 | Allowed: 5, 10, 15, 30, 60. Must match what you passed to the script. |
| evaluation_periods | 3 | M in N-of-M. |
| datapoints_to_alarm | 2 | N in N-of-M. 2-of-3 reduces flap from a single noisy bucket. |
| cooldown_minutes | 30 | Minimum time between repeat fires. |
Call posthog:logs-alerts-simulate-create with these settings and date_from: "-7d". The response gives you fire_count
and resolve_count.
Target: fire_count between 0 and ~3 over -7d. If outside the band:
| Outcome | Adjustment |
| --------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| fire_count = 0 over 7d and the baseline was spiky | Lower threshold_count toward stats.p95 from the script, or drop to 1-of-2. |
| fire_count = 0 and the baseline was flat | The service has no alertable signal. Skip it; log why. |
| fire_count > 5 | Raise threshold_count toward stats.max from the script, or move to 3-of-5 for a smoother window. |
| fire_count is fine but resolve_count never matches fire_count | Cooldown is too long, or the underlying state is genuinely sticky. Acceptable for now. |
When adjusting the threshold, read values from the script's stats block — never recompute percentiles
by hand.
Cap iteration at 3 simulate calls per candidate. If you can't land in the band in 3 rounds, the metric is wrong — either the filter is too broad, the window is wrong, or the service genuinely doesn't have a threshold-shape signal. Note it and move on.
Once a draft simulates cleanly:
posthog:logs-alerts-create with the validated config. Use a name like <service> error rate (auto) so the
user can see at a glance which alerts came from this skill.posthog:logs-alerts-destinations-create to wire it to a notification target. An alert with no destination
is silent. Always confirm the channel name or webhook URL with the user before attaching — never wire
an auto-generated alert to a production channel without explicit confirmation. If the user is unsure,
suggest a low-traffic testing channel for the first few alerts.If the user wants alerts created in enabled: false state for review-then-flip, pass enabled: false to
-create and tell them how many drafts you produced.
The filters field on posthog:logs-alerts-create takes a subset of LogsViewerFilters and must contain at
least one of:
severityLevels — list of ["trace","debug","info","warn","error","fatal"]serviceNames — list of service name stringsfilterGroup — property filter groupThe same shape goes into posthog:logs-alerts-simulate-create's filters field. Match the simulate filters to the alert filters
exactly — otherwise the simulation is testing a different alert than the one you ship.
Example minimum:
{
"severityLevels": ["error", "fatal"],
"serviceNames": ["api-gateway"]
}
posthog:logs-services call at the start, not per-candidate.posthog:logs-count-ranges call per candidate at targetBuckets: 24. Don't go above 30 during authoring.posthog:logs-alerts-simulate-create calls per candidate.posthog:query-logs calls during the authoring loop.Report what you did, in this shape:
The user should be able to read this and decide whether to disable any drafts before they go live.
tools
Focused Signals scout for PostHog projects with web traffic. Watches the acquisition and site-health layer the web analytics product reports on: per-channel session volume diverging from the site's own rhythm (an acquisition source silently collapsing or surging), attribution breakage (paid/campaign traffic reclassifying into Direct or Unknown when tagging breaks), landing pages that break (bounce-rate steps, 404 spikes, entry-path cliffs), and page-performance regressions (web vitals p75 steps). Emits findings only when they clear the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet.
tools
Focused Signals scout for PostHog projects using session replay. Watches two promises the replay product makes: that sessions are actually being recorded (capture integrity — recording volume vanishing while site traffic doesn't), and that the friction evidence inside recordings gets seen (rage-click / dead-click clusters concentrating on a page or element, error-after-interaction cohorts, recurring replay vision themes nobody aggregates). Emits findings only when they clear the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet.
tools
Focused Signals scout for PostHog setup health. Reads the project's active health issues — the deterministic findings of PostHog's own health checks (no live events, outdated SDKs, missing reverse proxy, absent web vitals, ingestion warnings, failing data-warehouse models, and more) — and decides which are genuinely worth surfacing. Unlike a one-signal-per-issue push, it bundles kind-clusters into a single finding, weights by real blast radius (cross-referencing actual event volume and reach), and prioritizes issues an agent can resolve via the MCP. Emits only above the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet — no dependencies on other skills.
tools
Focused Signals scout for PostHog projects using feature flags. Watches the flag roster and the `$feature_flag_called` evaluation stream for contradictions between a flag's configured state and its real traffic: evaluation cliffs on healthy flags, ghost flags (code calling keys that no longer exist), response-distribution shifts with no corresponding flag edit, and flag debt (stale, fully-rolled-out, or dead flags still burning evaluations). Emits findings only when they clear the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet — no dependencies on other skills.