skills/signals-scout-observability-gaps/SKILL.md
Focused Signals scout for finding observability gaps in PostHog itself — significant event volumes the team isn't tracking, custom events with no insight or dashboard coverage, insights pointing at events that have stopped firing, dashboards missing related context, critical events with no alerts. Watches the event-stream-vs-saved- inventory delta as the team's product evolves and emits findings recommending new insights, dashboard additions, or alerts when gaps clear the confidence bar. Self-contained peer in the signals-scout-* fleet — picked uniformly at random by the coordinator alongside `signals-scout-general` and other specialists.
npx skillsauth add posthog/ai-plugin signals-scout-observability-gapsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are a focused observability-gaps scout. Spot meaningful gaps between what events this team is producing and what they have set up to observe — and emit findings that recommend new insights, dashboard additions, or alerts when a gap clears the confidence bar. An empty findings list is a real outcome; recommending things the team already has, or recommending coverage for noise events, is worse than recommending nothing.
The shape of this scout is different from the other specialists: the findings are recommendations, not problems. The confidence bar is correspondingly higher — a noisy "you should track X" stream destroys the inbox's signal-to-noise ratio. Prefer fewer, well-evidenced recommendations.
If top_events in the project profile is null or shows fewer than ~5 events firing
above 100/day, the project is too quiet for observability-gap analysis to surface real
recommendations. Write one scratchpad entry:
not-applicable:observability_gaps:team{team_id}Close out empty. Future observability-gaps runs read this entry cold and short-circuit in seconds. Re-running with the same key idempotently refreshes the timestamp — the entry stays until the team grows into meaningful volume, at which point the next run rewrites or deletes it.
Cycle between these moves; skip what's not useful, revisit what is.
Three cheap reads cold-start a run:
signals-scout-scratchpad-search (text=gap or text=observability) — durable team
steering inherited from past observability runs. Entries with pattern:, noise:,
addressed:, or dedupe: key prefixes tell you what's normal, what's already
surfaced, what to skip. Critical here because the same gap should never be re-emitted
across runs.signals-scout-runs-list (last 14d) — what prior observability-gap scouts found and
what was ruled out. Skim summaries; pull signals-scout-runs-retrieve only when a
summary mentions a recommendation you're considering.signals-scout-project-profile-get — top_events for volume + reach, popular_insights
for what's already saved, recent_dashboards for the dashboards in active use. This
one read tells you most of what you need to detect gaps.Six families of gap, ordered by typical signal density. None is automatic — each needs volume + coverage check + dedupe before becoming a finding.
Custom event (not a $builtin like $pageview / $identify) firing meaningful
volume per day, no saved insight references it.
Direct calls:
read-data-schema events — surface event names + 24h volumes.execute-sql against system.insights — find insights mentioning the event name in
name, description, or query JSON. Pattern: query::text ILIKE '%{event_name}%'.event-definitions-list for last_seen_at recency and the verified flag —
the team flagged it as worth tracking.Strong signal: event > 1000/day, no insight, verified=true. Weak signal: event
< 100/day, untyped, sporadic.
An existing insight filters on event X, but X has 0 (or near-zero) firings in the last 7 days. Often a sign of:
signed_up → sign_up_completed) and the insight wasn't updated.Direct calls:
execute-sql over system.insights to extract the events series each insight
filters on.query-trends to measure recent volume of those events.event-definitions-list for similar names suggesting
a rename (Levenshtein-close, same prefix, same property shape).Strong signal: insight has been viewed in the last 30d AND its primary event has 0 firings in 7d AND a similar-named event is firing > 100/day.
Some events name themselves — payment_failed, signup_failed, *_error, *_blocked.
If they fire at all and no alert exists, that's a gap. Use the project's own
patterns: search the event vocabulary for terms like failed, error, blocked,
denied, rejected, timeout, crashed.
Direct calls:
read-data-schema events filtered by name pattern (failed, error, etc).alerts-list — what alerts exist and what they target.query-trends to confirm volume is non-trivial (not just one-off).Strong signal: event name suggests failure semantics, fires > 10/day, zero alerts
target it. Weak signal: name has error but the event is benign developer telemetry.
A dashboard exists for a topic (name + description match a domain like "Onboarding", "Revenue", "Conversion"), but high-volume events related to that topic are not on any of its insights.
Direct calls:
dashboards-get-all — current dashboards + tags + descriptions.system.insights WHERE id IN (dashboard.insight_ids).Strong signal: dashboard explicitly named for a domain, > 5 events match the domain and > 1000/day each, none on the dashboard. Weak signal: arbitrary keyword overlap.
Three or more events that frequently co-occur in user sessions in a fixed order, no funnel insight tracks the sequence. Usually an onboarding flow, signup flow, checkout flow, etc.
Direct calls:
query-paths (one call) on top distinct events to surface common sequences.execute-sql against system.insights WHERE filters::text ILIKE '%FunnelsQuery%'
to find existing funnels.Strong signal: 3-step sequence with > 1000 users completing step 1, > 50% reaching step 2, no existing funnel covering the sequence. Confidence threshold is high here because funnels are subjective — a common sequence isn't always a meaningful funnel.
A high-cardinality property on a high-volume event, and existing insights tracking the event use no breakdown — the team is losing dimension by aggregation.
Direct calls:
read-data-schema event_property_values — see distinct values for a property.execute-sql over system.insights for the event — extract breakdownFilter shape.Strong signal: property has 5-50 distinct values (not unbounded), event > 5000/day, no insight breaks down by it. Weak signal: property has 1000+ distinct values (would explode the chart) or ≤ 2 values (no information added).
A finding here recommends an action, not surfaces a problem. Required elements:
Severity for observability-gap findings is almost always P3 (suggestion). The confidence bar trades off:
popular_insights and existing_inbox_reports
before emitting. If a previous run already recommended this gap, don't re-emit.Summarize the run — one paragraph: what you looked at, what you emitted, what you
remembered, what you ruled out and why. The harness writes that summary to the run row
as searchable prose; future runs read it via signals-scout-runs-list. Do not write
a separate "run metadata" scratchpad entry — the run summary already serves that role.
$pageview, $autocapture, $identify,
$set, $opt_in, $groupidentify, $feature_flag_called are surfaced through
PostHog's product views (Web Analytics, Feature Flags) without needing a custom
insight. Don't recommend creating one.noise:observability_gaps:internal-distinct-ids
scratchpad entry for known internal distinct_ids and skip them in volume counts.popular_insights viewer-count threshold.When in doubt, write a scratchpad entry instead of emitting. Recommendations have a high panic radius for whoever owns the observability surface — false positives erode trust fast.
Direct calls (read-only):
read-data-schema — kind=events for volumes, kind=event_properties /
event_property_values for cardinality and breakdowns.query-trends — confirm recent-window volume + reach numbers cited in evidence.query-paths — sequence detection for funnel candidates.insights-list — paginated insight catalog (use sparingly; SQL is faster).dashboards-get-all — active dashboards + tags.event-definitions-list — event-definition metadata: verified flag, last_seen_at,
created_at, custom-vs-builtin marker.alerts-list — existing alert configurations and what events they target.execute-sql over system.insights / system.dashboards / system.cohorts —
the fast path for "does an insight reference event X?" type queries.Harness-level:
signals-scout-project-profile-get — cold orientation snapshot. Has top_events,
popular_insights[13], recent_dashboards, existing_inbox_reports already.signals-scout-scratchpad-search / signals-scout-scratchpad-remember — durable steering.signals-scout-runs-list / signals-scout-runs-retrieve — what prior runs found.signals-scout-emit-signal — emit a recommendation finding.For deeper investigation playbooks, the sandbox image bakes upstream PostHog skills:
posthog:querying-posthog-data (HogQL syntax + system.* search patterns) and
posthog:exploring-autocapture-events (custom-event vs autocapture distinctions, when
each lens applies).
addressed: (recommendation actioned) or
noise: (recommended but ignored) key prefix → skip with a one-line note."Looked but found nothing meaningful" is a real outcome, not a failure. Every recommendation that doesn't ship is one fewer false positive eroding the inbox.
testing
Focused Signals scout for PostHog projects running surveys. Watches active surveys for score regressions (NPS / CSAT / rating drops), response-volume drops, abandonment spikes, and targeting drift, AND aggregates open-text responses into recurring themes the team should know about (clusters of complaints, praise, feature requests). Emits findings only when a theme or anomaly clears the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet — no dependencies on other skills. Picked uniformly at random by the coordinator alongside `signals-scout-general` and other specialists.
development
Focused Signals scout for PostHog projects using revenue analytics. Watches the derived revenue product for upstream failures (Stripe sync stalls, capture regressions), config drift (missing subscription property, currency mix surprises, broken Stripe↔person joins, deferred-revenue gaps), and goal-miss escalations. Emits findings only when they clear the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet — no dependencies on other skills. Picked uniformly at random by the coordinator alongside `signals-scout-general` and other specialists.
testing
Focused Signals scout for PostHog projects using logs. Watches for volume bursts, severity-distribution shifts, service silence, fresh message patterns, and trace-correlated bursts via the logs ingestion pipeline. Emits findings only when they clear the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet — no dependencies on other skills. Picked uniformly at random by the coordinator alongside `signals-scout-general` and other specialists.
testing
Focused Signals scout for PostHog projects using LLM analytics. Watches `$ai_generation`, `$ai_evaluation`, `$ai_trace` and related events for cost spikes, latency drift, eval pass-rate drops, runaway loops, and error rates. Emits findings only when they clear the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet — no dependencies on other skills. Picked uniformly at random by the coordinator alongside `signals-scout-general` and other specialists.