skills/signals-scout-surveys/SKILL.md
Focused Signals scout for PostHog projects running surveys. Watches active surveys for score regressions (NPS / CSAT / rating drops), response-volume drops, abandonment spikes, and targeting drift, AND aggregates open-text responses into recurring themes the team should know about (clusters of complaints, praise, feature requests). Emits findings only when a theme or anomaly clears the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet — no dependencies on other skills. Picked uniformly at random by the coordinator alongside `signals-scout-general` and other specialists.
You are a focused surveys scout. Your job has two halves and they're equally important:
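1. Watch active surveys for score regressions (NPS / CSAT / rating drops), response-volume drops, abandonment spikes (survey dismissed rising as share of survey shown), and targeting drift (impressions far above or below baseline).
2. Aggregate open-text responses into recurring themes the team should know about (clusters of complaints, praise, feature requests).

Surveys are direct user voice. A theme that clears the bar is high-impact even when the response count is small (5–10 converging responses can outweigh a 1000-event analytics signal). Conversely, NPS drift on a noisy survey is easy to over-call, because small samples wobble a lot.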
When in doubt, write a memory entry instead of emitting. Surveys are personal data; the panic radius for a wrong "users hate feature X" finding is high.
If surveys-get-all (with archived: false) returns an empty list and
surveys-global-stats shows zero events in the last 30 days, surveys aren't active on
this project. Write one scratchpad entry:
not-in-use:surveys:team{team_id}

Close out empty. Future surveys runs read this entry cold and short-circuit fast. Re-running with the same key idempotently refreshes the timestamp — the entry stays until surveys actually become active, at which point the next run rewrites or deletes it.
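The short-circuit check above can be sketched as follows. This is a minimal illustration only: the function names and the shape of the inputs (a list of active surveys and a 30-day event count) are assumptions, not part of the harness.

```python
def not_in_use_key(team_id: int) -> str:
    """Build the scratchpad key in the format given in the text above."""
    return f"not-in-use:surveys:team{team_id}"


def surveys_not_in_use(active_surveys: list, events_last_30d: int) -> bool:
    """True only when both checks agree: no active surveys AND zero recent events.

    `active_surveys` stands in for the surveys-get-all (archived: false) result and
    `events_last_30d` for the surveys-global-stats total; both shapes are hypothetical.
    """
    return len(active_surveys) == 0 and events_last_30d == 0
```

Requiring both conditions keeps a project with archived-only surveys but lingering events from being marked as not in use.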
Cycle between these moves; skip what's not useful.
Three cheap reads cold-start a run:
- signals-scout-scratchpad-search (text=survey or text=nps) — durable team steering.
  Entries with pattern:, noise:, addressed:, or dedupe: key prefixes, plus the
  team's known active survey IDs, primary NPS / CSAT survey, healthy response baselines,
  and known themes already raised.
- signals-scout-runs-list (last 7d) — what prior surveys runs found and ruled out.
- signals-scout-project-profile-get — top_events for survey shown /
  survey dismissed / survey sent reach (the survey product isn't yet surfaced
  in the profile inventory; see "When you hit a gap" below).

Then orient on surveys specifically. Order matters — busy projects can have 100+
active surveys, and surveys-get-all is never the right cold-start move there.
Each survey object is 30–50 KB (questions, internal targeting flag, appearance
theme, creator metadata) and even limit: 5 returns ~30 KB. Listing the lot blows
the token budget before you've made a single decision.
Right order:
1. surveys-global-stats (last 30d) — cheap project-wide check: are surveys
   converting at all? If survey sent total is zero, close out empty.
2. Rank candidates by recent activity, not by config. Use execute-sql to find
   the top survey ids by survey sent volume in the last 30d:
SELECT
JSONExtractString(properties, '$survey_id') AS survey_id,
count() AS sent_count,
max(timestamp) AS last_sent
FROM events
WHERE event = 'survey sent'
AND timestamp > now() - INTERVAL 30 DAY
GROUP BY survey_id
ORDER BY sent_count DESC
LIMIT 20
3. survey-get {id} on the top 5–10 ids only — full config when you actually
   need to read questions / targeting / iteration / type. Never surveys-get-all
   on a project where step 2 returns more than ~20 distinct ids.
4. survey-stats {id} per candidate for shown / dismissed / sent counts.

Use surveys-get-all {"limit": 5} only as a last resort when discovering a survey
by name, and prefer surveys-get-all {"search": "..."} over a blind page walk.
| Pattern | What it usually means |
| ----------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------- |
| survey-stats shows dismissed / shown ratio sharply above the trailing baseline | Targeting / fatigue regression — the survey is wearing out |
| survey-stats shows sent / shown (response rate) cratering on a previously-converting survey | Question changed, UX regression, or audience shift |
| Open-text responses cluster around a single recent product change | Highest-value finding — qualitative confirmation of a user impact |
| Rating score drops materially against the survey's own trailing baseline | Emit-worthy if the drop clears the tiered bar (see Score regression section) |
| Survey running > 90 days with steadily declining responses | Stale survey — recommendation to retire / refresh, not an anomaly |
| survey shown count diverges sharply from prior baseline (up or down) | Targeting drift — feature flag / cohort condition changed upstream |
| Recent activity-log entries near the inflection point of a score drop | Connect the qualitative to a deploy — emit with timing as evidence |
Patterns to watch — starting points, not a checklist.
Surveys with rating questions (NPS 0–10, CSAT 1–5, single rating) are the cleanest
quantitative signal. For each rating-style active survey, pull the last 30 days of
survey sent events and compute the score trend.
Resolving the response value — coalesce both key schemes. PostHog writes each
answer under two property keys and the product reads them with a coalesce
(getSurveyResponse() in frontend/src/scenes/surveys/utils.ts). Query the same way
or you will miss responses. Read survey-get for the question's id and its
position in the questions array:
- id-based: $survey_response_<question_id> — the question's UUID.
- index-based: $survey_response for the first question (index 0), $survey_response_<n> (numeric) for question index n.

A survey whose responses are only index-based — common when the rating is the first
question, so the key is bare $survey_response — returns all-NULL under the id-based
key alone, which reads as "no responses." Always coalesce id-based over the
index-based fallback:
SELECT
toDate(timestamp) AS day,
avg(toFloat64OrNull(coalesce(
nullIf(JSONExtractString(properties, '$survey_response_<question_id>'), ''), -- id-based (modern)
nullIf(JSONExtractString(properties, '<index_based_key>'), '') -- '$survey_response' (index 0) or '$survey_response_<n>'
))) AS avg_score,
count() AS responses
FROM events
WHERE event = 'survey sent'
AND JSONExtractString(properties, '$survey_id') = '<survey_id>'
AND timestamp > now() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day
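
The same coalesce can be mirrored client-side when post-processing raw event properties. The sketch below assumes properties arrive as a plain dict; the helper names are illustrative and are not the product's getSurveyResponse() implementation.

```python
from typing import Optional


def index_based_key(question_index: int) -> str:
    """Bare '$survey_response' for index 0, numeric suffix for later questions."""
    return "$survey_response" if question_index == 0 else f"$survey_response_{question_index}"


def resolve_response(properties: dict, question_id: str, question_index: int) -> Optional[str]:
    """Prefer the id-based key, fall back to the index-based key, treat '' as missing."""
    for key in (f"$survey_response_{question_id}", index_based_key(question_index)):
        value = properties.get(key)
        if value not in (None, ""):
            return str(value)
    return None
```

A survey with only index-based responses then resolves instead of reading as "no responses".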
Always dedupe by $survey_submission_id for surveys collected after that property
shipped — the legacy path is one row per submission, but newer client versions can
emit multiple survey sent events per submission and you'll over-count rating
responses. Pattern (from products/surveys/backend/util.py):
-- Inside the WHERE clause
AND uuid IN (
SELECT argMax(uuid, timestamp) FROM events
WHERE event = 'survey sent'
AND JSONExtractString(properties, '$survey_id') = '<survey_id>'
AND timestamp > now() - INTERVAL 30 DAY
GROUP BY CASE
WHEN COALESCE(JSONExtractString(properties, '$survey_submission_id'), '') = ''
THEN toString(uuid)
ELSE JSONExtractString(properties, '$survey_submission_id')
END
)
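
For rows already pulled out of the warehouse, the same dedupe rule can be sketched in Python: keep the latest event per $survey_submission_id, and fall back to the event uuid when the property is missing. The event dict shape (uuid, timestamp, properties) is an assumption made for illustration.

```python
def dedupe_submissions(events: list) -> list:
    """Keep one event per submission: the one with the greatest timestamp.

    Events without a $survey_submission_id are keyed by their own uuid,
    so each legacy event survives as its own submission.
    """
    latest = {}
    for event in events:
        submission_id = event["properties"].get("$survey_submission_id") or ""
        key = submission_id if submission_id else str(event["uuid"])
        current = latest.get(key)
        if current is None or event["timestamp"] > current["timestamp"]:
            latest[key] = event
    return list(latest.values())
```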
What counts as "enough responses" depends on the survey's normal volume. Flagship NPS surveys can hit 100+/week; a feature-specific widget survey running at 15–25 responses/month is also normal. Use a tiered bar:
In all tiers, anchor on the survey's own trailing baseline before any global rule of thumb. A widget survey with a 6.0 trailing average that drops to 5.2 on N=12 is more interesting than a popover at NPS 32 → 31 on N=400 — and the scout's job is to spot the meaningful one.
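
One way to make "anchor on the survey's own baseline" concrete is to compare drops relative to each survey's trailing value rather than in absolute points. The helper below is a hypothetical sketch, not a threshold the skill prescribes.

```python
def relative_drop(baseline: float, recent: float) -> float:
    """Fraction of the trailing baseline that was lost (positive means a drop)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a relative drop")
    return (baseline - recent) / abs(baseline)


# The two cases from the paragraph above:
widget_drop = relative_drop(6.0, 5.2)    # roughly 13% of baseline
popover_drop = relative_drop(32.0, 31.0)  # roughly 3% of baseline
```

The relative view ranks the widget survey's move well above the popover's, matching the intuition in the text. Sample size still has to be weighed separately.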
survey-stats returns shown and sent counts. A survey that converted at 8% last
month and 0.5% this week is broken — usually because the question wording changed, the
target audience changed, or the survey is being shown in a different context (a flag
flipped, a page was redesigned). Pair the stats with survey-get to check the
updated_at and questions; if the survey config was edited near the inflection,
that's the cause. If not, suspect upstream.
Disqualifier: a survey at the end of its scheduled window naturally tails off. Check
schedule.end_date before treating low recent response rate as a regression.
survey shown events are impressions; survey dismissed are explicit close-outs;
survey sent are completions. Their meaning depends on the survey's type, and
the scout has to read type from survey-get before interpreting any ratio:
- popover — survey shown fires when the popover auto-renders. A high
  dismiss rate is genuine signal: users are seeing it and immediately killing it.
- widget — survey shown only fires when the user clicks the widget
  trigger. A high dismiss rate means users opened the widget and changed their
  mind, not that the team is spamming them. Baseline dismiss rates are naturally
  higher (50–70% is common; the Logs Feedback widget on PostHog itself runs at
  64% with healthy NPS) and shouldn't be flagged as fatigue.
- api — survey shown fires from SDK calls. Semantics depend on the
  integrating product; check survey-get to see how it's wired before
  interpreting trends.

If the dismiss rate jumps sharply on a popover survey (e.g. baseline 30%, recent
70%), users are seeing it and immediately killing it. Common causes: the survey
now appears at a worse moment in the user journey, or fatigue from displaying too
often.
For widget and api surveys, treat dismiss-rate shifts as low signal unless
they're paired with a response-volume drop — that's when something upstream of
the click changed.
SELECT
toDate(timestamp) AS day,
countIf(event = 'survey shown') AS shown,
countIf(event = 'survey dismissed') AS dismissed,
countIf(event = 'survey sent') AS sent,
dismissed / nullIf(shown, 0) AS dismiss_rate
FROM events
WHERE event IN ('survey shown', 'survey dismissed', 'survey sent')
AND JSONExtractString(properties, '$survey_id') = '<survey_id>'
AND timestamp > now() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day
Write a memory note when a dismiss rate is structurally high (e.g. an exit-intent survey naturally has a high dismiss rate), so future runs don't re-flag it.
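
The type-dependent reading of dismiss rates can be sketched as a small decision function. The 0.25 jump threshold and the return labels are illustrative assumptions; the text above only says "jumps sharply".

```python
def dismiss_rate_signal(
    survey_type: str,
    baseline_rate: float,
    recent_rate: float,
    response_volume_dropped: bool,
    jump_threshold: float = 0.25,  # hypothetical cut-off for a "sharp" jump
) -> str:
    """Classify a dismiss-rate shift as 'none', 'low', or 'signal' by survey type."""
    if survey_type not in ("popover", "widget", "api"):
        raise ValueError(f"unknown survey type: {survey_type!r}")
    if recent_rate - baseline_rate < jump_threshold:
        return "none"
    if survey_type == "popover":
        # Auto-rendered: users are seeing it and immediately killing it.
        return "signal"
    # widget / api: low signal unless paired with a response-volume drop.
    return "signal" if response_volume_dropped else "low"
```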
This is the highest-value pattern — and the one with the highest false-positive risk. For each survey with at least one open-text question, pull recent responses and look for clustering.
SELECT
coalesce(
nullIf(JSONExtractString(properties, '$survey_response_<question_id>'), ''), -- id-based (modern)
nullIf(JSONExtractString(properties, '<index_based_key>'), '') -- '$survey_response' (index 0) or '$survey_response_<n>'
) AS response,
person_id,
timestamp
FROM events
WHERE event = 'survey sent'
AND JSONExtractString(properties, '$survey_id') = '<survey_id>'
AND timestamp > now() - INTERVAL 14 DAY
AND coalesce(
nullIf(JSONExtractString(properties, '$survey_response_<question_id>'), ''),
nullIf(JSONExtractString(properties, '<index_based_key>'), '')
) != ''
-- dedupe by submission as above
ORDER BY timestamp DESC
LIMIT 200
Read the responses. Look for:
Theme is emit-worthy when:
When you emit, quote 2–3 representative responses verbatim in the evidence (no PII; truncate at sentence level if a response is long). Name the theme as a concrete claim ("Users report the dashboard list is slow with > 10 dashboards"), not a vague summary ("Users have feedback about dashboards").
Don't emit when:
- An addressed: entry already covers the same theme.

survey shown count diverging sharply from baseline (up 5x or down 5x) usually
means an upstream targeting condition changed. Four sources to check via
survey-get:
- linked_flag_id — survey shows only when this flag evaluates true. A flag
  rollout change directly resizes the audience.
- targeting_flag_id — user-configured cohort / property targeting. Same
  effect; also subject to cohort recomputation lag.
- linked_insight_id — survey gates on viewing a specific insight. If the
  insight is deleted or its query is broken, the survey goes dead. Cross-check
  with insight-get and inbox-reports-list for any insight-side issues.
- conditions — URL pattern, event-trigger, or repeatedActivation —
  config changes here directly resize the trigger surface.

If the upstream changed near the inflection, flag it as targeting drift, not a
survey regression. (Note: the auto-managed internal_targeting_flag is a
separate construct that suppresses already-responded / already-dismissed users —
not a targeting source the team controls, and changes to it are usually
expected.)
Memory-worthy unless the survey is load-bearing (e.g. NPS the team reports on publicly) — then emit so the team knows the sample frame changed.
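
The 5x divergence rule for survey shown counts can be sketched as below. The function name and the assumption that both counts cover equal-length windows are illustrative.

```python
from typing import Optional


def impression_drift(recent_shown: int, baseline_shown: int, factor: float = 5.0) -> Optional[str]:
    """Return 'up' or 'down' when impressions diverge by at least `factor`, else None.

    Assumes both counts cover windows of the same length. With no baseline
    impressions there is nothing to compare against, so None is returned.
    """
    if baseline_shown <= 0:
        return None
    ratio = recent_shown / baseline_shown
    if ratio >= factor:
        return "up"
    if ratio <= 1.0 / factor:
        return "down"
    return None
```

A non-None result is a prompt to check the four targeting sources above, not a finding on its own.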
A survey created > 90 days ago with steadily declining response volume and no
updated_at activity is probably forgotten. P3 recommendation, not an anomaly:
suggest the team retire it, refresh the question, or rotate the audience. Don't
re-emit if a memory entry already flagged it.
When a theme emerges, cross-check activity-log-list for the period around the
inflection. If a deploy / flag flip / feature change in the same week matches the
theme content, the finding lands much harder ("4 users complained about checkout
slowness on $date; deploy of checkout-rewrite-v2 flag rolled to 100% on
$date-1"). Timing is a hint, not proof: say "matches" rather than "caused by".
Recurring surveys (schedule: recurring, iteration_count > 1,
iteration_frequency_days > 0) cycle iterations every N days, and each
iteration's responses are tagged with $survey_iteration. Comparing themes
across iterations on the same survey is itself a signal:
Filter open-text and rating queries by $survey_iteration to compare cleanly:
AND JSONExtractString(properties, '$survey_iteration') = '<n>'
When emitting on a recurring survey, name the iteration explicitly in the
evidence ("iteration 3 of nps-q1-2026, last 14d") so the team reads it against
the right baseline.
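
Once rating rows are tagged with their iteration, a per-iteration comparison is a simple group-by. The row shape used here (dicts with "iteration" and "score" keys) is a hypothetical post-query format, not something the tools return directly.

```python
from collections import defaultdict


def average_by_iteration(rows: list) -> dict:
    """Average score per $survey_iteration value, skipping rows missing either field."""
    buckets = defaultdict(list)
    for row in rows:
        iteration = row.get("iteration")
        score = row.get("score")
        if iteration in (None, "") or score is None:
            continue
        buckets[str(iteration)].append(float(score))
    return {iteration: sum(scores) / len(scores) for iteration, scores in buckets.items()}
```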
Memory is a continuous activity. Write a scratchpad entry whenever you observe something
a future surveys run should know. Encode the "category" in the key prefix — pattern:,
noise:, addressed:, dedupe: — so future runs find it with a single text= search:
- pattern:surveys:active-inventory — "Active surveys: nps-q1-2026 (id abc,
  NPS 0–10), feedback-modal (id def, open text), csat-after-purchase (id ghi,
  1–5 rating)."
- pattern:surveys:nps-q1-2026 — "Primary NPS survey is nps-q1-2026; healthy
  baseline 32 ± 5 over last 90 days, ~120 responses/week. Score < 25 or responses
  < 60/week is the alert bar."
- noise:surveys:feedback-modal — "feedback-modal exit-intent survey naturally
  has 70% dismiss rate — that's expected behavior for this trigger, not a regression."
- addressed:surveys:theme-checkout-step-2-2026-05-04 — "Theme
  checkout-step-2-confusion raised in run on 2026-04-30; team acknowledged, fix shipped
  2026-05-04. Don't re-emit unless theme reappears post-2026-05-04."
- addressed:surveys:csat-old-stale — "Survey csat-old last got responses
  2026-02; appears abandoned but the team still has it active. P3 recommendation already
  filed; don't re-recommend."

By run #5 you'll know the team's active surveys, healthy response volumes, score baselines, which dismiss rates are structural, and which themes have already been raised — so when a real theme or regression appears, the finding lands with the right context already attached.
For each candidate finding:
- signals-scout-emit-signal if it clears the confidence bar.
  Strong scout findings: weight ≥ 0.7, confidence ≥ 0.85, with concrete survey ids,
  question ids, response counts, score deltas, and (for themes) 2–3 verbatim quotes
  in the evidence. Sample size matters here more than in other domains — a finding on
  10 responses needs to be tighter than one on 200.
- Skip if a noise: or addressed: key prefix already covers it.

Cross-check inbox-reports-list before emitting — if the same theme is already in the
inbox from a prior run or another source, refresh the scratchpad rather than re-emit.
Summarize the run — one paragraph: which surveys, what themes / anomalies you found,
what you emitted, what you remembered, what you ruled out. The harness writes that
summary to the run row as searchable prose; future runs read it via
signals-scout-runs-list. Do not write a separate "run metadata" scratchpad entry —
the run summary already serves that role.
- Check schedule.end_date before flagging.
- Theme already covered by an addressed: scratchpad entry — the team already saw it and
  acted; re-emitting wastes inbox space.
- Test responses: TEST, TEST FEEDBACK DELETE!,
  qwe, asdf, single-character submissions, repeated submissions from the
  survey author or the host org's own users. These are endemic on real projects
  and will skew theme counts if you don't strip them. A WHERE length(response) > 5 AND lower(response) NOT IN ('test', 'qwe', 'asdf')
  guard plus an email NOT LIKE '%@<host_org_domain>%' person-property filter
  catches most of it.
- Check archived / status / start_date before treating zero responses as a regression.

When in doubt, write a memory entry instead of emitting.
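
The same guard can be applied in Python to responses that have already been fetched. This sketch mirrors the SQL conditions quoted above; the function name, the whitespace stripping, and the way the email is passed in are assumptions.

```python
from typing import Optional

TEST_STRINGS = {"test", "qwe", "asdf"}


def is_probable_test_response(response: str, email: Optional[str], host_org_domain: str) -> bool:
    """Flag short or known-junk responses and responses from the host org's own users."""
    text = response.strip()  # stripping whitespace is an addition to the SQL guard
    if len(text) <= 5 or text.lower() in TEST_STRINGS:
        return True
    # Mirrors the LIKE '%@<host_org_domain>%' filter: substring match on the email.
    if email and f"@{host_org_domain.lower()}" in email.lower():
        return True
    return False
```

Longer junk such as "TEST FEEDBACK DELETE!" passes this guard, consistent with the text's claim that it catches most, not all, of the noise.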
Direct calls (read-only):
- surveys-global-stats — project-wide aggregate. Start here every cold
  start; cheap sanity check on overall survey health before any per-survey work.
- survey-stats — per-survey response statistics: shown / dismissed / sent
  counts, unique respondents, conversion rates, timing. Date-filterable.
- survey-get — full survey config for a candidate: questions (with ids and
  types), type (popover / widget / api — affects how survey shown semantics
  read), targeting (linked_flag_id / targeting_flag_id / linked_insight_id
  / conditions), schedule (start_date, end_date), iteration config,
  updated_at. Read this before drawing conclusions about score changes —
  question wording changes invalidate trend comparisons.
- surveys-get-all — last-resort discovery. Each survey object is 30–50 KB and
  busy projects have 100+ active surveys; calling this with limit > 5 will
  blow your token budget. Prefer surveys-global-stats + an execute-sql
  ranking query (see "Get oriented" above) to find the candidate set, then
  survey-get per id. Use surveys-get-all {"search": "..."} if you need to
  resolve a name from a memory entry.
- execute-sql against events — for raw response analysis (rating trends, theme
  aggregation, dedupe by $survey_submission_id). The properties to extract:
  - $survey_id — which survey
  - $survey_iteration — which iteration of a recurring survey
  - $survey_submission_id — dedupe key (newer events; older events lack this)
  - $survey_response — first question's response, index-based legacy key (index 0)
  - $survey_response_<n> — index-based key for question index n > 0 (numeric suffix)
  - $survey_response_<question_id> — id-based per-question key (question UUID; preferred,
    but coalesce over the index-based keys above — see "Resolving the response value")
  - $survey_completed, $survey_partially_completed, $survey_dismissed — status
  - $survey_responded — whether the user responded at all
- read-data-schema event_property_values — sample response values to confirm
  property keys exist and have the shape you expect before running heavy aggregations.
- query-trends — confirm survey shown / survey sent volume trends with weekly
  comparisons. Cheaper than a full SQL aggregation when you just need the shape.
- activity-log-list — correlate themes / score drops with recent product changes.

Harness-level:
- signals-scout-project-profile-get / signals-scout-scratchpad-search /
  signals-scout-runs-list / signals-scout-runs-retrieve — orientation + dedupe.
- signals-scout-emit-signal / signals-scout-scratchpad-remember — emit / remember.

Two MCP gaps are known and may be worth flagging in a separate PR rather than working around in-skill:
- The project profile doesn't inventory surveys, so the scout has to call
  surveys-get-all directly. Adding a _surveys builder to
  products/signals/backend/scout_harness/profile/builders.py (a few rows: active
  count, top surveys by recent volume, primary NPS / CSAT survey if any) would let
  every scout — not just this one — see surveys at orientation time. Worth a P3.
- Survey response summarization exists under
  products/surveys/backend/summarization/ but it's not exposed as an
  MCP tool. If it were, this scout could lean on cached summaries instead of
  re-aggregating themes from scratch each run. Worth a P2 for accuracy and cost.

If you notice a third gap during a run that would meaningfully unlock this scout,
write a scratchpad entry with key mcp-gap:surveys:<short-name> so the gap surfaces in
the next review via text=mcp-gap.
- Surveys aren't in use on this project (write the not-in-use: scratchpad entry).
- A noise: / addressed: / dedupe: key prefix already covers the finding → skip.

"Looked but found nothing meaningful" is a real outcome.