skills/exploring-signals-scouts/SKILL.md
How to explore and make sense of PostHog Signals scouts — the scheduled agents that scan a project and emit findings into the Signals inbox. Use when a user wants to understand what scouts they have, how each one is behaving, and whether the fleet is actually working. Covers surveying the fleet and its schedules, reading recent scout runs and drilling into a single run's reasoning, inspecting the durable scratchpad memory the fleet has built up, tracing a run to the findings it emitted, and assessing a scout's health and performance over time (cadence, success rate, emit rate, signal-to-noise). Read-only and exploratory — to write or tune a scout, use `authoring-signals-scouts` instead. Trigger on "what are my scouts doing", "how is my <x> scout performing", "show me recent scout runs", "why did this scout find/emit nothing", "what has the fleet learned", "explore scout run <id>", "is my scout working".
A scout is a scheduled agent that wakes on its own interval, looks at one PostHog project,
decides what's genuinely worth surfacing, and either emits it as a finding into the Signals
inbox or closes out empty (a real, valid outcome). PostHog ships a fleet of canonical scouts — a
cross-product generalist (signals-scout-general) plus per-surface specialists (error tracking,
logs, AI observability, experiments, feature flags, session replay, web analytics, surveys, and
more). A project may also have custom scouts beyond the canonical fleet — any
signals-scout-* skill a team authored (e.g. -brand-mentions, -mcp-feedback) shows up here
too, so don't assume a fixed roster: signals-scout-config-list is the authoritative roster for
a project.
This skill helps you understand and explore what a project's scouts are doing and how they're
performing — entirely through read-only MCP tools. It is the observability counterpart to
the authoring-signals-scouts skill (which teaches writing and tuning) and to the
inbox-exploration skill (which covers the inbox reports scouts feed into).
There are five things you can observe about the fleet, each with its own tool:
| What you want to know | Tool | What it tells you |
| -------------------------------------------- | ---------------------------------------- | ------------------------------------------------------------------------------- |
| Which scouts run, how often, in what posture | signals-scout-config-list | One row per scout: schedule, enabled, emit, last_run_at |
| What the scouts actually did, run by run | signals-scout-runs-list / -retrieve | Per-run status, timing, end-of-run summary, emitted_count, deep-link |
| What the fleet has learned across runs | signals-scout-scratchpad-search | Durable per-team memory (baselines, noise, allowlists) |
| What the scouts actually emitted | execute-sql over document_embeddings | The authoritative per-finding record (weight, severity, confidence) — see below |
| What the scouts surfaced to the user | inbox-reports-list | Findings that cleared the bar and became inbox reports |
The orienting sixth is signals-scout-project-profile-get — the deterministic snapshot of "what's
true about this project" that every scout cold-starts from. When a scout found nothing, this is
usually why.
Two of these tools — signals-scout-runs-list and especially
tasks-runs-session-logs-retrieve — routinely return payloads that overflow an MCP client's
token budget and get spilled to a file. This is the normal path, not an error. Plan for it up
front rather than discovering it after a failed call:
- Keep limit small on signals-scout-runs-list (~10–15). Each row carries a long prose summary, and runs come back newest-first across the whole fleet, so even a modest page is large.
- Fetch with call --json (so the saved file is real JSON, not the pretty text format, and therefore jq-able) and read the saved file with jq / a script rather than inline.
- The formatters under scripts/ do the reconstruction for you (see below).

Don't assume the project has scouts. The fleet only runs on teams enrolled via the signals-scout
feature flag, and a project may have no configs, all-disabled scouts, or scouts stuck in dry-run.
Run this first whenever a user asks about their scouts for the first time in a session.
signals-scout-config-list
The config list is unpaginated — it comes back as { results: [...] } (a bare array), with no
count field. Read the result against three cases:
Empty (results: []) — no scouts are registered. The project isn't enrolled in the scout
fleet (or hasn't ticked yet). Say so plainly; don't go fishing for runs. Point the user at the
Signals scout settings / PostHog Code onboarding rather than inventing activity.
Configs exist but all enabled: false — the fleet is registered but paused. Nothing is
running. Tell the user which scouts exist and that they're all off.
At least one enabled: true — the fleet is registered and that scout is allowed to run. For
each enabled scout note its run_interval_minutes (cadence), emit (false = dry-run, runs
but writes nothing to the inbox), and last_run_at. One caveat before reporting "it's live": runs
are gated by the signals-scout feature flag, not by enabled. A project that was enrolled and
later drained from the flag keeps its enabled: true rows, but the coordinator no longer plans
runs for it — so a stale or null last_run_at on an enabled scout usually means the project is
no longer enrolled, not that the scout is idle.
last_run_at is a dispatch stamp, not proof a run executed. The coordinator advances it the
moment it enqueues a child workflow for a due scout — before any worker picks the run up. Child
dispatch is fire-and-forget, so if workers are saturated or down the children just queue and no
run ever materializes, yet last_run_at keeps marching forward each tick. So a recent
last_run_at means "dispatched this tick," not "a run is genuinely happening." The
authoritative liveness signal is the newest actual run row in signals-scout-runs-list, not
the config stamp. Cross-check them: if last_run_at is fresh (minutes ago) but no run row has
appeared for that scout in well over its run_interval_minutes, the fleet is dispatching but
not running — workers backed up / down, or runs stranded — a real reliability problem, not a
live scout. Don't report "it's running" off last_run_at alone.
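The cross-check above can be sketched as a small classifier. This is a minimal illustration, not part of the skill's bundled scripts: the `skill_name` field on a config row and the `slack` multiplier are assumptions made for the example, while `run_interval_minutes`, `last_run_at`, and `started_at` are the fields named in the text.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value):
    """Parse an ISO-8601 timestamp; a trailing 'Z' is normalised to '+00:00'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def liveness(config, runs, now, slack=2.0):
    """Classify one scout from its config row and the fleet's run rows.

    Assumptions: config rows expose the scout name as `skill_name`, and
    `slack` (hypothetical) is how many intervals may pass before a missing
    run row counts as a stall.
    """
    if not config.get("last_run_at"):
        return "never dispatched"
    limit = timedelta(minutes=config["run_interval_minutes"]) * slack
    starts = [
        parse_ts(r["started_at"])
        for r in runs
        if r["skill_name"] == config["skill_name"] and r.get("started_at")
    ]
    dispatch_fresh = now - parse_ts(config["last_run_at"]) <= limit
    run_fresh = bool(starts) and now - max(starts) <= limit
    if run_fresh:
        return "running"  # a real run row exists, which is the authoritative signal
    if dispatch_fresh:
        return "dispatching but not running"  # stamp advances, no run materialises
    return "not being planned"  # both stale: likely drained from the flag
```

The point of the design is that `last_run_at` alone never yields "running"; only an actual run row does.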
A scout that is enabled: true but emit: false is the most common source of "my scout isn't
doing anything" confusion: it is running and reasoning every tick, it just isn't allowed to post
findings yet. Always surface the emit posture when reporting on a scout.
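The three cases and the emit posture can be read off a saved config-list payload roughly as follows. This is a sketch under the assumption that the payload has the `{ results: [...] }` shape described above with `enabled` and `emit` booleans per row; the function name and the wording of the returned strings are illustrative only.

```python
def triage_fleet(config_list):
    """Map a config-list payload onto the three cases, plus dry-run posture."""
    results = config_list.get("results", [])
    if not results:
        # Case 1: nothing registered, so don't go looking for runs.
        return "not enrolled: no scouts registered"
    enabled = [c for c in results if c.get("enabled")]
    if not enabled:
        # Case 2: registered but paused.
        return "paused: scouts are registered but all disabled"
    # Case 3: at least one scout may run; count those that run without emitting.
    dry_run = [c for c in enabled if not c.get("emit")]
    return f"active: {len(enabled)} enabled, {len(dry_run)} in dry-run"
```

An "active" verdict here still needs the liveness caveat: enrollment is gated by the feature flag, not by `enabled`.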
See references/scout-data-model.md for every field on a config,
run, and scratchpad entry, the run status values, and how the pieces link together.
"What scouts do I have / what are they doing?" — lead with config-list, then enrich with the
most recent run per scout so the user sees liveness, not just configuration.
1. signals-scout-config-list: the roster.
2. Call signals-scout-runs-list and pick the newest run with a matching skill_name for each scout (runs come back newest-first across the whole fleet, so a single call usually covers everyone). Report status and how long ago it ran.

Present it as a table the user can scan (scout, cadence, posture, last run, last outcome) and call out anything anomalous (never run, last run errored, stuck in dry-run for a long time).
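Joining the roster to the newest run per scout could look like the sketch below. It relies on the newest-first ordering the text describes, and it assumes a config row exposes the scout name as `skill_name` (not confirmed here; check references/scout-data-model.md). The bundled scripts/fleet_survey.py is the supported formatter; this only illustrates the join.

```python
def newest_run_per_scout(runs):
    """Keep the first row seen per skill_name; runs arrive newest-first."""
    newest = {}
    for run in runs:
        newest.setdefault(run["skill_name"], run)
    return newest


def survey_rows(configs, runs):
    """One tuple per scout: name, cadence, posture, last run stamp, last outcome."""
    newest = newest_run_per_scout(runs)
    rows = []
    for cfg in configs:
        run = newest.get(cfg["skill_name"])
        rows.append((
            cfg["skill_name"],
            cfg["run_interval_minutes"],
            "emit" if cfg.get("emit") else "dry-run",
            cfg.get("last_run_at") or "never",
            run["status"] if run else "no run seen",
        ))
    return rows
```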
"How does my error-tracking scout work / how is it doing?"
1. Its config row: config-list for signals-scout-error-tracking gives schedule, posture, last run.
2. Its instructions: posthog:llma-skill-get {"skill_name": "signals-scout-error-tracking"} returns the team's actual instruction set (which may be a canonical default or a diverged, hand-edited row). This is what the agent is told to do every run: its signal-vs-noise discriminator, explore patterns, and disqualifiers. To understand why a scout behaves the way it does, read its body.
3. Its recent runs: runs-list with text set to the skill's domain, or just scan the newest runs and filter to its skill_name. The end-of-run summary on each run is the scout's own account of what it looked at and decided.
4. Its memory: scratchpad-search (see below). The memory entries a scout wrote reveal the baselines and noise it has internalized about this project.

signals-scout-runs-list returns the most recent runs across the whole fleet, newest first
(capped at 100). Use it to answer "what happened lately?"
- Window it with date_from / date_to (ISO-8601; inclusive lower, exclusive upper on created_at). Walk backwards by passing an earlier date_to.
- Search it with text: a case-insensitive substring match on each run's end-of-run summary. This is how the headless scout dedupes, and it's how you find "did any run already look at the checkout error spike?"
- Filter it with emitted: emitted=true returns only runs that surfaced at least one finding, emitted=false only the quiet runs. This is the direct way to answer "which runs actually emitted something?" without parsing prose.

Each summary row carries run_id, skill_name, skill_version, status, started_at,
completed_at, emitted_count (how many findings the run emitted), emitted_finding_ids (their
ids), task_url (a deep-link into the Tasks UI for the full transcript), and the summary prose.
Lead with the summary when narrating to the user — it's the scout's own plain-language close-out —
and always offer the task_url for the full reasoning.
When the user wants the full story of one run (or pastes a run id / Tasks URL):
signals-scout-runs-retrieve
{ "id": "<uuid>" }
Note the field name flip: runs-list returns each run's id as run_id, but runs-retrieve
takes it as id. Pass the run_id value through as id.
Returns the full run: status, started_at / completed_at (compute duration from these),
skill_name / skill_version (what ran, at what body version), the end-of-run summary,
emitted_count / emitted_finding_ids, and task_url. The transcript — the actual tool calls and
reasoning — lives in the Tasks UI behind task_url, not in this payload; hand the user that link
when they want to see every step. A failed run returns an empty summary and no error
field — the payload looks the same as the list row, so to learn why it failed you need the
transcript.
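Two small details from this section, the run_id to id rename and computing duration from the timestamps, can be sketched as below. The helper names are illustrative; the field names are the ones given in the text, and the timestamps are assumed to be ISO-8601 strings.

```python
from datetime import datetime


def retrieve_args(list_row):
    """runs-list calls the identifier run_id; runs-retrieve expects it as id."""
    return {"id": list_row["run_id"]}


def run_duration_seconds(run):
    """Duration from started_at / completed_at, or None if either is missing."""
    if not run.get("started_at") or not run.get("completed_at"):
        return None
    start = datetime.fromisoformat(run["started_at"].replace("Z", "+00:00"))
    end = datetime.fromisoformat(run["completed_at"].replace("Z", "+00:00"))
    return (end - start).total_seconds()
```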
You don't have to open the UI for that: tasks-runs-session-logs-retrieve returns the run's
session log (every tool call, message, and reasoning step) as data — handy when you're
diagnosing a failure or want to trace exactly what a run did without leaving the conversation. Pass
the run's task_run_id as id and its task_id (both are on the run row).
The raw stream is large (hundreds of KB to a few MB) and will overflow inline, so fetch it with
call --json and let it spill to a file, then run it through
scripts/render_run_report.py rather than parsing it by hand.
⚠️ Do not reach for exclude_types: "tool_call_update,…" to slim it down. It is tempting —
the stream is dominated by incremental tool_call_update chunks — but each tool's actual input
lives only in those chunks: the base tool_call event carries an empty rawInput, and the
streamed updates build the input (and the final rawOutput) token by token. Excluding them leaves
you with tool names but no idea what the scout actually queried. Fetch the full log and let
the script reassemble each call (it groups by toolCallId, keeps the richest rawInput, and
attaches the completion's rawOutput/status).
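To make the reassembly concrete, here is a rough sketch of the grouping idea. It is not the code in scripts/render_run_report.py. The flat event shape (a `type` of `tool_call` or `tool_call_update`, with `toolCallId`, `title`, `rawInput`, `rawOutput`, and `status` at the top level) is an assumption for illustration, and "richest" is interpreted here as the longest JSON serialisation; the real session log may nest these fields differently.

```python
import json


def _size(value):
    """Rough richness measure: length of the JSON form, 0 for empty values."""
    return len(json.dumps(value)) if value else 0


def reassemble_tool_calls(events):
    """Group events by toolCallId, keeping the richest input and final output."""
    calls = {}
    for ev in events:
        if ev.get("type") not in ("tool_call", "tool_call_update"):
            continue
        call = calls.setdefault(
            ev["toolCallId"],
            {"title": None, "rawInput": None, "rawOutput": None, "status": None},
        )
        if ev.get("title"):
            call["title"] = ev["title"]
        raw_input = ev.get("rawInput")
        # The base tool_call carries an empty rawInput; updates grow it.
        if raw_input and _size(raw_input) > _size(call["rawInput"]):
            call["rawInput"] = raw_input
        if ev.get("rawOutput") is not None:
            call["rawOutput"] = ev["rawOutput"]
        if ev.get("status"):
            call["status"] = ev["status"]
    return calls
```

This also shows why excluding the update chunks loses information: with only the base event, `rawInput` stays `None`.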
Whether a run emitted is a first-class field: emitted_count. emitted_count > 0 means the
run surfaced that many findings; emitted_count: 0 means it closed out empty. Don't parse the prose
summary for this any more — a phrase like "already emitted P1 … did not re-emit" describes a
prior run, so substring-matching the summary for "emitted" is unreliable, whereas emitted_count
is the authoritative tally. emitted_finding_ids lists the finding_ids behind that count, in emit
order; each maps to a Signal with source_id = run:<run_id>:finding:<finding_id>, giving a
reliable run → finding link. See references/scout-data-model.md
for the run-to-finding link and how an emitted finding rides through grouping into the
source_product: "signals_scout" inbox filter.
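The run to finding link is a plain string convention, so it can be built and parsed mechanically. A minimal sketch, with helper names that are illustrative only and the assumption that a run id contains no colons:

```python
def finding_source_id(run_id, finding_id):
    """Build the Signal source_id for one emitted finding."""
    return f"run:{run_id}:finding:{finding_id}"


def source_ids_for_run(run):
    """All source_ids a run row implies, in emit order."""
    return [
        finding_source_id(run["run_id"], fid)
        for fid in run.get("emitted_finding_ids", [])
    ]


def parse_source_id(source_id):
    """Split a source_id back into (run_id, finding_id)."""
    parts = source_id.split(":", 3)
    if len(parts) != 4 or parts[0] != "run" or parts[2] != "finding":
        raise ValueError(f"not a scout source_id: {source_id!r}")
    return parts[1], parts[3]
```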
A run with status complete and an empty-handed summary ("surface at baseline, nothing to
emit") is a healthy outcome, not a failure — most runs should close out empty. Treat a stream
of empty close-outs as the fleet doing its job, not as the fleet being broken.
The scratchpad is the fleet's durable, per-team memory — prose entries scouts write so future runs get smarter and quieter. Reading it tells you what the fleet believes about this project.
signals-scout-scratchpad-search
{ "text": "error_tracking" }
Returns entries newest-first (capped at 100); text matches content and key
case-insensitively. Omit text to browse everything. Each entry's key carries a category
prefix that tells you what kind of learning it is:
| Prefix | Meaning |
| ------------- | ------------------------------------------------------------------ |
| pattern: | A baseline — how this team's data normally shapes |
| watch: | A live issue being tracked but still below the emit bar |
| noise: | A pattern the fleet has decided to ignore (dev-only, single-user…) |
| addressed: | Something the team fixed or moved on from |
| dedupe: | A gate on re-emitting a specific issue / fingerprint / finding |
| allowlist: | Vetted entities never to re-surface |
| not-in-use: | A product/surface this team doesn't use (close-out memo) |
| mcp-gap: | A tooling gap a scout noticed worth raising later |
This is the common vocabulary, not a closed set — scouts coin their own prefixes and <domain>
labels as needed (the live fleet uses watch: heavily, for example), so treat an unfamiliar
prefix as just another category. Entries cross-reference each other with [[key]] wikilinks. Keys
follow <prefix>:<domain>:<entity> (e.g. dedupe:error_tracking:019e8375-…).
When a user asks "why isn't my scout flagging X anymore?", search the scratchpad for noise:,
addressed:, dedupe:, and allowlist: entries — the fleet may have deliberately learned to
suppress it. The canonical prefix vocabulary and the four-state dedupe classifier the fleet
reasons in terms of are documented in the authoring-signals-scouts skill
(references/dedupe-and-memory.md).
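A saved scratchpad-search result can be sliced by prefix to answer the "why isn't it flagging X anymore?" question. The sketch below assumes each entry is a dict with `key` and `content` fields (the two fields the text says `text` matches on); the function names are hypothetical.

```python
from collections import Counter

# Prefixes that can explain why a scout stopped surfacing something.
SUPPRESSING = {"noise", "addressed", "dedupe", "allowlist"}


def split_key(key):
    """Split <prefix>:<domain>:<entity>; missing parts become empty strings."""
    parts = key.split(":", 2)
    parts += [""] * (3 - len(parts))
    return tuple(parts)


def prefix_counts(entries):
    """How many entries the fleet has written per category prefix."""
    return Counter(split_key(e["key"])[0] for e in entries)


def suppression_entries(entries, needle):
    """Entries with a suppressing prefix that mention `needle` (case-insensitive)."""
    needle = needle.lower()
    return [
        e for e in entries
        if split_key(e["key"])[0] in SUPPRESSING
        and (needle in e["key"].lower() or needle in e.get("content", "").lower())
    ]
```

Unfamiliar prefixes simply show up as extra keys in `prefix_counts`, which matches the "not a closed set" note above.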
"What has the fleet emitted lately / show me every finding my scouts produced." The run row
carries only emitted_count and emitted_finding_ids, not each finding's weight, severity, or confidence; the prose summary is heuristic; and the inbox
filter (below) is lossy because grouping merges scout findings into mixed-source clusters. The
authoritative per-finding record is the emitted signal itself, in the document_embeddings
table — queryable for any team via execute-sql (the general path). When a scout emits,
emit_signal writes a signal with source_product="signals_scout"; the scout's attribution
(skill_name, finding_id, severity, confidence) lands in metadata.extra, with weight
and source_id at the top level.
Fetch with execute-sql and format with scripts/emitted_signals.py — the
exact query lives in the script's header. One row per finding, filterable by any set of scouts:
# call --json execute-sql { "truncate": false, "query": "<the emitted-signals query>" } -> emitted.txt
python scripts/emitted_signals.py --signals emitted.txt --now <ISO> [--skill mcp-feedback,general]
A row here is ground truth that a finding persisted — it cleared every emit gate. The flip
side matters when explaining a gap: a scout can narrate "EMITTED ..." in its summary yet have
the emit silently dropped by a preflight gate (dry-run at the time, the org hasn't approved
AI processing, or the signals_scout source is disabled), or the emit failed. Those never reach
this table, so a claimed-but-absent finding is itself a diagnostic, not a script bug. The emit
contract behind each row (weight vs. confidence rubrics, severity, dedupe) is documented in the
authoring-signals-scouts skill (references/emit-contract.md); the run → finding link and its
limits are in references/scout-data-model.md.
Scout findings reach the user as inbox reports. Filter the inbox to the scout source:
inbox-reports-list
{ "source_product": "signals_scout", "limit": 20 }
This is the direct way to find scout-backed reports. Each finding is emitted with
source_product="signals_scout", that tag rides through grouping into the report's signal metadata,
and the inbox filter keeps any report whose contributing signals include signals_scout — so the
result is the set of reports the fleet has surfaced.
An empty result means the fleet hasn't emitted (yet), not that the filter is broken. Scouts hold
a high bar — most runs close out without emitting — so on a quiet or newly enrolled project zero
scout-backed reports is the normal, expected state. For the per-run view of what emitted, work from
the runs instead: signals-scout-runs-list?emitted=true lists every emitting run, and each run's
emitted_count / emitted_finding_ids tell you how many and which findings it produced (each
finding_id maps to a Signal with source_id = run:<run_id>:finding:<finding_id>). To browse the
inbox more broadly, use the inbox-exploration skill (statuses, suggested reviewers, drilling
into a report's underlying signals). The emit contract behind each finding — weight, confidence,
severity, the description prose — is documented in the authoring-signals-scouts skill
(references/emit-contract.md).
"Is my scout actually working / earning its cost?" There's no single metric — judge a scout over a
window of runs. Pull the runs (runs-list with a date_from), then reason across the dimensions
below. The full playbook, including how to read each signal and the common failure modes, is in
references/assessing-performance.md.
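Two of those dimensions, success rate and emit rate, reduce to simple arithmetic over the run rows in the window. The sketch below is an illustration rather than the logic of scripts/assess_health.py, and it assumes `complete` is the successful status value, as the text uses it elsewhere; check references/scout-data-model.md for the full set of status values.

```python
def window_stats(runs, skill_name):
    """Success and emit rates for one scout over a window of run rows."""
    rows = [r for r in runs if r["skill_name"] == skill_name]
    if not rows:
        return None  # no runs in the window: a cadence question, not a rate
    completed = sum(1 for r in rows if r["status"] == "complete")
    emitting = sum(1 for r in rows if r.get("emitted_count", 0) > 0)
    return {
        "runs": len(rows),
        "success_rate": completed / len(rows),
        "emit_rate": emitting / len(rows),
    }
```

A low emit rate with a high success rate is the expected healthy shape; the numbers only become a concern alongside the other dimensions.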
- Cadence: is it running on its run_interval_minutes? Large gaps mean the coordinator is skipping it (disabled, drained from the flag, or capped out on busy ticks), or it's dispatching but the runs aren't materializing. Tell the two apart with last_run_at: if the config's last_run_at is also stale, the coordinator stopped planning it; if last_run_at is fresh but the newest run row is hours old, it's the dispatch-vs-execution divergence above (workers backed up / down, or runs stranded), which runs-list alone hides.
- Success rate: how many runs reach a complete status vs. error out? A run of errors is a broken scout, not a quiet one.
- Emit rate: emitted_count per run (or split the window with runs-list?emitted=true / ?emitted=false). Near-zero over a long window on a live surface can mean the discriminator is too strict (or the surface really is quiet); near-100% usually means it's too noisy. Most healthy scouts emit rarely.
- Signal-to-noise: use emitted_finding_ids to tie runs to their Signal rows, and cross-check against inbox-reports-list report states.
- Memory growth: is the scratchpad accumulating pattern: / noise: / dedupe: entries over time? A scout with an empty scratchpad after many runs isn't learning.

The skill bundles four pure formatters under scripts/ for the most common asks.
They do no network I/O — they are the back half of an "agent fetches, script formats" split.
The pattern is always the same:
Fetch with call --json (raw JSON, not the pretty text format) and save it to a file. For the big ones (runs-list, tasks-runs-session-logs-retrieve) this is mandatory anyway: they overflow inline and spill to a file you can point the script at.

All four are stdlib-only Python 3.11+ and print plain text to stdout (or --out), designed to read well in a terminal, so save them as .txt.
scripts/render_run_report.py: drill into one run

Produces the kind of detailed write-up you'd want when inspecting a single run: header (status, duration, posture), a narrated timeline that interleaves the agent's narration with each tool call and its real input, the end-of-run summary, and any scratchpad memory.
# fetch (note --json), saving each to a file:
# call --json signals-scout-runs-retrieve { "id": "<run_id>" } -> run.json
# call --json tasks-runs-session-logs-retrieve { "id": "<task_run_id>", "task_id": "<task_id>", "offset": 0 } -> log.json (FULL — no exclude_types)
# (optional) call --json signals-scout-scratchpad-search { ... } -> mem.json
# (optional) call --json signals-scout-config-list {} -> cfg.json
python scripts/render_run_report.py --run run.json --log log.json \
--scratchpad mem.json --config cfg.json --out report.txt
Modes (--mode, default detailed):
| Mode | Contains | --log needed? |
| ---------- | ------------------------------------------------------------------ | --------------- |
| summary | header + posture + close-out prose | no |
| detailed | + narrated timeline with tool inputs + tool tally + scratchpad | yes |
| full | + each tool call's (truncated) output inline | yes |
Other flags: --show-output (outputs in detailed mode), --input-width / --output-width
(truncation), --no-art (skip the hedgehog banner), --base-url (defaults to us.posthog.com).
scripts/fleet_survey.py: survey the whole fleet

One scannable table (scout, enabled, posture, cadence, last run, last outcome) with a "worth a look" section that flags never-run, stuck-in-dry-run, and last-run-failed scouts.
# call --json signals-scout-config-list {} -> cfg.json
# (optional) call --json signals-scout-runs-list { "limit": 30 } -> runs.json (small limit!)
python scripts/fleet_survey.py --config cfg.json --runs runs.json --now <current-ISO-time>
Pass --now (the current time, ISO-8601) to get relative "ago" columns; the emit/quiet column is
a heuristic on each run's summary prose — confirm against the summary before trusting it.
scripts/assess_health.py: health over a window of runs

Implements the "assess health and performance" workflow above: a per-scout table (runs, success %, emit %, cadence gap vs interval, adherence, median duration, memory growth) plus a "worth a look" section flagging all-failed scouts, timeout-shaped failures, cadence stalls, staleness, and empty scratchpads.
# call --json signals-scout-runs-list { "limit": 100, "date_from": "<ISO>" } -> runs.json
# (optional) call --json signals-scout-config-list {} -> cfg.json
# (optional) call --json signals-scout-scratchpad-search {} -> mem.json
python scripts/assess_health.py --runs runs.json --config cfg.json \
--scratchpad mem.json --now <current-ISO-time> [--skill signals-scout-general]
--config is what lets it score cadence adherence (the expected interval) and staleness (the
authoritative last_run_at, which the windowed runs can miss when the 100-row cap truncates the
newest runs). Without --scratchpad the memory column shows n/a and no memory flags fire. The
emit % is the same summary-prose heuristic — cross-check signal-to-noise against
inbox-reports-list.
scripts/emitted_signals.py: every finding the fleet actually emitted

Implements the "list what scouts have actually emitted" workflow: the authoritative per-finding
table (when, scout, severity, weight, confidence, finding_id, one-line hypothesis) plus a
per-scout rollup (emit count, severity mix, weight range, latest emit). Unlike assess_health's
emit % — a prose heuristic — this reads the emitted signals directly, so it's exact.
Its input is not a signals-scout-* tool; it's an execute-sql result over
document_embeddings (the general, any-team path). The full query lives in the script's header —
copy it verbatim. execute-sql returns a pipe-delimited text table (even under call --json it's
that text wrapped in a JSON string), so the script parses that text; the query deliberately selects
only pipe-safe scalar columns (the multi-line description is excluded, hypothesis is sanitized).
# call --json execute-sql { "truncate": false, "query": "<emitted-signals query from the header>" } -> emitted.txt
python scripts/emitted_signals.py --signals emitted.txt --now <current-ISO-time> \
[--skill mcp-feedback,general] [--severity P0,P1,P2] [--since <ISO>] [--sort weight] [--wide]
--skill takes a comma-separated set (the signals-scout- prefix is optional). --wide adds the
scout_run_id so you can chain straight into render_run_report.py for the run that emitted a
finding. Remember the coverage caveat: this lists signals that persisted — a finding a run
summary claims but that's absent here was gated (dry-run / AI processing not approved / source
disabled) or failed.
- Always report the emit posture. "Running but in dry-run" is the single most common reason a user thinks a scout is broken when it isn't.
- Use runs-list?emitted=true (or read emitted_count per run) to find what emitted, without parsing the prose summary. The source_product: "signals_scout" inbox filter lists the reports the fleet surfaced; an empty result there means it hasn't emitted yet (scouts hold a high bar), not that the filter is broken.
- A failed status is usually a timeout, not a broken scout. Completed runs finish in a couple of minutes. Most often the scout over-investigated and ran the full budget (the fleet self-corrects by writing "tight-run recipe" scratchpad entries), but some are false timeouts where the scout actually finished in a few minutes and the run then hung on a dropped close-out. The session log (above) tells them apart: real over-investigation shows tool calls right up to the wall; a false timeout goes silent long before it. Don't assume over-investigation from duration alone.
- Lead with the summary, then offer task_url for the full transcript; don't dump raw run rows at the user.
- last_run_at: null means a scout has never fired. Check it's enabled and the project is enrolled before digging further.
- When a scout found nothing, signals-scout-project-profile-get shows whether the surface it watches is even in use: a logs scout on a project with no logs has nothing to do.
- To write or tune a scout, use the authoring-signals-scouts skill, which covers signals-scout-config-update and the skills-store edit path.