skills/diagnosing-ci-and-merge-bottlenecks/SKILL.md
Diagnoses CI and pull-request pipeline health for a GitHub repo using the engineering analytics MCP tools — pull-requests (PR list with CI status), workflow-health (per-workflow CI trends), and pr-lifecycle (a single PR's timeline). Use when asked whether CI is getting faster or slower, which GitHub Actions workflow is the slow or flaky long-pole, how long PRs take from open to merge, how an author's merge time compares to the cohort, which open PRs have failing or pending CI, or where a specific pull request is stuck. Triggers on "engineering analytics", "is CI getting slower", "slow workflow", "flaky CI", "time to merge", "cycle time", "PR throughput", "failing checks", "where is PR <n> stuck", "CI long pole", "what's holding up this PR".
npx skillsauth add posthog/ai-plugin diagnosing-ci-and-merge-bottlenecksInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Engineering analytics treats a pull request like product analytics treats a user: a PR moves through a pipeline
(opened → CI → review → merged → deployed) and the job is to find where it slows down. The surface is three
named MCP tools — you call them, you don't write SQL. Dogfooded on PostHog/posthog; the same tools serve
autonomous agents (e.g. PostHog Code) reasoning about their own PRs.
pull-requests — the PR workhorse. Open PRs plus anything merged or closed since date_from (default
-30d), newest first. Each row carries author (nested object: handle, display_name, is_bot), repo
(nested: owner, name), state, is_draft, labels, open_to_merge_seconds, and a ci rollup
(runs / passing / failing / pending) from the head-SHA join. Answers most PR-level questions:
which PRs have failing or pending CI, which are stuck open longest, per-author or per-repo triage, and
time-to-merge stats (aggregate open_to_merge_seconds over the returned merged rows yourself — median and p95,
never a mean).workflow-health — per-workflow CI health over a window (date_from / date_to, default last 30 days):
run_count, success_rate, p50_seconds, p95_seconds, last_failure_at. Answers "is CI getting faster or
slower" and "which workflow is the slow or flaky long pole". There is no built-in trend — call it over two
adjacent windows and compare. success_rate / p50_seconds / p95_seconds cover completed runs only and are
null when a window has no completed runs — guard for null before comparing two windows (a workflow can have
runs in one and none in the other).pr-lifecycle — a single PR's timeline: a header plus ordered events — opened, then a CI started/finished
pair per workflow run (many on a multi-workflow repo, interleaved by time), then merged/closed. Answers
"where is PR N stuck". metric_quality is partial.There is no aggregate time-to-merge tool and no "counts" tool — derive those from pull-requests (the stuck/failing
counts, the merge-time percentiles).
These are structural limits of today's snapshot data — state them, don't paper over them.
open_to_merge_seconds is coarse. It fuses draft time and ready-for-review time into one figure. Report
it as "open to merge", never "cycle time" or "review time". Flag it when long-lived drafts inflate a number.workflow_run webhook ships). Treat a pending count as unsettled, not as a settled
failure; lead with status, not a verdict.ci rollup reflects only the latest commit's runs. There
is no other link between a PR and its checks.pr-lifecycle is partial for the same reason.pull-requests output, excluded by convention. Filter out author.is_bot
(nested under author, not a row-level field) and is_draft for throughput / merge-time questions; keep them in
for bot-impact questions.pull-requests returns a capped page. At most limit rows (newest first); truncated is true when more
match, and there is no repo or limit filter to narrow the call. When truncated is true, any percentile or
count you derive covers only the newest page — not the whole window — so say so and shrink date_from until the
real set fits under the cap.| The question | Tool | How |
| ------------------------------------------------------ | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Is CI getting slower? Which workflow is the long pole? | workflow-health | Call over two adjacent windows (e.g. date_from=-14d, then date_from=-28d date_to=-14d); compare p50_seconds and p95_seconds per workflow. Lead with the median but always check p95 separately — they move independently. |
| Which open PRs have failing or pending CI? | pull-requests | Keep rows where ci.failing > 0 or ci.pending > 0. pending means unsettled (or stale) — not a settled failure. |
| Which PRs are stuck open longest? | pull-requests | Keep state = open, not is_draft, not author.is_bot; sort by created_at ascending (oldest first). |
| How long are PRs taking to merge? Per author? | pull-requests | Over merged rows (merged_at set, not bot, not draft), aggregate open_to_merge_seconds — median and p95. Group by author.handle for cohort context, not a ranking (per-developer surveillance is an explicit non-goal). Trend it by calling with two date_from windows. |
| Where is PR N stuck? | pr-lifecycle | Walk the sorted events: opened → first CI started, the CI span (first start → last finish; one pair per workflow), last CI finished → merged. The largest gap is the bottleneck. A long open→merge with quick CI points at review/idle time the partial data can't itemize yet — say so. |
Mirror how a human investigates: aggregate signal → confirm → concrete PR.
workflow-health (find the slow/flaky long-pole workflow)
→ pull-requests (confirm it's dragging merge time; list the affected PRs)
→ pr-lifecycle (open a representative stuck PR and show the gap)
"CI median rose because e2e-playwright p95 doubled; that workflow is the long pole on PR #1234, which sat 47m in
CI before merging."
open_to_merge_seconds cycle time or review time — it's coarse open-to-merge.pending > 0 — it may be unsettled or stale.tools
Focused Signals scout for PostHog projects with web traffic. Watches the acquisition and site-health layer the web analytics product reports on: per-channel session volume diverging from the site's own rhythm (an acquisition source silently collapsing or surging), attribution breakage (paid/campaign traffic reclassifying into Direct or Unknown when tagging breaks), landing pages that break (bounce-rate steps, 404 spikes, entry-path cliffs), and page-performance regressions (web vitals p75 steps). Emits findings only when they clear the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet.
tools
Focused Signals scout for PostHog projects using session replay. Watches two promises the replay product makes: that sessions are actually being recorded (capture integrity — recording volume vanishing while site traffic doesn't), and that the friction evidence inside recordings gets seen (rage-click / dead-click clusters concentrating on a page or element, error-after-interaction cohorts, recurring replay vision themes nobody aggregates). Emits findings only when they clear the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet.
tools
Focused Signals scout for PostHog setup health. Reads the project's active health issues — the deterministic findings of PostHog's own health checks (no live events, outdated SDKs, missing reverse proxy, absent web vitals, ingestion warnings, failing data-warehouse models, and more) — and decides which are genuinely worth surfacing. Unlike a one-signal-per-issue push, it bundles kind-clusters into a single finding, weights by real blast radius (cross-referencing actual event volume and reach), and prioritizes issues an agent can resolve via the MCP. Emits only above the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet — no dependencies on other skills.
tools
Focused Signals scout for PostHog projects using feature flags. Watches the flag roster and the `$feature_flag_called` evaluation stream for contradictions between a flag's configured state and its real traffic: evaluation cliffs on healthy flags, ghost flags (code calling keys that no longer exist), response-distribution shifts with no corresponding flag edit, and flag debt (stale, fully-rolled-out, or dead flags still burning evaluations). Emits findings only when they clear the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet — no dependencies on other skills.