Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

posthog/diagnosing-ci-and-merge-bottlenecks

Name: diagnosing-ci-and-merge-bottlenecks
Author: posthog

skills/diagnosing-ci-and-merge-bottlenecks/SKILL.md

npx skillsauth add posthog/ai-plugin diagnosing-ci-and-merge-bottlenecks

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Diagnosing CI and merge bottlenecks

Engineering analytics treats a pull request like product analytics treats a user: a PR moves through a pipeline (opened → CI → review → merged → deployed) and the job is to find where it slows down. The surface is three named MCP tools — you call them, you don't write SQL. Dogfooded on PostHog/posthog; the same tools serve autonomous agents (e.g. PostHog Code) reasoning about their own PRs.

The tools

pull-requests — the PR workhorse. Open PRs plus anything merged or closed since date_from (default -30d), newest first. Each row carries author (nested object: handle, display_name, is_bot), repo (nested: owner, name), state, is_draft, labels, open_to_merge_seconds, and a ci rollup (runs / passing / failing / pending) from the head-SHA join. Answers most PR-level questions: which PRs have failing or pending CI, which are stuck open longest, per-author or per-repo triage, and time-to-merge stats (aggregate open_to_merge_seconds over the returned merged rows yourself — median and p95, never a mean).
workflow-health — per-workflow CI health over a window (date_from / date_to, default last 30 days): run_count, success_rate, p50_seconds, p95_seconds, last_failure_at. Answers "is CI getting faster or slower" and "which workflow is the slow or flaky long pole". There is no built-in trend — call it over two adjacent windows and compare. success_rate / p50_seconds / p95_seconds cover completed runs only and are null when a window has no completed runs — guard for null before comparing two windows (a workflow can have runs in one and none in the other).
pr-lifecycle — a single PR's timeline: a header plus ordered events — opened, then a CI started/finished pair per workflow run (many on a multi-workflow repo, interleaved by time), then merged/closed. Answers "where is PR N stuck". metric_quality is partial.

There is no aggregate time-to-merge tool and no "counts" tool — derive those from pull-requests (the stuck/failing counts, the merge-time percentiles).

Caveats you must carry into every answer

These are structural limits of today's snapshot data — state them, don't paper over them.

open_to_merge_seconds is coarse. It fuses draft time and ready-for-review time into one figure. Report it as "open to merge", never "cycle time" or "review time". Flag it when long-lived drafts inflate a number.
CI status can be stale. The CI source syncs on a watermark and does not refresh a run that completes after newer runs land (until the workflow_run webhook ships). Treat a pending count as unsettled, not as a settled failure; lead with status, not a verdict.
CI for a PR is the head-SHA join, nothing else. The ci rollup reflects only the latest commit's runs. There is no other link between a PR and its checks.
No reviews, approvals, per-check/job, or deploys yet. Don't infer review behaviour or DORA metrics from their absence; that data hasn't landed. pr-lifecycle is partial for the same reason.
Bots and drafts are present in pull-requests output, excluded by convention. Filter out author.is_bot (nested under author, not a row-level field) and is_draft for throughput / merge-time questions; keep them in for bot-impact questions.
pull-requests returns a capped page. At most limit rows (newest first); truncated is true when more match, and there is no repo or limit filter to narrow the call. When truncated is true, any percentile or count you derive covers only the newest page — not the whole window — so say so and shrink date_from until the real set fits under the cap.

Choosing a tool

| The question | Tool | How | | ------------------------------------------------------ | ----------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Is CI getting slower? Which workflow is the long pole? | workflow-health | Call over two adjacent windows (e.g. date_from=-14d, then date_from=-28d date_to=-14d); compare p50_seconds and p95_seconds per workflow. Lead with the median but always check p95 separately — they move independently. | | Which open PRs have failing or pending CI? | pull-requests | Keep rows where ci.failing > 0 or ci.pending > 0. pending means unsettled (or stale) — not a settled failure. | | Which PRs are stuck open longest? | pull-requests | Keep state = open, not is_draft, not author.is_bot; sort by created_at ascending (oldest first). | | How long are PRs taking to merge? Per author? | pull-requests | Over merged rows (merged_at set, not bot, not draft), aggregate open_to_merge_seconds — median and p95. Group by author.handle for cohort context, not a ranking (per-developer surveillance is an explicit non-goal). Trend it by calling with two date_from windows. | | Where is PR N stuck? | pr-lifecycle | Walk the sorted events: opened → first CI started, the CI span (first start → last finish; one pair per workflow), last CI finished → merged. The largest gap is the bottleneck. A long open→merge with quick CI points at review/idle time the partial data can't itemize yet — say so. |

The high-value chain

Mirror how a human investigates: aggregate signal → confirm → concrete PR.

workflow-health  (find the slow/flaky long-pole workflow)
   → pull-requests  (confirm it's dragging merge time; list the affected PRs)
      → pr-lifecycle  (open a representative stuck PR and show the gap)

"CI median rose because e2e-playwright p95 doubled; that workflow is the long pole on PR #1234, which sat 47m in CI before merging."

Output expectations

Lead with the verdict in one line, then the supporting numbers.
Carry the coarse / partial / staleness caveat whenever the distinction matters.
For multi-window or multi-workflow comparisons, a short table beats prose. Report median and p95 side by side — never collapse them into one "average".

What NOT to do

Don't call open_to_merge_seconds cycle time or review time — it's coarse open-to-merge.
Don't report a CI count as a settled failure when pending > 0 — it may be unsettled or stale.
Don't infer reviews, approvals, per-check counts, or deploys — that data isn't ingested yet.
Don't turn per-author buckets into a leaderboard — they're for finding stuck work, not ranking people.
Don't reach for these tools to fetch raw PR contents or diffs — they surface pipeline signal, not the PR thread.

posthog/diagnosing-ci-and-merge-bottlenecks

skills/diagnosing-ci-and-merge-bottlenecks/SKILL.md

Diagnoses CI and pull-request pipeline health for a GitHub repo using the engineering analytics MCP tools — pull-requests (PR list with CI status), workflow-health (per-workflow CI trends), and pr-lifecycle (a single PR's timeline). Use when asked whether CI is getting faster or slower, which GitHub Actions workflow is the slow or flaky long-pole, how long PRs take from open to merge, how an author's merge time compares to the cohort, which open PRs have failing or pending CI, or where a specific pull request is stuck. Triggers on "engineering analytics", "is CI getting slower", "slow workflow", "flaky CI", "time to merge", "cycle time", "PR throughput", "failing checks", "where is PR <n> stuck", "CI long pole", "what's holding up this PR".

51 stars

tools

Updated Jun 11, 2026

$ install --global

skillsauth

npx skillsauth add posthog/ai-plugin diagnosing-ci-and-merge-bottlenecks

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Jun 11, 2026, 6:42 AM410.5s1 file scanned

SKILL.md

name:: diagnosing-ci-and-merge-bottlenecks
description:: >

Diagnosing CI and merge bottlenecks

The tools

pull-requests — the PR workhorse. Open PRs plus anything merged or closed since date_from (default -30d), newest first. Each row carries author (nested object: handle, display_name, is_bot), repo (nested: owner, name), state, is_draft, labels, open_to_merge_seconds, and a ci rollup (runs / passing / failing / pending) from the head-SHA join. Answers most PR-level questions: which PRs have failing or pending CI, which are stuck open longest, per-author or per-repo triage, and time-to-merge stats (aggregate open_to_merge_seconds over the returned merged rows yourself — median and p95, never a mean).
workflow-health — per-workflow CI health over a window (date_from / date_to, default last 30 days): run_count, success_rate, p50_seconds, p95_seconds, last_failure_at. Answers "is CI getting faster or slower" and "which workflow is the slow or flaky long pole". There is no built-in trend — call it over two adjacent windows and compare. success_rate / p50_seconds / p95_seconds cover completed runs only and are null when a window has no completed runs — guard for null before comparing two windows (a workflow can have runs in one and none in the other).
pr-lifecycle — a single PR's timeline: a header plus ordered events — opened, then a CI started/finished pair per workflow run (many on a multi-workflow repo, interleaved by time), then merged/closed. Answers "where is PR N stuck". metric_quality is partial.

There is no aggregate time-to-merge tool and no "counts" tool — derive those from pull-requests (the stuck/failing counts, the merge-time percentiles).

Caveats you must carry into every answer

These are structural limits of today's snapshot data — state them, don't paper over them.

open_to_merge_seconds is coarse. It fuses draft time and ready-for-review time into one figure. Report it as "open to merge", never "cycle time" or "review time". Flag it when long-lived drafts inflate a number.
CI status can be stale. The CI source syncs on a watermark and does not refresh a run that completes after newer runs land (until the workflow_run webhook ships). Treat a pending count as unsettled, not as a settled failure; lead with status, not a verdict.
CI for a PR is the head-SHA join, nothing else. The ci rollup reflects only the latest commit's runs. There is no other link between a PR and its checks.
No reviews, approvals, per-check/job, or deploys yet. Don't infer review behaviour or DORA metrics from their absence; that data hasn't landed. pr-lifecycle is partial for the same reason.
Bots and drafts are present in pull-requests output, excluded by convention. Filter out author.is_bot (nested under author, not a row-level field) and is_draft for throughput / merge-time questions; keep them in for bot-impact questions.
pull-requests returns a capped page. At most limit rows (newest first); truncated is true when more match, and there is no repo or limit filter to narrow the call. When truncated is true, any percentile or count you derive covers only the newest page — not the whole window — so say so and shrink date_from until the real set fits under the cap.

Choosing a tool

The high-value chain

Mirror how a human investigates: aggregate signal → confirm → concrete PR.

workflow-health  (find the slow/flaky long-pole workflow)
   → pull-requests  (confirm it's dragging merge time; list the affected PRs)
      → pr-lifecycle  (open a representative stuck PR and show the gap)

"CI median rose because e2e-playwright p95 doubled; that workflow is the long pole on PR #1234, which sat 47m in CI before merging."

Output expectations

Lead with the verdict in one line, then the supporting numbers.
Carry the coarse / partial / staleness caveat whenever the distinction matters.
For multi-window or multi-workflow comparisons, a short table beats prose. Report median and p95 side by side — never collapse them into one "average".

What NOT to do

Don't call open_to_merge_seconds cycle time or review time — it's coarse open-to-merge.
Don't report a CI count as a settled failure when pending > 0 — it may be unsettled or stale.
Don't infer reviews, approvals, per-check counts, or deploys — that data isn't ingested yet.
Don't turn per-author buckets into a leaderboard — they're for finding stuck work, not ranking people.
Don't reach for these tools to fetch raw PR contents or diffs — they surface pipeline signal, not the PR thread.

Related Skills

posthog/signals-scout-web-analytics

tools

VerifiedTrustedCommunity

Focused Signals scout for PostHog projects with web traffic. Watches the acquisition and site-health layer the web analytics product reports on: per-channel session volume diverging from the site's own rhythm (an acquisition source silently collapsing or surging), attribution breakage (paid/campaign traffic reclassifying into Direct or Unknown when tagging breaks), landing pages that break (bounce-rate steps, 404 spikes, entry-path cliffs), and page-performance regressions (web vitals p75 steps). Emits findings only when they clear the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet.

51SKILL.mdUpdated Jun 11, 2026

posthog/signals-scout-web-analytics

posthog/signals-scout-session-replay

tools

VerifiedTrustedCommunity

Focused Signals scout for PostHog projects using session replay. Watches two promises the replay product makes: that sessions are actually being recorded (capture integrity — recording volume vanishing while site traffic doesn't), and that the friction evidence inside recordings gets seen (rage-click / dead-click clusters concentrating on a page or element, error-after-interaction cohorts, recurring replay vision themes nobody aggregates). Emits findings only when they clear the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet.

51SKILL.mdUpdated Jun 11, 2026

posthog/signals-scout-session-replay

posthog/signals-scout-health-checks

tools

VerifiedTrustedCommunity

Focused Signals scout for PostHog setup health. Reads the project's active health issues — the deterministic findings of PostHog's own health checks (no live events, outdated SDKs, missing reverse proxy, absent web vitals, ingestion warnings, failing data-warehouse models, and more) — and decides which are genuinely worth surfacing. Unlike a one-signal-per-issue push, it bundles kind-clusters into a single finding, weights by real blast radius (cross-referencing actual event volume and reach), and prioritizes issues an agent can resolve via the MCP. Emits only above the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet — no dependencies on other skills.

51SKILL.mdUpdated Jun 11, 2026

posthog/signals-scout-health-checks

posthog/signals-scout-feature-flags

tools

VerifiedTrustedCommunity

Focused Signals scout for PostHog projects using feature flags. Watches the flag roster and the `$feature_flag_called` evaluation stream for contradictions between a flag's configured state and its real traffic: evaluation cliffs on healthy flags, ghost flags (code calling keys that no longer exist), response-distribution shifts with no corresponding flag edit, and flag debt (stale, fully-rolled-out, or dead flags still burning evaluations). Emits findings only when they clear the confidence bar; otherwise writes durable memory and closes out empty. Self-contained peer in the signals-scout-* fleet — no dependencies on other skills.

51SKILL.mdUpdated Jun 11, 2026

posthog/signals-scout-feature-flags

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/posthog/ai-plugin.git

# Copy into Claude Code skills folder (global)
cp -r ai-plugin/skills/diagnosing-ci-and-merge-bottlenecks ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

posthog/ai-plugin

51 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT