agents/skills/ci-investigate/SKILL.md
Investigate flaky CI failures across multiple workflow runs to identify patterns, categorize root causes, and propose fixes. Use when asked to investigate CI failures, find flaky tests, diagnose test flakiness, or understand why CI is failing repeatedly.
Investigate flaky CI failures across multiple workflow runs to identify patterns, categorize root causes, and propose targeted fixes.
`$ARGUMENTS` - Optional: job name filter, number of runs to check, or branch name. Examples: "test-core-rspec", "--runs 10", "master"

If no arguments are provided, investigate the most recent failing workflow on the current branch.
- `git branch --show-current`
- `git remote get-url origin 2>/dev/null | head -1`
- `find . -maxdepth 2 \( -name ".circleci" -o -name ".github" \) -type d 2>/dev/null | head -5`
- `head -30 .circleci/config.yml 2>/dev/null | head -30`

Investigate CI failures by fetching multiple workflow runs, extracting test results, and categorizing failure patterns.
Parse `$ARGUMENTS` for a job name filter, the number of runs to check, and a branch name.
Derive the project slug from the git remote URL. For GitHub repos, format is "gh/org/repo".
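The slug derivation above can be sketched as a small parser. This is a minimal illustration, not part of the skill itself: the function name `project_slug` and the regular expression are assumptions, and only GitHub remotes in the common SSH and HTTPS forms are handled.

```python
import re
from typing import Optional

# Hypothetical helper: the skill only specifies the "gh/org/repo" output format.
# The pattern accepts git@github.com:org/repo(.git), ssh://git@github.com/org/repo(.git),
# and https://github.com/org/repo(.git).
_GITHUB_REMOTE = re.compile(
    r"^(?:https?://|ssh://git@|git@)github\.com[:/]"
    r"(?P<org>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)


def project_slug(remote_url: str) -> Optional[str]:
    """Return 'gh/org/repo' for a GitHub remote URL, or None if it does not match."""
    match = _GITHUB_REMOTE.match(remote_url.strip())
    if match is None:
        return None
    return f"gh/{match['org']}/{match['repo']}"


if __name__ == "__main__":
    # Placeholder remote, used only for illustration.
    print(project_slug("git@github.com:example-org/example-repo.git"))
```

Returning `None` for non-GitHub remotes leaves it to the caller to decide whether to stop or ask the user for the slug.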
Report the scope to the user before proceeding:
Detect the CI provider from the context above and use the appropriate tools.
CircleCI (if .circleci/ exists): Use ToolSearch to find available CircleCI MCP tools (search "circleci"). Then:
GitHub Actions (if .github/workflows/ exists): Use the gh CLI:
- `gh run list --branch {branch} --limit {N}` to get recent runs
- `gh run view {run_id}` to get job details
- `gh run view {run_id} --log-failed` to get failure output

Collect up to the target number of failed runs. If a workflow has no failures, skip it but note it as a passing run (useful for calculating flake rate).
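Counting passing runs matters because the flake rate is failed runs divided by all analyzed runs. A minimal sketch of that arithmetic follows. It assumes each run is a dict with a `conclusion` field, as `gh run list --json conclusion` would emit; the function name `flake_rate` and the choice to ignore runs that neither passed nor failed (for example, cancelled ones) are assumptions, not something the skill specifies.

```python
from typing import Dict, List


def flake_rate(runs: List[Dict[str, str]]) -> float:
    """Percentage of analyzed runs that failed.

    Assumption: only runs that concluded as 'success' or 'failure' count
    toward the total; anything else is left out of the denominator.
    """
    finished = [r for r in runs if r.get("conclusion") in ("success", "failure")]
    if not finished:
        return 0.0  # nothing to analyze, so avoid dividing by zero
    failed = sum(1 for r in finished if r["conclusion"] == "failure")
    return failed / len(finished) * 100


if __name__ == "__main__":
    sample = [
        {"conclusion": "success"},
        {"conclusion": "failure"},
        {"conclusion": "failure"},
        {"conclusion": "success"},
        {"conclusion": "cancelled"},  # ignored under the assumption above
    ]
    print(f"{flake_rate(sample):.0f}%")
```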
Track:
For each failed job found in Step 2:
CircleCI:
GitHub Actions:
- `gh run view {run_id} --log-failed` to get failure output

For each failure, extract:
Use parallel Task tool calls to fetch test results for multiple jobs simultaneously when possible.
Group failures by test file + test name. For each group, classify the root cause:
| Category | Signals | Common Fix |
|----------|---------|------------|
| Deterministic | Fails every run, same error | Fix the test or code -- this is a real bug |
| Timing flake | Intermittent, error involves time comparison, values differ by milliseconds | Freeze time in tests, use tolerance matchers |
| Parallel collision | Hard-coded IDs, PK violations, "Duplicate entry" | Use sequences/auto-increment, avoid hard-coded IDs |
| Test isolation | Order-dependent failures, shared mutable state between tests | Reset state between tests, avoid global side effects |
| Infrastructure | 0 test failures + exit code 1, OOM killed, container timeout | Retry or investigate resource limits |
| External dependency | Timeout connecting to external service, API errors | Add retry logic or stub external calls in tests |
If a failure does not clearly fit one category, mark it as Unclassified and include the full error for manual review.
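The grouping and classification steps could be sketched as follows. This is a rough illustration under stated assumptions: the record fields (`run_id`, `test_file`, `test_name`, `error`), the function names, and the keyword lists are hypothetical, and simple keyword matching covers only a few of the signals in the table. Real classification still needs judgment.

```python
from collections import defaultdict

# Hypothetical keyword signals, loosely based on the category table.
KEYWORD_CATEGORIES = [
    ("Parallel collision", ("duplicate entry",)),
    ("Infrastructure", ("oom", "out of memory")),
    ("External dependency", ("timeout connecting", "connection timed out")),
]


def group_failures(failures):
    """Group failure records by (test_file, test_name)."""
    groups = defaultdict(list)
    for failure in failures:
        groups[(failure["test_file"], failure["test_name"])].append(failure)
    return dict(groups)


def classify(group, total_runs):
    """Pick a category for one failure group, falling back to 'Unclassified'."""
    run_ids = {f["run_id"] for f in group}
    errors = {f["error"] for f in group}
    # Fails in every analyzed run with the same error: treat as a real bug.
    if len(run_ids) == total_runs and len(errors) == 1:
        return "Deterministic"
    text = " ".join(errors).lower()
    for category, keywords in KEYWORD_CATEGORIES:
        if any(keyword in text for keyword in keywords):
            return category
    return "Unclassified"


if __name__ == "__main__":
    failures = [
        {"run_id": 1, "test_file": "a_spec.rb", "test_name": "saves", "error": "expected 1, got 2"},
        {"run_id": 2, "test_file": "a_spec.rb", "test_name": "saves", "error": "expected 1, got 2"},
        {"run_id": 2, "test_file": "b_spec.rb", "test_name": "creates", "error": "Duplicate entry '42'"},
    ]
    for key, group in group_failures(failures).items():
        print(key, classify(group, total_runs=2))
```

Checking for a deterministic failure first keeps a test that breaks on every run from being filed as a flake just because its error text happens to contain a keyword.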
Present the findings in this format:
## CI Investigation Report
**Scope:** {project} / {branch} / {job filter or "all jobs"}
**Runs analyzed:** {N} ({pass_count} passed, {fail_count} failed)
**Overall flake rate:** {fail_count/N * 100}%
### Failure Groups (by frequency)
#### 1. {test_file}:{test_name} -- {category}
- **Frequency:** {X}/{N} runs ({percentage}%)
- **Error:** {1-2 line error summary}
- **Affected nodes:** {node numbers if relevant}
- **Root cause:** {explanation}
- **Proposed fix:**
{specific code change or strategy}
#### 2. ...
### Infrastructure Failures
{List any jobs that failed with 0 test failures}
### Cross-Repo Issues
{Flag any fixes needed in shared gems (Nucleus, etc.) or CI configuration}
### Recommended Priority
1. {highest frequency flake} -- affects X% of runs
2. ...
After presenting the report, ask the user if they want to: