skills/ci-monitoring/SKILL.md
Use after creating a PR: monitor the CI pipeline and resolve failures in a loop until it is green or the failure is identified as unresolvable.
Monitor CI pipeline and resolve failures until green.
CRITICAL: CI is validation, not discovery.
If CI finds a bug you didn't find locally, your local testing was insufficient.
Before blaming CI, ask yourself:
- Did you run all tests locally?
- Did you test against local services (postgres, redis)?
- Did you run the same checks CI runs?
- Did you run integration tests, not just unit tests with mocks?
CI should only fail for: environment differences, flaky tests, or infrastructure issues—never for bugs you could have caught locally.
Core principle: CI failures are blockers. But they should never be surprises.
Announce at start: "I'm monitoring CI and will resolve any failures."
        PR Created
             │
             ▼
      ┌─────────────┐
      │ Wait for CI │
      └──────┬──────┘
             │
             ▼
      ┌─────────────┐
      │ CI Status?  │
      └──────┬──────┘
             │
     ┌───────┴───────┐
     │               │
   Green        Red/Failed
     │               │
     ▼               ▼
┌─────────┐   ┌─────────────┐
│  MERGE  │   │ Diagnose    │
│ THE PR  │   │ failure     │
└────┬────┘   └──────┬──────┘
     │               │
     ▼               ▼
┌─────────┐   ┌─────────────┐
│ Continue│   │ Fixable?    │
│ to next │   └──────┬──────┘
│ issue   │          │
└─────────┘     ┌────┴─────────┐
                │              │
               Yes             No
                │              │
                ▼              ▼
           ┌─────────┐  ┌─────────────┐
           │ Fix and │  │ Document as │
           │ push    │  │ unresolvable│
           └────┬────┘  └─────────────┘
                │
                └────► Back to "Wait for CI"
When CI passes, you MUST merge the PR and continue working.
Do NOT:
DO:
# When CI passes
gh pr merge [PR_NUMBER] --squash --delete-branch
# Update linked issue status
gh issue edit [ISSUE_NUMBER] --remove-label "status:in-review" --add-label "status:done"
# Continue to next issue (do not stop)
The only exception: PRs with do-not-merge label require explicit user action.
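The do-not-merge exception can be checked before the merge command runs. This is a minimal sketch, assuming the label is named exactly `do-not-merge`; `has_do_not_merge` and `merge_when_allowed` are hypothetical helper names, not part of `gh`.

```shell
# Sketch only: helper names are hypothetical, the label name is assumed.

# Reads label names (one per line) on stdin; succeeds if do-not-merge is present.
has_do_not_merge() {
  grep -qx 'do-not-merge'
}

# merge_when_allowed PR_NUMBER
# Squash-merges the PR unless it carries the do-not-merge label.
merge_when_allowed() {
  local pr="$1"
  if gh pr view "$pr" --json labels --jq '.labels[].name' | has_do_not_merge; then
    echo "PR $pr is labelled do-not-merge; waiting for explicit user action"
    return 1
  fi
  gh pr merge "$pr" --squash --delete-branch
}
```

Call it as `merge_when_allowed [PR_NUMBER]` once CI is green. A non-zero return means the PR was left unmerged.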
# Check all CI checks
gh pr checks [PR_NUMBER]
# Watch CI in real-time
gh pr checks [PR_NUMBER] --watch
# Get detailed status
gh pr view [PR_NUMBER] --json statusCheckRollup
All checks were successful
0 failing, 0 pending, 5 passing
CHECKS
✓ build 1m23s
✓ lint 45s
✓ test 3m12s
✓ typecheck 1m05s
✓ security-scan 2m30s
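For scripting, the exit code of `gh pr checks` can stand in for reading the table by eye. This sketch assumes a `gh` version that reports pending checks with exit code 8 (as documented in its help text); `ci_status` is a hypothetical helper name.

```shell
# Sketch only: ci_status is a hypothetical helper, not part of gh.

# ci_status PR_NUMBER
# Prints green, pending, or failed based on the gh pr checks exit code.
# Assumption: 0 = all checks passed, 8 = checks still pending.
# Any other code is reported as failed; this also covers gh errors
# such as authentication problems, so inspect the output before acting.
ci_status() {
  local pr="$1" rc=0
  gh pr checks "$pr" >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo green ;;
    8) echo pending ;;
    *) echo failed ;;
  esac
}
```

Usage: `[ "$(ci_status [PR_NUMBER])" = "green" ]` lets a script decide whether to proceed to the merge step.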
# Get failed check details
gh pr checks [PR_NUMBER]
# View workflow run logs
gh run view [RUN_ID] --log-failed
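Finding the `[RUN_ID]` can itself be scripted. The sketch below assumes the most recent failed workflow run on the PR branch is the one to inspect, which may not hold if several workflows fail; `failed_run_logs` is a hypothetical helper name.

```shell
# Sketch only: failed_run_logs is a hypothetical helper, not part of gh.

# failed_run_logs PR_NUMBER
# Looks up the PR's head branch, finds its latest failed run,
# and prints the logs of the failed steps.
failed_run_logs() {
  local pr="$1" branch run_id
  branch=$(gh pr view "$pr" --json headRefName --jq '.headRefName') || return 1
  run_id=$(gh run list --branch "$branch" --status failure --limit 1 \
    --json databaseId --jq '.[0].databaseId // empty') || return 1
  if [ -z "$run_id" ]; then
    echo "no failed runs found for branch $branch" >&2
    return 1
  fi
  gh run view "$run_id" --log-failed
}
```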
Common failure types:
| Type | Symptoms | Cause |
|------|----------|-------|
| Test failure | FAIL in test output | Code bug or test bug |
| Build failure | Compilation errors | Type errors, syntax errors |
| Lint failure | Style violations | Formatting, conventions |
| Typecheck failure | Type errors | Missing types, wrong types |
| Timeout | Job exceeded time limit | Performance issue or stuck test |
| Flaky test | Passes locally, fails CI | Race condition, environment difference |
# Reproduce locally
pnpm test
# Run specific failing test
pnpm test --grep "test name"
# Fix the code or test
# Commit and push
# Reproduce locally
pnpm build
# Fix compilation errors
# Commit and push
# Check lint errors
pnpm lint
# Auto-fix what's possible
pnpm lint:fix
# Manually fix remaining
# Commit and push
# Check type errors
pnpm typecheck
# Fix type issues
# Commit and push
# Commit fix
git add .
git commit -m "fix(ci): Resolve test failure in user validation"
# Push
git push
# Wait for CI again
gh pr checks [PR_NUMBER] --watch
Loop through diagnose → fix → push → wait until all checks pass.
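The loop can be sketched as a bounded shell function. The cycle limit is an assumption added here so the loop cannot run forever; `monitor_ci` and `fix_failure` are hypothetical names, and `fix_failure` is a placeholder for the diagnose, fix, and push steps described above.

```shell
# Sketch only: monitor_ci and fix_failure are hypothetical helpers.

# Placeholder hook: diagnose, fix, commit, and push.
# Return non-zero when the failure cannot be fixed in this PR.
fix_failure() {
  echo "fix_failure: not implemented for PR $1" >&2
  return 1
}

# monitor_ci PR_NUMBER [MAX_CYCLES]
# Returns 0 when green, 2 when unresolvable, 1 when the cycle limit is hit.
monitor_ci() {
  local pr="$1" max_cycles="${2:-5}" cycle=1
  while [ "$cycle" -le "$max_cycles" ]; do
    # --watch blocks until checks finish; non-zero exit means not green
    if gh pr checks "$pr" --watch; then
      echo "green after $cycle cycle(s)"
      return 0
    fi
    echo "cycle $cycle: checks failed, diagnosing" >&2
    if ! fix_failure "$pr"; then
      echo "unresolvable: document the failure on the PR" >&2
      return 2
    fi
    cycle=$((cycle + 1))
  done
  echo "gave up after $max_cycles cycles" >&2
  return 1
}
```

Right after a push, checks may not be registered yet, so a real version may need a short wait before the first `gh pr checks` call.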
Test passes locally
Test fails in CI
Test passes on retry in CI
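One way to confirm the third symptom is to rerun only the failed jobs and watch the result. This is a sketch, assuming the rerun's checks are visible to `gh pr checks` by the time it is called; `confirm_flaky` is a hypothetical helper name.

```shell
# Sketch only: confirm_flaky is a hypothetical helper, not part of gh.

# confirm_flaky RUN_ID PR_NUMBER
# Reruns the failed jobs of a run and reports whether they pass on retry.
confirm_flaky() {
  local run_id="$1" pr="$2"
  gh run rerun "$run_id" --failed || return 1
  if gh pr checks "$pr" --watch; then
    echo "passed on retry: likely flaky, fix the test rather than relying on retries"
    return 0
  fi
  echo "failed again: treat as a real failure"
  return 1
}
```

A pass on retry only confirms the diagnosis. The underlying race or timing dependency still needs a fix.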
// Common flaky pattern: timing dependency
// BAD
await saveData();
await delay(100); // Hoping 100ms is enough
const result = await loadData();
// GOOD: Wait for condition
await saveData();
await waitFor(() => dataExists());
const result = await loadData();
Sometimes failures can't be fixed in the current PR:
| Case | Example |
|------|---------|
| CI infrastructure issue | Service down, rate limited |
| Pre-existing flaky test | Not introduced by this PR |
| Upstream dependency issue | External API changed |
| Requires manual intervention | Needs secrets, permissions |
gh pr comment [PR_NUMBER] --body "## CI Issue
The \`security-scan\` check is failing due to a known issue with the scanner service (see #999).
This is not related to changes in this PR. The scan passes when run locally.
Requesting bypass approval from @maintainer."
gh issue create \
--title "CI: Security scanner service timeout" \
--body "The security scanner is timing out in CI..."
Some teams allow merging with known infrastructure failures.
If the failure is from your code, it must be fixed.
CI is the last resort, not the first check.
Before pushing, run EVERYTHING CI will run:
# Run the same checks CI will run
pnpm lint
pnpm typecheck
pnpm test # Unit tests
pnpm test:integration # Integration tests against real services
pnpm build
# If you have database changes
docker-compose up -d postgres
pnpm migrate
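The checks above can be wrapped so the first failure stops the run before anything is pushed. This is a minimal sketch using the script names listed above; `run_local_checks` is a hypothetical helper name, and a project whose CI runs other scripts would need a different list.

```shell
# Sketch only: run_local_checks is a hypothetical helper.

# Runs each pnpm script in CI order and stops at the first failure.
run_local_checks() {
  local script
  for script in lint typecheck test test:integration build; do
    echo "==> pnpm $script"
    if ! pnpm "$script"; then
      echo "local check failed: $script" >&2
      return 1
    fi
  done
  echo "all local checks passed"
}
```

Usage: `run_local_checks && git push`. The same function could be called from a git `pre-push` hook if the team wants it enforced.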
If your project has docker-compose services:
docker-compose up -d
Skill: local-service-testing
Don't push 10 commits at once. Push smaller changes:
# Small fix, push, verify
git push
# Wait for CI
gh pr checks --watch
# Then next change
Don't "push and forget":
# Watch CI after each push
gh pr checks [PR_NUMBER] --watch
For each CI run:
When CI is green:
- Merge the PR (gh pr merge --squash --delete-branch)
For unresolvable issues:
This skill is called by:
- issue-driven-development - Step 13
- autonomous-orchestration - Main loop and bootstrap
This skill follows:
- pr-creation - PR exists
This skill completes:
This skill may trigger:
- error-recovery - If CI reveals deeper issues