plugin/skills/self-healing-ci/SKILL.md
CI-only self-healing workflow using gh-aw (GitHub Agentic Workflows) for active runtime recovery on pull requests and scheduled runs. When a CI check fails (test, build, lint, deploy, scan), this skill diagnoses the failure from CI logs, proposes a verified patch as a PR comment or follow-up commit, and commits a HEAL entry to `.learnings/HEALS.md`. Verify-before-persist discipline preserved: a HEAL is only `verified` if a re-run check passes in the same workflow; otherwise it ships as `pending-verify` for human follow-up. Recurrent heal patterns across PRs accumulate `Recurrence-Count` and append a `Handoff` block at ≥3 to flag promotion via self-improvement-ci. Use this skill when: you want headless heal-loop execution in CI/scheduled pipelines, you want recurring failure patterns captured automatically, or you want PRs that surface non-obvious environmental / tooling fixes without human triage. For interactive/local sessions, use `self-healing` instead.
CI-only variant of self-healing. Runs the diagnose → patch → verify → file loop headlessly against pull-request and scheduled workflow events.
gh skill install pskoett/pskoett-skills self-healing-ci
Fallback using the Agent Skills CLI:
npx skills add pskoett/pskoett-skills/skills/self-healing-ci
Run self-healing in CI without interactive chat loops:
- File a `HEAL-` entry to `.learnings/HEALS.md` with verification proof (or `pending-verify` if the workflow can't re-run the check)
- Search by `Pattern-Key` before filing new ones, to deduplicate recurrences
- Append a `Handoff` block at `Recurrence-Count >= 3` for promotion via `self-improvement-ci`

Use `self-healing` for interactive/local sessions.
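The deduplication rule can be sketched in a few lines. This is a minimal in-memory model, not the skill's actual implementation: the `Heal` dataclass, the `record_occurrence` helper, and the dictionary keyed by `Pattern-Key` are all hypothetical names introduced for illustration, and the real on-disk layout of `.learnings/HEALS.md` is not assumed here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# From the skill text: a Handoff block is appended at Recurrence-Count >= 3.
PROMOTION_THRESHOLD = 3


@dataclass
class Heal:
    """Hypothetical in-memory view of one HEAL entry."""
    heal_id: str
    pattern_key: str
    recurrence_count: int = 1
    see_also: List[str] = field(default_factory=list)
    handoff: bool = False


def record_occurrence(
    heals: Dict[str, Heal], heal_id: str, pattern_key: str, occurrence: str
) -> Heal:
    """File a new heal, or bump the existing one that shares the Pattern-Key."""
    existing = heals.get(pattern_key)
    if existing is None:
        # First sighting of this pattern: file a fresh entry.
        heal = Heal(heal_id=heal_id, pattern_key=pattern_key, see_also=[occurrence])
        heals[pattern_key] = heal
        return heal
    # Recurrence: increment the counter instead of filing a duplicate.
    existing.recurrence_count += 1
    existing.see_also.append(occurrence)
    if existing.recurrence_count >= PROMOTION_THRESHOLD:
        existing.handoff = True
    return existing
```

Keying the store by `Pattern-Key` makes the "search before filing" step a single lookup, which is why overly coarse keys are costly: two unrelated failures would silently share one counter.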
CI agents do not have the task context of the original implementation session: the agent is reading CI logs and code, not working from a focused implementation it has just finished. Implications:
- When unsure, file the heal as `pending-verify` and surface it to the PR author
- Verification is required before marking a heal `verified`: re-run the failing check in the same workflow run

Prerequisites:

- GitHub CLI authenticated (`gh auth status`)
- `gh-aw` installed for authoring/validation: `gh extension install github/gh-aw`
- `.learnings/HEALS.md` committed to the repo (or created on first run; see `references/workflow-example.md` for the bootstrap pattern)

The CI skill must:
- Work from `.learnings/HEALS.md` only, nothing else from the PR author's machine
- Re-run the failed check where it can: `verified` requires this; `pending-verify` is honest if it cannot
- Commit the `HEAL-` entry only on a successful re-run; abandoned heals are still filed, but in a separate, clearly labeled commit

```yaml
self_healing_ci:
  source:
    pr_number: 123
    commit_sha: "abc123def"
    failed_check: "test (node 20)"
    workflow_run_id: 4567891234
  heal:
    heal_id: "HEAL-20260524-001"
    status: "verified"  # verified | pending-verify | abandoned
    trigger: "tool-failure"  # free-form
    active_context: "ci"  # optional
    area: "tests"  # free-form
    pattern_key: "env.lockfile_mismatch"
    diagnosis: "Project uses pnpm; CI workflow ran `npm ci`."
    fix:
      summary: "Switch the CI install step from `npm ci` to `pnpm install --frozen-lockfile`."
      diff_path: ".learnings/heals/HEAL-20260524-001/patch.diff"  # only if files generated
    verification:
      command: "pnpm install --frozen-lockfile"
      exit_code: 0
      output_excerpt: "Lockfile is up to date, resolution step is skipped"
    recurrence_count: 1
    promotion_ready: false  # true at recurrence_count >= 3
  summary:
    heals_filed: 1
    verified: 1
    pending_verify: 0
    abandoned: 0
    promotion_candidates: 0
```
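As a rough illustration of how the `summary` counters relate to the individual heal records, the sketch below tallies a list of heal dictionaries. The `summarize` function is hypothetical, and it assumes that `promotion_candidates` counts heals whose `recurrence_count` has reached 3; the source only states that `promotion_ready` becomes true at that threshold.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping

# From the record's own comment: promotion_ready is true at recurrence_count >= 3.
PROMOTION_THRESHOLD = 3


def summarize(heals: Iterable[Mapping[str, Any]]) -> Dict[str, int]:
    """Tally heal records into the counters shown in the summary block."""
    records = list(heals)
    statuses = Counter(record["status"] for record in records)
    return {
        "heals_filed": len(records),
        "verified": statuses["verified"],
        "pending_verify": statuses["pending-verify"],
        "abandoned": statuses["abandoned"],
        # Assumption: a promotion candidate is any heal at or past the threshold.
        "promotion_candidates": sum(
            1
            for record in records
            if record.get("recurrence_count", 1) >= PROMOTION_THRESHOLD
        ),
    }
```

Deriving the counters from the records, rather than maintaining them by hand, keeps the summary from drifting out of step with the heals actually filed.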
In CI the verify step is operationalized as re-running the failed check inside the same workflow run after applying the proposed patch:
| Original failure | Verify step in CI |
|------------------|-------------------|
| pnpm test failed | Re-run pnpm test after the patch |
| Build (tsc, cargo build) failed | Re-run the build step |
| Lint (eslint, ruff) failed | Re-run the lint step |
| Deploy preview failed | Re-run the deploy step (if the workflow allows) |
| Snapshot diff | Re-run with deterministic stubs if applicable |
If the re-run isn't feasible (the check requires secrets only available in production workflows; the failure is transient; the patch needs human review before commit), the HEAL ships as pending-verify with explicit notes on what would prove it.
Never fake verified. Faking is the exact failure mode this skill exists to prevent — and in CI, the consequences propagate further than in interactive sessions because future PRs may apply the unverified "fix" automatically.
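The verify rule above can be sketched as a small helper that re-runs a command and records the proof. This is an assumption-laden illustration rather than the skill's real code: the `verify` function and its return shape are hypothetical, modeled on the `verification` fields of the output record. A failed re-run is mapped to `pending-verify` here; deciding that a heal is `abandoned` is left to a separate step.

```python
import subprocess
import sys
from typing import Any, Dict, List, Optional


def verify(command: Optional[List[str]], excerpt_chars: int = 200) -> Dict[str, Any]:
    """Re-run the failed check and report a status with proof.

    Pass None when the re-run is not feasible in this workflow.
    """
    if command is None:
        # No re-run possible: never claim verified without proof.
        return {"status": "pending-verify", "verification": None}

    result = subprocess.run(command, capture_output=True, text=True)
    output = (result.stdout + result.stderr).strip()
    # Only a passing re-run earns the verified status.
    status = "verified" if result.returncode == 0 else "pending-verify"
    return {
        "status": status,
        "verification": {
            "command": " ".join(command),
            "exit_code": result.returncode,
            "output_excerpt": output[:excerpt_chars],
        },
    }


if __name__ == "__main__":
    # Stand-in for a real check such as the test or build step.
    print(verify([sys.executable, "--version"])["status"])
```

The status is computed only from the exit code of the actual re-run, so there is no code path that yields `verified` without a recorded command and exit code.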
1. Search `.learnings/HEALS.md` by `Pattern-Key` before filing new heals
2. On a match, increment `Recurrence-Count`, update `Last-Seen`, and append the new occurrence to `See Also`
3. When `Recurrence-Count >= 3`, append a `Handoff` block to the existing HEAL with a `Promotion Target` (`CLAUDE.md` / `AGENTS.md` / `.github/copilot-instructions.md` / new-skill) and a one-line `Distilled Rule`
4. `self-improvement-ci` consumes the `Handoff` blocks and proposes the promotion as a PR

| Trigger | Use case |
|---------|----------|
| workflow_run (completed, conclusion: failure) | Most common — react to other workflows failing |
| pull_request (with if: guard on check status) | Run on every PR but skip if all checks passed |
| schedule (nightly) | Look for stale flakes, surface patterns the per-PR runs missed |
| workflow_dispatch | Manual replay against a specific PR or commit |
Authoring patterns and example .github/workflows/*.lock.yml files live in references/workflow-example.md. Keep example workflows out of .github/workflows until you've explicitly decided to enable CI automation.
The interactive skill's anti-patterns all apply. CI-specific ones to watch:
- Guard with `if: github.actor != 'github-actions[bot]'` to prevent infinite loops.
- Pattern-Key collisions. If two PRs hit the same `Pattern-Key` with different root causes, the keys are too coarse; refine them rather than letting them merge.

See also:

- `self-healing`: the interactive skill this mirrors; same file format, same verify discipline
- `self-improvement-ci`: receives heal `Handoff` blocks; proposes promotion to memory files
- `simplify-and-harden-ci`: quality pass after heals stabilize the PR
- `verify-gate`: the interactive verify gate; self-healing-ci's verify is the CI workflow re-run
- `references/workflow-example.md`: gh-aw workflow templates and authoring notes