skills/ci/SKILL.md
Audit CI gates, strengthen weak coverage, then drive green. Dagger owns the canonical pipeline; missing Dagger is auto-scaffolded. Acts directly on mechanical fixes and never returns red without structured diagnosis. Use when: "run ci", "check ci", "fix ci", "audit ci", "is ci passing", "run the gates", "dagger check", "why is ci failing", "strengthen ci", "tighten ci", "ci is red", "gates failing". Trigger: /ci, /gates.
npx skillsauth add phrazzld/spellbook ciInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Confidence in correctness. CI is load-bearing: a green CI is a claim about code correctness. If CI is weak (no type check, shallow tests, no coverage floor), green means nothing. So this skill audits first, then runs.
Stops at green CI. Does not review code semantics (→ /code-review), does
not address review comments (→ /deliver --polish-only), does not ship.
--audit-only: produce audit report and gap proposals; do not run gates.--run-only: skip audit, just drive gates green.dagger.json is a gap
the skill closes itself, not a blocker that halts work. Scaffold a
TypeScript Dagger module, fold every existing gate into check(),
thin the CI provider (GHA, CircleCI, etc.) to a single dagger call check
step, and update pre-push hooks to match. Raw npm run lint / pytest
/ go test etc. are what GHA replaced, not what replaces GHA — they
bypass the hermetic-container contract. CI providers must be thin
wrappers that shell out to dagger call check, never the authoritative
gate owner. Agent-first, local-first, provider-independent.Delegation floor applies: probe the roster first; dispatch two or more
providers for substantive work; direct solo only for mechanical, emergency,
user-forbidden, or fewer-than-two-providers cases. See
harnesses/shared/AGENTS.md (Roster).
Local lane guidance: Use lanes for audit, failure diagnosis, and gate-strengthening judgment; the lead may run exact validation commands and mechanical repairs after deciding from the evidence.
See references/self-heal.md for the fix-vs-escalate decision. Auto-fix
format/lint drift, regenerate lockfiles when deterministic, and retry flakes
within the bounded policy. Algorithm and logic failures escalate.
--run-only)Read references/audit.md for the full audit rubric. Inventory in parallel.
Pipeline presence is the first gate. If Dagger is absent, scaffold it
inline — do not stop, do not wait for approval. Scaffolding Dagger is
mechanical (dagger init --sdk=typescript --source=.dagger, fold existing
gates into check(), thin the GHA/CircleCI provider to a single
dagger call check step). The skill owns this.
dagger.json / .dagger/ exists?
Entrypoint reachable? (dagger functions lists check?) Missing = HIGH,
blocks green until scaffolded.dagger call <func> only. Inline npm run X / pytest / raw bash in
workflow YAML = finding (pipeline lives in two places, drifts)..githooks/pre-push pattern).Emit a structured audit report:
## CI Audit
| Concern | Status | Severity | Proposed fix |
|----------------|--------|----------|---------------------------------|
| lint | ok | - | - |
| type-check | gap | high | Add mypy strict to dagger check |
| coverage floor | weak | med | Raise floor from 40% → 70% |
| secrets scan | gap | high | Add gitleaks gate |
For each gap, apply the mechanical remediation directly. Do NOT emit "proposals" awaiting approval. Mechanical strengthenings include: adding a missing lint/typecheck/coverage/secrets gate, wiring new scripts into the bash-syntax step, consolidating duplicate workflows, raising thresholds upward when the current code already passes a higher bar, and scaffolding Dagger. Escalate only when the strengthening would disable a currently-green test, materially change scope, or encode a product decision the code alone cannot resolve.
If audit finds no gaps worth fixing, say so and proceed.
--audit-only)dagger call check. No fallback. If
Dagger is absent, Phase 1 failed to block — abort and re-audit.references/self-heal.md:
Final pass of dagger call check after any fixes. Green or bust. If any
gate was strengthened in Phase 1, the full pipeline must pass under the
new thresholds before the skill returns clean.
/code-review/shape/deliver --polish-only/deploy/qaA green gate proves only the commands that actually ran.
When a diff adds or materially changes an executable path — script
entrypoint, CLI, package.json / make target, Dagger function,
migration, runner, responder, or job entrypoint — the /ci report must:
Helper fixtures, unit coverage, and adjacent lanes do not count as runtime verification unless they invoke that exact path.
pytest, eslint, npm run X) even
when "faster" or "the pipeline isn't set up yet." Gates run via
dagger call exclusively; raw shell bypasses the hermetic-container
contract that makes green meaningful.dagger call ..., Dagger is being undermined.@skip-ing it. A
failing test is either a bug in the code or a bug in the test —
diagnose, don't suppress.Any / any or adding
# type: ignore. Escalate — this is a logic/contract decision.Report:
## /ci Report
Audit: 2 gaps found (type-check missing, coverage floor 40%).
→ Strengthened: type-check added.
→ Deferred: coverage floor raise (filed as backlog 0XX).
Run: 6 gates, 1 self-heal (ruff auto-fix), 0 escalations.
Final: green. dagger call check passes in 4m12s.
On failure:
## /ci Report — RED
Gate: test-python
Failure: tests/widget/test_reducer.py::test_merge_conflict line 42
AssertionError: expected {'a': 1}, got {'a': 1, 'b': 2}
Classification: logic failure (behavior change in reducer)
Action: escalated — human decision needed on reducer contract.
development
Lightweight evidence-backed retro and catch-up reports for a current repo, branch, PR, backlog slice, or recent agent session. Use when the user asks for a debrief, catch me up, what changed, why it matters, product implications, end-user implications, developer experience implications, current app state, backlog state, workspace state, alternatives considered, or context rebuild after losing the thread. Trigger: /debrief.
testing
Capture agent-session work records as local JSONL audit evidence. Links a backlog/spec, branch, commits, review verdicts, QA/demo evidence, transcript refs, and shipped ref without storing raw private transcripts. Use when: "trace this work", "write work record", "agent session trace", "journal this delivery", "link transcript evidence". Trigger: /trace, /journal.
data-ai
Turn proven agent-session patterns into first-party Harness Kit skills. Use when: "skillify this conversation", "make this into a skill", "generate a skill from current transcript", "extract reusable workflow". Trigger: /skillify.
testing
Run one targeted, read-only architecture or quality critique through a named lens from the shared rubric. Use when: "critique this module", "run an Ousterhout pass", "lens critique", "architecture critique". Trigger: /critique.