src/autoskillit/skills_extended/resolve-failures/SKILL.md
Failure resolution executor. ALWAYS invoke this skill when instructed to fix test failures in a worktree. Do not read test output or edit code directly — use this skill first to load the failure resolution workflow.
Fix test failures in a worktree implemented by /autoskillit:implement-worktree-no-merge, leaving the worktree green and unmerged for the orchestrator's merge gate.
- run_skill after ci_watch reports a CI failure
- run_skill after test_check returns FAIL
- run_skill when merge_worktree returns dirty_tree
- {worktree_path} {plan_path} {base_branch}
- {ci_conclusion} {ci_failed_jobs} {diagnosis_path}
- merge_worktree after verify passes.
- verdict output token so the recipe can route correctly (never silently re-pushes).

NEVER:

- merge_worktree or any other mechanism
- merge_worktree MCP tool
- {{AUTOSKILLIT_TEMP}}/resolve-failures/ directory
- fixes_applied=0 when CI has identified a specific failing test

ALWAYS:
- Before any Edit call on any file, ensure you have issued a Read on that file earlier in this session. Claude Code rejects Edit on unread files; the retry wastes a full API turn at current context size. If you are uncertain whether a file was read, issue a targeted Read (offset + limit to the region you plan to edit) rather than risk an error.
- Before running python3 or other interpreters in the worktree, verify CWD is the worktree root. Use absolute paths or explicit cd. Wrong-CWD errors waste a full API turn at current context size.
- limit parameter. Do not paginate files with sequential offset reads; read once completely. Use limit/offset only for targeted section reads of files you have already read in full.

Flaky tests must always be resolved. A test that failed previously and now passes
is flaky by definition. Investigate timing dependencies, race conditions, insufficient
timeouts, resource contention under parallel execution (pytest-xdist), and
non-deterministic setup/teardown. Apply a stabilizing fix. Never classify a flaky test
as unfixable. Never emit ci_only_failure for a test that is merely non-deterministic.
When a test passes in isolation but fails in the full suite, check whether the test's
shared dependencies (service objects, registries, caches, middleware stacks) accumulate
state across calls — growing lists, maps, or registries rather than toggling flags,
unless accumulation is the intended behavior (e.g., event logs, audit trails). Calling
an inverse method (e.g., disable() to undo enable()) does not reset
accumulation-based state if the framework appends rather than toggles; every call to
either method appends a new entry. The fix for accumulation-based leakage is full reset:
clear the collection, then re-apply the baseline state — not inverse operations.
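The accumulation problem above can be sketched in a few lines. This is a minimal illustration only: `FeatureRegistry`, `BASELINE`, and `full_reset` are hypothetical names, not part of any framework referenced here.

```python
class FeatureRegistry:
    """Hypothetical registry that appends an entry on every call."""

    def __init__(self):
        self.entries = []

    def enable(self, name):
        self.entries.append(("enable", name))

    def disable(self, name):
        # Appends a new entry; it does not remove the earlier "enable" entry.
        self.entries.append(("disable", name))


# Assumed baseline state, chosen only for this illustration.
BASELINE = [("enable", "core")]


def full_reset(registry):
    """Clear the collection, then re-apply the baseline state."""
    registry.entries.clear()
    for action, name in BASELINE:
        getattr(registry, action)(name)


registry = FeatureRegistry()
full_reset(registry)

# A test enables a feature, then tries to undo it with the inverse call.
registry.enable("beta")
registry.disable("beta")
leaked = len(registry.entries)  # baseline entry plus two appended entries

full_reset(registry)
clean = len(registry.entries)  # baseline entry only
```

In a pytest suite, a reset like this would typically run from a fixture around each test, so that state appended by one test cannot reach the next.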
When context is exhausted mid-execution, edits may be on disk but not committed.
The recipe routes to on_context_limit (typically test), bypassing the normal
commit protocol during the fix loop.
Before every test run and before emitting structured output tokens:
- git -C {worktree_path} status --porcelain
- git -C {worktree_path} add -A && git -C {worktree_path} commit -m "fix: commit pending changes before context limit"

This ensures that even if context exhaustion interrupts the fix loop, all applied edits are committed and the downstream merge gate receives a clean worktree.
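The two git commands above could be wrapped in a small helper. The sketch below is not part of the skill: `commit_pending_changes` is a hypothetical name, and it assumes `git` is on the PATH and the worktree has a commit identity configured.

```python
import subprocess


def commit_pending_changes(
    worktree_path,
    message="fix: commit pending changes before context limit",
):
    """Commit uncommitted edits in the worktree; return True if a commit was made."""
    status = subprocess.run(
        ["git", "-C", worktree_path, "status", "--porcelain"],
        capture_output=True,
        text=True,
        check=True,
    )
    if not status.stdout.strip():
        return False  # porcelain output is empty, so the tree is already clean
    subprocess.run(["git", "-C", worktree_path, "add", "-A"], check=True)
    subprocess.run(
        ["git", "-C", worktree_path, "commit", "-m", message],
        check=True,
    )
    return True
```

Calling it before every test run and before emitting output tokens mirrors the rule above; a False return simply means there was nothing to commit.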
Read the configured test command(s) from .autoskillit/config.yaml: check test_check.commands first (ordered list of commands); fall back to test_check.command (single command). Use the test_check MCP tool (which runs all configured commands automatically and returns a single pass/fail).
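The lookup order can be illustrated as follows. This is a sketch under assumptions: the config is taken to be already parsed into a dictionary, the nesting is inferred from the dotted keys above, and `resolve_test_commands` and the sample commands are hypothetical.

```python
def resolve_test_commands(config):
    """Return the ordered test commands from an already-parsed config mapping."""
    test_check = config.get("test_check") or {}
    commands = test_check.get("commands")
    if commands:
        return list(commands)  # preferred: ordered list of commands
    command = test_check.get("command")
    return [command] if command else []  # fallback: single command


# Hypothetical config values, for illustration only.
both = {"test_check": {"commands": ["pytest -q", "ruff check ."], "command": "pytest"}}
single = {"test_check": {"command": "pytest"}}

from_list = resolve_test_commands(both)
from_single = resolve_test_commands(single)
from_empty = resolve_test_commands({})
```

The test_check MCP tool performs this resolution itself, so the sketch is only useful for understanding which commands a run will cover.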
Parse positional args using path detection: scan all tokens after the skill name.
Positional args (in order):
1. worktree_path — path-like token (starts with /, ./, or .autoskillit/)
2. plan_path — path-like token
3. base_branch — non-path token
4. ci_conclusion — optional: "failure", "success", or absent/"-"
5. ci_failed_jobs — optional: JSON array of job names or absent/"-"
6. diagnosis_path — optional: path-like token to the diagnose-ci report or absent/"-"
Scanning rules: use path-detection (find path-like tokens for positions 1, 2,
and 6); pick up remaining non-path non-"-" tokens for base_branch,
ci_conclusion, and ci_failed_jobs. The last path-like token after
plan_path that ends with .md → diagnosis_path. Ignore any non-path
tokens that appear before the path arguments. If fewer than two path-like
tokens are found, abort with a clear error and the correct format:
/autoskillit:resolve-failures <worktree_path> <plan_path> <base_branch>
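One possible reading of the scanning rules, as a minimal sketch. The names `Args`, `is_path_like`, and `parse_args` are hypothetical, and the tokens are assumed to be already split, with the skill name removed.

```python
from dataclasses import dataclass
from typing import Optional

USAGE = "/autoskillit:resolve-failures <worktree_path> <plan_path> <base_branch>"


def is_path_like(token):
    """Path-like tokens start with /, ./, or .autoskillit/."""
    return token.startswith(("/", "./", ".autoskillit/"))


@dataclass
class Args:
    worktree_path: str
    plan_path: str
    base_branch: Optional[str] = None
    ci_conclusion: Optional[str] = None
    ci_failed_jobs: Optional[str] = None
    diagnosis_path: Optional[str] = None


def parse_args(tokens):
    path_idx = [i for i, t in enumerate(tokens) if is_path_like(t)]
    if len(path_idx) < 2:
        raise ValueError(f"expected two path-like arguments; usage: {USAGE}")
    worktree_path = tokens[path_idx[0]]
    plan_path = tokens[path_idx[1]]
    # diagnosis_path: the last path-like token after plan_path that ends with .md
    later_md = [tokens[i] for i in path_idx[2:] if tokens[i].endswith(".md")]
    diagnosis_path = later_md[-1] if later_md else None
    # Non-path tokens before the first path argument are ignored; "-" means absent.
    rest = [t for t in tokens[path_idx[0]:] if not is_path_like(t) and t != "-"]
    rest += [None] * 3
    return Args(worktree_path, plan_path, rest[0], rest[1], rest[2], diagnosis_path)
```

Because "-" placeholders are skipped rather than counted, the non-path values are assigned in the order they appear; a caller that omits ci_conclusion but supplies ci_failed_jobs would need a stricter rule than the one sketched here.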
Verify worktree exists and is a valid git worktree
Verify plan file exists and is readable
If diagnosis_path is provided and exists, note it for Step 2a below
Path-existence guard: Before issuing a Read call on a path that is not guaranteed to
exist (e.g., plan file arguments, {{AUTOSKILLIT_TEMP}}/investigate/ reports, external file references), use
Glob or ls to confirm the path exists first. This prevents ENOENT errors that cascade into
sibling parallel-call cancellations.
Check for development environment in worktree, recreate if missing. Use the project's configured worktree_setup.command, or: cd "${worktree_path}" && task install-worktree
- git -C {worktree_path} status --porcelain
- git -C {worktree_path} add -A
- git -C {worktree_path} commit -m "chore: commit auto-generated files"
- git log --oneline $(git merge-base HEAD origin/{base_branch})..HEAD
- git diff --stat $(git merge-base HEAD origin/{base_branch})..HEAD

If diagnosis_path was provided and the file exists:

- diagnosis_path and read its content
- failure_subtype = {value}
- failure_subtype value (e.g., flaky, deterministic, timing_race, etc.)
- {failure_subtype} for use in the Verdict Decision Tree

If diagnosis_path is absent or the file does not exist:

- {failure_subtype} = unknown

Check if the project has artifact generation steps that CI runs before tests:

- .github/workflows/tests.yml (or the CI workflow file) if it exists
- generate_hooks_json, recipes render, code generation)

This ensures local test results match CI behavior. Skip if no pre-test generation steps exist.
Use the test_check MCP tool, not Bash or run_cmd:
test_check(worktree_path="{worktree_path}")
When test suites take 5–7 minutes, the Bash tool auto-backgrounds the command
and the LLM enters a polling cascade of 20+ API calls to detect completion.
test_check blocks synchronously and returns passed: true/false in a single call.

{local_result}: PASS (passed: true) or FAIL (passed: false)

This rule takes precedence over the decision table below.
If at ANY point during this skill's execution a code change was committed
(i.e., the fix loop was entered and produced at least one commit) AND the final
test_check result is PASS:
→ verdict = real_fix — unconditionally, regardless of failure_subtype.
The decision table below applies ONLY when no fix was committed during this
invocation. Once a fix is applied and tests pass, the verdict is real_fix
and the table is never consulted.
Applies ONLY when no fix was applied (fixes_applied == 0). If the fix loop was entered
and a commit was made, skip this table — verdict is already real_fix per Step 2c.
Using {local_result} from Step 2 and {failure_subtype} from Step 2a, determine {verdict}:
| Local result | failure_subtype | Verdict |
|---|---|---|
| PASS | flaky or timing_race | flake_suspected |
| PASS | deterministic | ci_only_failure |
| PASS | fixture or import | flake_suspected |
| PASS | env or unknown | flake_suspected |
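The override rule and the table can be combined into one function. This is a sketch rather than the skill's implementation: `decide_verdict` and `PASS_TABLE` are hypothetical names, and treating unlisted subtypes like unknown is an assumption the table does not state.

```python
# Rows of the decision table; every row assumes a local PASS and no fix applied.
PASS_TABLE = {
    "flaky": "flake_suspected",
    "timing_race": "flake_suspected",
    "deterministic": "ci_only_failure",
    "fixture": "flake_suspected",
    "import": "flake_suspected",
    "env": "flake_suspected",
    "unknown": "flake_suspected",
}


def decide_verdict(local_pass, fixes_applied, failure_subtype="unknown"):
    """Apply the committed-fix override first, then fall back to the table."""
    if not local_pass:
        raise ValueError("no verdict while tests are failing; stay in the fix loop")
    if fixes_applied >= 1:
        return "real_fix"  # override: the table is never consulted
    # Assumption: subtypes missing from the table are handled like "unknown".
    return PASS_TABLE.get(failure_subtype, PASS_TABLE["unknown"])
```

For example, `decide_verdict(True, 1, "deterministic")` yields real_fix because a fix was committed, while `decide_verdict(True, 0, "deterministic")` yields ci_only_failure.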
Note on already_green: This verdict is reserved for the pre_resolve_rebase
re-entry path — when a sibling pipeline's fix has already landed on integration and
the worktree was rebased before this skill ran. In that case, the orchestrator's
pre_resolve_rebase step has already pulled the fix; the re-run of diagnose-ci +
resolve-failures will now emit real_fix or another verdict. already_green is
not emitted by this skill's primary workflow.
If local tests PASS (no fix needed): go to Step 2.5 (Validate CI Resolution) before proceeding to Step 4 — the CI-truth gate may redirect to the fix loop for flakiness investigation even when local tests pass.
If local tests FAIL: enter the fix loop.
Tests passed locally. Before reporting success, check whether the skill was invoked in response to a CI failure.
CI is the source of truth. A local pass does not resolve a CI failure — it means the failure could not be reproduced locally, which is a flaky-test signal.
If diagnosis_path is absent (or "-"), or ci_conclusion is absent (or is not
"failure"): proceed to Step 4 (no active CI failure context to enforce).
If diagnosis_path is present AND ci_conclusion == "failure":
a. Read the diagnosis file at diagnosis_path
b. Extract the failing test name(s) from the "## Log Excerpt" or the failure_type
classification in the diagnosis
c. If failure_type == "test" (one or more named test failures identified):
verdict = real_fix
(per Step 2c override). Do NOT fall back to the Step 2d table — the fix
resolves the CI failure regardless of the original failure_subtype.
d. If failure_type is not "test" (e.g., "lint", "build") and tests pass locally:

a. Run pre-commit run --all-files first
b. Run git -C {worktree_path} status --porcelain to capture the full set of modified files, including any auto-fixed by hooks
c. Stage and commit: git -C {worktree_path} add -A && git -C {worktree_path} commit -m "fix: {what was wrong and why}"
d. Run git -C {worktree_path} status --porcelain again to verify the tree is clean; if any files remain dirty, stage and commit them too

Write a fix log under {{AUTOSKILLIT_TEMP}}/resolve-failures/ (relative to the current working directory) to satisfy the write_behavior contract (generates an Edit/Write call that proves work was done):
{{AUTOSKILLIT_TEMP}}/resolve-failures/fix_log_{iteration}_{ts}.md

Re-run the tests with the test_check MCP tool:
test_check(worktree_path="{worktree_path}")
Do NOT re-run via Bash — see Step 2 rationale.
After receiving the result, extract and retain ONLY:
verdict = real_fix (per Step 2c override; do not re-evaluate Step 2d) → Step 4; Red and < 3 iterations → repeat; Red and >= 3 → Step 5

Tests are green. Report and exit; do NOT merge.
Output to terminal:
{verdict}

Then emit the structured output tokens on their own lines so the pipeline's
on_result: verdict routing and write_behavior: conditional contract can evaluate them:
IMPORTANT: Emit the tokens as literal plain text with no markdown formatting. The gate performs a regex match — decorators cause match failure.
verdict = {verdict}
fixes_applied = {N}
Where:
- {verdict} is one of: real_fix, flake_suspected, ci_only_failure
- {N} is the number of fix iterations performed (0 for flake_suspected or ci_only_failure verdicts, ≥1 for real_fix)

Return control to the orchestrator. The recipe's on_result: routing dispatches on verdict:

- real_fix → re_push (fix landed, push to remote)
- flake_suspected → re_push (retry via CI, bounded by retries: 2 / on_exhausted: release_issue_failure)
- ci_only_failure → release_issue_failure (human escalation)

Invariant: ci_only_failure is NEVER emitted when fixes_applied >= 1. If a fix was committed during this invocation and tests pass, the verdict is always real_fix.
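The token format, the routing, and the invariant can be checked together in a short sketch. The pattern `VERDICT_RE` is hypothetical (the gate's actual regex is not shown in this document), as are the names `ROUTES` and `format_tokens`; the example only illustrates why markdown decorators would break a line-anchored match.

```python
import re

# Routing as described above: verdict -> next recipe step.
ROUTES = {
    "real_fix": "re_push",
    "flake_suspected": "re_push",
    "ci_only_failure": "release_issue_failure",
}

# Hypothetical gate pattern; the pipeline's real regex is an unknown here.
VERDICT_RE = re.compile(r"^verdict = (\w+)$", re.MULTILINE)


def format_tokens(verdict, fixes_applied):
    """Return the two output tokens as plain text, enforcing the invariant."""
    if verdict not in ROUTES:
        raise ValueError(f"unknown verdict: {verdict}")
    if verdict == "ci_only_failure" and fixes_applied >= 1:
        raise ValueError("ci_only_failure must not be emitted when fixes_applied >= 1")
    return f"verdict = {verdict}\nfixes_applied = {fixes_applied}"


plain = format_tokens("real_fix", 1)
decorated = "**verdict = real_fix**\n`fixes_applied = 1`"

plain_match = VERDICT_RE.search(plain)          # matches the undecorated line
decorated_match = VERDICT_RE.search(decorated)  # no match: line starts with **
```

Under this assumed pattern the plain tokens match and the bold or backticked variants do not, which is the failure mode the plain-text requirement is meant to avoid.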
/autoskillit:rectify