skills/harness-fix/SKILL.md
Plan and execute bug fixes, regression repair, incident remediation, flaky-path debugging, and evidence-driven investigation in repositories that already use Harness Engineering control plane artifacts such as `docs/PLANS.md`, `docs/exec-plans/active`, `docs/OBSERVABILITY.md`, `docs/generated/harness-manifest.md`, and repo-local `python3 scripts/check_harness.py`. Use when the user reports a broken behavior and the agent should rewrite the bug report, run harness preflight, reproduce or bound the failure, create or update a fix plan, apply the smallest justified repair, add regression protection, sync `docs/PLANS.md`, and archive the plan deterministically. Especially relevant for prompts such as `$harness-fix`, `修复...`, `这个报错`, `为什么坏了`, or `排查回归`.
npx skillsauth add Refinex-Space/Refinex-Skills harness-fixInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Diagnose bugs, repair regressions, remediate incidents, and investigate flaky paths within a harnessed repository. While harness-feat builds new things, harness-fix diagnoses and repairs broken things — and it does so with the discipline of a scientific investigation, not a frantic code-patching spree.
This is a low-medium freedom skill. The seven-step diagnosis protocol is rigid (preflight → bug brief → reproduce → isolate root cause → minimal fix → verify → archive) and must be followed in order. The agent has latitude in investigation techniques — choosing which debugging tools to apply, how to narrow the search space, what kind of reproduction test to write — but the sequence of phases is non-negotiable, and skipping reproduction or root cause isolation is never acceptable.
Announce at start: I'm using harness-fix to reproduce, isolate, and repair this failure.
NO FIX WITHOUT REPRODUCTION EVIDENCE
NO ROOT CAUSE CLAIM WITHOUT EXPLICIT EVIDENCE
THREE FAILED FIX ATTEMPTS MEAN STOP AND REPLAN
If any of these rules is violated, the workflow has already degraded into guesswork.
Harness-fix selectively absorbs four behavioral disciplines into debugging work:
These rules keep debugging from turning into guess-driven patching.
When an agent encounters a bug report, the natural impulse is to read the error message, guess the cause, and start editing code. This is the debugging equivalent of cowboy coding, and it produces the same outcome: changes that appear to work but mask the real problem, introduce new failure modes, or accumulate into an unrecoverable mess.
The primary risk this skill fights is symptom whack-a-mole — the agent patches the immediate symptom without understanding root cause. OpenAI's team discovered a critical insight: "when something failed, the fix was almost never 'try harder.'" The solution is always to change the approach — better reproduction, different isolation technique, more precise diagnosis — never to keep hammering at the same code hoping it works. This skill encodes that principle by requiring evidence at every stage before proceeding to the next.
A second risk is fix sprawl: the agent makes a change, it doesn't fix the bug, so it makes another change, and another, losing track of what was modified and why. After five rounds of this, the codebase has accumulated random mutations with no clear connection to the original problem. The fix scope guard (see below) prevents this by limiting the number of change attempts before requiring a structured replanning.
A third risk is phantom fixes: the agent makes a change that coincidentally makes the symptom disappear without actually addressing the root cause. Without reproduction evidence before AND verification evidence after, there's no way to distinguish a real fix from a phantom one. The reproduction → fix → verification sandwich is the only reliable way to confirm a bug is genuinely fixed.
All four Harness skills (harness-bootstrap, harness-garden, harness-feat, harness-fix) use these terms consistently:
| Term | Definition | | ------------------ | ------------------------------------------------------------------------------------------ | | Control plane | The set of AGENTS.md, docs/, scripts/, and manifest that guide agent work | | Execution plan | A versioned, checkpointed plan in docs/exec-plans/ | | Managed doc | A doc created and maintained under harness lifecycle | | Unmanaged doc | An existing team doc the harness must NOT overwrite | | Manifest | Machine-readable inventory of control plane artifacts (docs/generated/harness-manifest.md) | | Preflight | Verification check run before starting any task (scripts/check_harness.py) | | Drift | When control plane artifacts no longer reflect repo reality |
Before touching any code, orient yourself within the control plane and establish the "known broken" baseline. This step is shared with harness-feat but has a critical difference: in a fix scenario, the test suite may already be failing — and you need to know exactly which tests fail BEFORE you start, so you can distinguish pre-existing failures from your changes.
Read these files in order:
python3 scripts/check_harness.py
If the check fails, note the failures. Unlike harness-feat (which stops on a broken control plane), harness-fix proceeds with caution — the bug you're fixing might be the reason the control plane is unhealthy. Note the preflight failures in your fix plan.
If check_harness.py does not exist, the repo has no harness — suggest running harness-bootstrap first and stop.
Run the existing test suite:
<test-command from docs/OBSERVABILITY.md>
Record:
This baseline serves two purposes: (1) it may contain the reproduction of the bug you're about to fix, and (2) it establishes which failures are pre-existing so you don't inadvertently claim credit for fixing them or introduce new failures without noticing.
Read docs/PLANS.md. If there's an active plan touching the same area as the bug:
Rewrite the user's bug report into a structured, precise format. Users describe bugs informally — "it's broken", "this doesn't work", "I'm getting an error." The bug brief transforms this into actionable intelligence.
Read references/bug-brief-template.md for the complete template.
| Field | Description | | ------------------ | -------------------------------------------------------------------------- | | Symptom | Exact observable behavior — the error message, the wrong output, the crash | | Expected | What should happen instead | | Reproduction | Steps to trigger the bug (may be incomplete — will be verified in Step 3) | | Affected scope | Which modules, files, or components are involved | | Severity | Blocking / Degraded / Cosmetic (see severity table below) | | Type | Regression / New bug / Flaky / Environment-specific | | Assumptions | Environment details or invariants currently assumed to hold |
| Severity | Definition | Response | | ------------ | ------------------------------------------------------------------------ | ------------------------------ | | Blocking | Core functionality broken, no workaround | Fix immediately, full protocol | | Degraded | Feature works but with errors, performance issues, or edge-case failures | Fix with full protocol | | Cosmetic | Visual or minor behavioral issue, functionality intact | Lightweight fix protocol |
The type determines which investigation techniques are most effective:
| Type | Investigation approach | | ------------------------ | -------------------------------------------------------------- | | Regression | Git bisect to find the introducing commit, then analyze change | | New bug | Scientific debugging: hypothesis → experiment cycle | | Flaky | Non-deterministic path — see the flaky test protocol below | | Environment-specific | Compare environments, isolate the differentiating factor |
Present the structured bug brief. Ask the user to confirm or correct. If the report is too vague to create a meaningful brief (no error message, no reproduction steps, no description of expected behavior), ask specific questions before proceeding.
Reproduction is the foundation of the entire fix. Without a reliable way to trigger the bug, you cannot verify that any fix actually works — you can only hope. Hope is not engineering.
The ideal reproduction is a test that:
Reproduction test written: test_name
Result: FAIL (expected — confirms bug exists)
Error: <exact error output>
If the bug can't be captured in a unit or integration test (e.g., it's a timing issue, a UI rendering problem, or requires a running server), document the manual reproduction steps precisely enough that another agent could follow them:
Manual reproduction:
1. Start the server: <command>
2. Navigate to: <URL>
3. Click: <element>
4. Observe: <symptom>
If you cannot reproduce the bug:
Do NOT proceed to fix without reproduction evidence. A fix without reproduction is a guess — and guesses have a poor track record of actually solving problems.
The one exception: if the bug is visible in the test baseline from Step 1.3 (a test is already failing), that IS your reproduction evidence. Note which test captures the bug and proceed.
Once reproduction succeeds, create a fix plan in docs/exec-plans/active/ using the naming convention YYYY-MM-DD-fix-short-description.md. Read references/fix-plan-template.md for the template.
The fix plan differs from a feature execution plan in that it includes:
Register the plan in docs/PLANS.md and commit:
git add docs/exec-plans/active/YYYY-MM-DD-fix-*.md docs/PLANS.md
git commit -m "plan(harness): add fix plan for <short-description>"
This is where diagnosis separates from guessing. The goal is to narrow the failure to the smallest possible scope and identify the exact mechanism — not just WHAT is failing, but WHY.
Read references/root-cause-techniques.md for the complete toolkit.
Select based on bug type and available information:
| Situation | Primary technique | Fallback | | ------------------------------------------ | ------------------------ | ------------------------ | | Bug appeared after a known change | Git bisect | Manual commit review | | Bug in logic/computation | Scientific debugging | Code path tracing | | Bug in data flow | Input/output tracing | Delta debugging | | Bug in integration between components | Interface contract check | Dependency analysis | | Intermittent failure | Flaky test protocol | Statistical reproduction | | Bug with clear error message + stack trace | Stack trace analysis | Breakpoint debugging |
The reference file covers each technique in full. Here's the essential logic:
Scientific debugging: Observe → Hypothesize → Predict → Experiment → Conclude. Form a specific, falsifiable hypothesis. Test one variable at a time. Record each cycle in the hypothesis log (references/hypothesis-log-template.md). Refuted hypotheses are progress — they narrow the search space.
Git bisect: binary search through commits between a known-good and known-bad to find the introducing commit in $O(\log N)$ steps. Read the introducing commit to understand WHY it broke.
Five Whys: ask "why" iteratively to trace the causal chain from symptom to root cause. Stop when you reach a cause that's directly actionable.
Track hypothesis cycles and fix attempts explicitly.
At that point, do one of these before continuing:
Update the fix plan with:
## Root Cause
**Mechanism**: <exact explanation of why the bug occurs>
**Introduced by**: <commit hash, change, or condition>
**Why it wasn't caught**: <missing test, uncovered edge case, etc.>
The root cause documentation serves two purposes: it guides the minimal fix (Step 5), and it helps future agents understand the failure mode when reading the archived plan.
Apply the smallest change that addresses the root cause — not the symptom, but the underlying mechanism identified in Step 4. Minimality is a design constraint, not a suggestion: every line of change beyond the minimum creates risk of introducing new bugs and makes the fix harder to review and understand.
Before writing code, describe the fix in the plan:
## Fix
**Strategy**: <what will change and why>
**Files**: <list of files to modify>
**Risk**: <what could go wrong with this approach>
docs/exec-plans/tech-debt-tracker.mdgit add <changed-files>
git commit -m "fix(<scope>): <description>
refs plan: <plan-filename> step 5
Root cause: <one-line root cause summary>"
The fix scope guard prevents fix sprawl. Track these metrics during Step 5:
| Metric | Threshold | Action when exceeded | | ------------------- | ----------------- | ------------------------------------------------------------------ | | Files modified | ≤ 5 | Stop, replan — the fix may be in the wrong layer | | Fix attempts | ≤ 3 | Stop, write hypothesis log, treat this as a replanning event | | Unrelated changes | 0 | Revert unrelated changes immediately | | Lines changed (net) | Context-dependent | Larger than expected? Verify you're fixing root cause, not symptom |
If you hit any threshold, stop and reassess:
This step confirms the fix works and protects against the bug recurring. Both halves are mandatory — verification without regression protection leaves the codebase vulnerable to the same failure mode reappearing.
<test-command targeting the reproduction test>
The reproduction test from Step 3 must now PASS. If it still fails, the fix is incorrect — go back to Step 4 and re-examine the root cause.
If the reproduction test was manual (not an automated test), convert it to a permanent test now:
Regression test: test_<descriptive_name>
Purpose: Prevents recurrence of <bug brief title>
Covers: <the specific failure mode>
The regression test should:
test_auth_rejects_expired_token, not test_fix_token_check)Read references/regression-test-patterns.md for patterns by bug type.
<test-command from docs/OBSERVABILITY.md>
Compare with the baseline from Step 1.3:
Apply harness-verify discipline here: the fix is not complete until the reproduction, regression test, and suite results are freshly re-run and inspected.
<lint-command from docs/OBSERVABILITY.md>
Fix any violations introduced by the fix. Do not fix pre-existing violations.
Review your own diff (git diff <baseline-commit>..HEAD). Check for: changes outside fix scope (fix sprawl), debug code left in, accidentally broad changes, and that the diff matches the root cause description in the plan.
If the fix changed behavior documented in AGENTS.md, ARCHITECTURE.md, or OBSERVABILITY.md, update those files. Commit control plane updates separately:
git commit -m "docs(harness): update control plane after <fix-description>"
The fix is done when the plan is archived, not when the code is changed. This step is shared with harness-feat.
Add a completion summary with: completion date, root cause (one-line), fix summary (one-line), regression test name, PASS/FAIL status, and a paragraph covering the bug, root cause, fix, and any broader implications.
git mv docs/exec-plans/active/YYYY-MM-DD-fix-*.md \
docs/exec-plans/completed/YYYY-MM-DD-fix-*.md
Update docs/PLANS.md: move the entry from "Active Plans" to "Completed Plans."
If the bug reveals a systemic pattern, add an entry to docs/exec-plans/tech-debt-tracker.md with severity, area, description, and detection source.
git add docs/exec-plans/ docs/PLANS.md
git commit -m "chore(harness): archive fix plan for <short-description>"
Report to the user:
=== Fix Complete ===
Bug: <symptom from bug brief>
Root cause: <mechanism>
Fix: <what changed>
Plan: docs/exec-plans/completed/YYYY-MM-DD-fix-short-description.md
Regression test: <test name>
Tests: <baseline count> → <new count> (added <diff>)
Previously failing: <N tests> → Now passing: <M tests>
Commits:
1. plan(harness): add fix plan for <short-description>
2. fix(<scope>): <fix description>
3. chore(harness): archive fix plan for <short-description>
Flaky tests (non-deterministic failures) require a different diagnosis path. Suspect a flake when: the test passes sometimes and fails others on the same code, fails in CI but passes locally, or started failing without a relevant code change.
Diagnosis approach:
Investigation doesn't always converge. These rules prevent infinite debugging loops.
| Trigger | Action | | ------------------------------------------------------ | ---------------------------------------------------- | | 3 hypothesis cycles with no progress | Write a structured hypothesis log, present to user | | Root cause spans multiple modules you don't understand | Ask user for domain guidance | | Reproduction requires infrastructure you can't access | Document what you need, ask user to set up | | Fix requires architectural change beyond bug scope | File as tech debt, propose interim workaround | | Total diagnosis time exceeds reasonable bounds | Summarize findings so far, present partial diagnosis |
When escalating, provide: the symptom, number of hypothesis cycles attempted, confirmed findings, ruled-out hypotheses, remaining untested hypotheses with reasons they're blocked, and a specific recommendation for next steps. This ensures no investigation effort is wasted — even an incomplete diagnosis narrows the search space for whoever picks up next.
These rules encode the lessons from systematic debugging research and real-world agent failure modes.
Never fix without reproduction evidence. A fix without reproduction is a guess. Even if the guess is correct, there's no way to verify it, and no way to protect against regression. The reproduction test is the foundation everything else rests on.
Never fix by reverting without understanding. Reverting to a previous working state removes the symptom but doesn't explain WHY the regression occurred. Without understanding, the same mistake will be made again. Revert is a valid temporary measure, but the investigation must continue to root cause.
Minimal fix only. Do not refactor, do not add features, do not "improve" adjacent code during a fix. Each of these activities has its own risk profile and deserves its own plan. Mixing them with a fix makes the fix harder to review, harder to revert, and harder to understand.
Root cause in the plan, not just the patch. The fix plan must document the root cause mechanism, not just the code change. "Changed line 42 from X to Y" is a patch description, not a root cause. "The comparison used < instead of <= because the boundary condition was added in commit abc123 without updating the existing check" is a root cause.
Regression test is mandatory. Every fix must leave behind a test that will fail if the bug recurs. The test is the permanent record of the failure mode — it protects the codebase after the humans and agents who understand the bug have moved on.
Respect the fix scope guard. If you're modifying more than 5 files or making more than 3 fix attempts, stop and reassess. Either the root cause analysis is wrong, or the fix belongs at a different level of abstraction.
Log everything even if diagnosis fails. A hypothesis that was tested and refuted is valuable information. Future agents who encounter the same bug will benefit from knowing what was already tried and eliminated. The hypothesis log is as important as the fix itself.
Distinguish pre-existing failures from new ones. The "known broken" baseline from Step 1.3 is your reference. If a test was failing before you started, you don't own it. If a test starts failing during your fix, you do.
Scale the ceremony to the severity:
| Severity | Bug brief | Reproduction | Root cause | Fix plan | Regression test | | -------- | --------- | ------------ | ------------- | ----------- | --------------- | | Blocking | Full | Required | Full protocol | Full | Required | | Degraded | Full | Required | Full protocol | Full | Required | | Cosmetic | Brief | Lightweight | Targeted | Lightweight | Recommended |
Even cosmetic fixes get a plan file — it creates the audit trail, and the naming convention (YYYY-MM-DD-fix-*.md) makes fix plans distinguishable from feature plans in the archive.
| File | When to read |
| ---------------------------------------- | ----------------------------------------- |
| references/bug-brief-template.md | Step 2: structuring the bug report |
| references/root-cause-techniques.md | Step 4: choosing investigation techniques |
| references/fix-plan-template.md | Step 3: creating the fix plan |
| references/regression-test-patterns.md | Step 6: writing the regression test |
| references/hypothesis-log-template.md | Step 4: recording investigation cycles |
development
Deep initialization of project AGENTS.md hierarchy and control plane for AI coding agents. Use this skill whenever the user wants to set up, initialize, bootstrap, or create AGENTS.md / CLAUDE.md files for their project, or when they mention "init-deep", "harness setup", "control plane", "agent context", "project initialization for agents", or want to make their codebase agent-ready. Also trigger when a user says things like "set up my repo for Claude Code", "make this project work better with agents", "create agent instructions", "bootstrap harness", or "initialize agent docs". This skill handles both existing large codebases (where hierarchical, module-scoped AGENTS.md files are needed) and new/small projects (where brainstorming with the user comes first). Do NOT use this skill for routine code changes, bug fixes, or general documentation — it is specifically for creating the structured agent control plane.
development
Detect and fix drift in project AGENTS.md files and agent control plane. Use this skill whenever the user wants to audit, recalibrate, refresh, update, or fix their existing AGENTS.md files, or when they mention "drift", "stale AGENTS.md", "outdated agent instructions", "recalibrate", "sync agents", "audit control plane", "AGENTS.md is wrong/old/broken", or when they suspect their agent harness has fallen out of sync with the codebase. Also trigger when a user says things like "my agents keep making wrong assumptions", "Claude doesn't understand the new structure", "we refactored but the AGENTS.md is old", "check if my AGENTS.md is still accurate", or "update my agent docs". This skill is the companion to init-deep — init-deep creates the control plane from scratch, drift-doctor maintains it over time. Do NOT use for initial creation of AGENTS.md (use init-deep instead). Do NOT use for general code review or documentation updates unrelated to agent context.
development
Use when adding, fixing, reviewing, or generating code comments, docstrings, Javadoc, JSDoc/TSDoc, rustdoc, SQL comments, or documentation comments for source, markup, configuration, or database files.
development
Enforce production-grade Java development standards when writing, reviewing, or architecting Java code. Covers commenting, core Java idioms (Stream, collections, concurrency, generics), 23 GoF design patterns, SonarQube/Alibaba p3c/Lombok rules, Spring Boot MVC structure, Spring Cloud DDD microservices, MyBatis/JPA/transaction management, exception handling, logging, REST API design, testing, and security. Trigger whenever the user writes Java code, reviews Java code, designs a Spring Boot or Spring Cloud project, implements a design pattern, fixes code smells, discusses architecture, or asks about Java best practices. Also trigger when Java code is pasted for feedback or the user asks about package structure, DTO/VO/PO conventions, or coding standards.