Preamble (Core)

Status protocol — end every session with one of: DONE (evidence provided) · DONE_WITH_CONCERNS (list each) · BLOCKED (state what blocks you) · NEEDS_CONTEXT (state what you need).

Auto-advance — pipeline: THINK → PLAN → REVIEW → BUILD → VERIFY → RELEASE. Only human gate is spec approval at THINK. On DONE at other stages, print [STAGE] DONE -> advancing to [NEXT-STAGE] and invoke the next skill. On any non-DONE status at any stage, STOP.

Output directory — all artifacts go in docs/superomni/<kind>/<kind>-[branch]-[session]-[date].md. See CLAUDE.md for the full directory map.

TACIT-DENSE — before high-tacit decisions, classify D1 (domain expertise) · D2 (user-facing UX) · D3 (team culture) · D4 (novel pattern). On hit, output TACIT-DENSE [D#]: [question] — My default: [recommendation]. See reference for actions.

Anti-sycophancy — take a position on every significant question. Name flaws directly. No filler ("that's interesting", "you might consider", "that could work").

Telemetry (local only) — at session end, log bin/analytics-log. Nothing leaves the machine.

See preamble-ref.md for detailed protocols.

Harness Engineering

Goal: Design and maintain the agent harness — the scaffolding of environment, context, tools, constraints, evaluation gates, and feedback loops that determine how well agents perform.

"Engineers design the system. Agents execute." — OpenAI Harness Engineering

Iron Law

THE HARNESS IS THE PRODUCT. CODE IS ITS OUTPUT.

A well-designed harness produces reliable, high-quality agent output without requiring manual intervention on every task. When agents fail repeatedly, the correct response is to improve the harness — not to keep retrying the same prompt.

Core Principles (From OpenAI + Anthropic)

| Principle | What it means in superomni |
|-----------|---------------------------|
| Context is everything | Agents can only work with what they can see; keep docs, specs, and constraints in-repo and up-to-date |
| Fewer, more expressive tools | Prefer composable skills over sprawling tool menus |
| Evaluate relentlessly | Judgment gates must exist at every major transition point |
| Signal-driven iteration | Agent failures are design signals; update the harness, not just the prompt |
| Boring > clever | Prefer simple, composable patterns over novel abstractions |
| Garbage collection | Periodically audit for drift, stale docs, and architectural decay |

Phase 1: Harness Inventory

Take stock of the current harness state:

# Skill count + structure
ls skills/ | wc -l
ls skills/

# Agent count
ls agents/

# Command count
ls commands/

# Preamble size (context overhead)
wc -l lib/preamble.md

# Skill template sizes (larger = more context pressure)
wc -l skills/*/SKILL.md.tmpl | sort -n | tail -10

# Validation status
npm test 2>/dev/null || bash lib/validate-skills.sh 2>/dev/null

# Recent harness changes
git log --oneline -10 -- lib/ skills/ agents/ commands/

# Stale docs: not modified in the last 90 days (the threshold is an assumption; tune it to your cadence)
find docs/ -name "*.md" -mtime +90 2>/dev/null | head -10

Document findings:

Total skills: ___
Total agents: ___
Total commands: ___
Preamble size (lines): ___
Validation status: PASS / FAIL
Largest skill (lines): ___
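
The counts in the findings template can be filled in mechanically. A minimal sketch, assuming the skills/, agents/, commands/, and lib/preamble.md layout used by the commands above; the function names `count_entries` and `harness_inventory` are hypothetical helpers, not part of superomni:

```shell
# Count the entries in a directory; print 0 if the directory is absent.
count_entries() {
  if [ -d "$1" ]; then
    ls "$1" | wc -l | tr -d ' '
  else
    echo 0
  fi
}

# Print the first four lines of the findings template for a repo root.
# Layout (skills/, agents/, commands/, lib/preamble.md) is assumed from Phase 1.
harness_inventory() {
  root="${1:-.}"
  echo "Total skills: $(count_entries "$root/skills")"
  echo "Total agents: $(count_entries "$root/agents")"
  echo "Total commands: $(count_entries "$root/commands")"
  if [ -f "$root/lib/preamble.md" ]; then
    echo "Preamble size (lines): $(wc -l < "$root/lib/preamble.md" | tr -d ' ')"
  else
    echo "Preamble size (lines): n/a"
  fi
}

# Usage (from the repo root):
#   harness_inventory .
```

Validation status and largest skill still come from the npm test and wc -l commands above.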

Phase 2: Context Window Audit

Context window pressure is one of the most common causes of agent degradation. Audit the harness context load:

2a. Preamble Efficiency Analysis

Review lib/preamble.md:

[ ] Every section earns its inclusion — remove dead protocols
[ ] No repetition with individual skill instructions
[ ] Status protocol is concise and actionable
[ ] Performance checkpoint is lightweight (3 questions max)
[ ] Telemetry block is at the end (lowest priority)

Target preamble size: < 150 lines. Flag if > 200 lines.
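
The two thresholds can be turned into a quick check. A sketch only: the 150 and 200 limits come from the line above, while the function name `preamble_status` and the intermediate WATCH label (for files between target and flag) are assumptions introduced here:

```shell
# Classify a preamble file by line count.
# < 150 lines: OK; 150-200: WATCH (hypothetical label); > 200: BLOATED.
preamble_status() {
  file="$1"
  if [ ! -f "$file" ]; then
    echo "MISSING"
    return 1
  fi
  # tr strips the padding some wc implementations add
  lines=$(wc -l < "$file" | tr -d ' ')
  if [ "$lines" -lt 150 ]; then
    echo "OK ($lines lines)"
  elif [ "$lines" -le 200 ]; then
    echo "WATCH ($lines lines)"
  else
    echo "BLOATED ($lines lines)"
  fi
}

# Usage:
#   preamble_status lib/preamble.md
```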

2b. Skill Bloat Detection

For each skill > 200 lines, ask:

Are all phases strictly necessary?
Are examples too verbose? (truncate to headers only)
Can any phase be replaced with a reference to another skill?
Is this skill actually used? (check telemetry)
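
To find the skills these questions apply to, something like the following may help. It is a sketch assuming the skills/*/SKILL.md.tmpl layout from Phase 1; `bloated_skills` is a hypothetical helper name:

```shell
# Print "<lines> <path>" for every skill template longer than a limit,
# largest first. Defaults: root "skills", limit 200 lines.
bloated_skills() {
  root="${1:-skills}"
  limit="${2:-200}"
  for tmpl in "$root"/*/SKILL.md.tmpl; do
    [ -f "$tmpl" ] || continue   # the glob matched nothing
    lines=$(wc -l < "$tmpl" | tr -d ' ')
    if [ "$lines" -gt "$limit" ]; then
      echo "$lines $tmpl"
    fi
  done | sort -rn
}

# Usage:
#   bloated_skills skills 200
```

Line count only identifies candidates; whether a phase is necessary or a skill is used remains a judgment call.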

2c. Progressive Disclosure Check

Does the framework expose only necessary context at each stage?

| Stage | Context needed | Currently loaded |
|-------|---------------|-----------------|
| Planning | spec, constraints | |
| Implementation | plan, code context | |
| Review | diff, standards | |
| Debug | error, minimal repro | |

Good harnesses load context on demand, not all at once.

Phase 3: Tool Action Space Audit

Per Anthropic's principle: fewer, more expressive tools outperform large menus of narrow ones.

Review the agent's tool access:

# Check allowed-tools across all skills
grep "allowed-tools" skills/*/SKILL.md.tmpl

For each skill, evaluate:

[ ] Are the allowed tools the minimum needed for that skill?
[ ] Are any rarely-used tools creating confusion or unnecessary options?
[ ] Do composable skill combinations cover the same ground as single complex tools?

Recommended tool sets by role:

| Role | Minimal tool set |
|------|----------------|
| Planning / Brainstorming | Read, Write, Glob |
| Implementation | Bash, Read, Write, Edit, Grep, Glob |
| Review / Audit | Read, Grep, Glob |
| Debug | Bash, Read, Grep, Glob |

Flag any skill whose tool set exceeds its role's minimum.
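
The comparison against a role's minimum can be sketched as below. The tool sets are copied from the recommended sets above; the role keys (`planning`, `implementation`, `review`, `debug`) and the function names are assumptions, and how a skill's role and allowed-tools list are extracted from its template is left open because the template format is not shown here:

```shell
# Minimal tool set for a role, per the recommended sets above.
role_tools() {
  case "$1" in
    planning)       echo "Read Write Glob" ;;
    implementation) echo "Bash Read Write Edit Grep Glob" ;;
    review)         echo "Read Grep Glob" ;;
    debug)          echo "Bash Read Grep Glob" ;;
    *)              return 1 ;;
  esac
}

# Print the tools (arguments 2..n) that fall outside the role's minimum.
# Prints an empty line when the skill stays within its minimum.
extra_tools() {
  role="$1"
  shift
  allowed=$(role_tools "$role") || { echo "unknown role: $role" >&2; return 2; }
  extras=""
  for tool in "$@"; do
    case " $allowed " in
      *" $tool "*) ;;                  # within the minimum set
      *) extras="$extras $tool" ;;     # exceeds the minimum
    esac
  done
  echo "${extras# }"
}

# Usage:
#   extra_tools review Read Grep Bash
```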

Phase 4: Evaluation Gate Audit

"Evaluation is the load-bearing part of agent harness design." — OpenAI/Anthropic harness engineering principles

Map every major workflow transition and verify an evaluation gate exists:

| Transition | Evaluation gate | Present? |
|-----------|----------------|---------|
| Spec → Plan | plan-review skill or planner-reviewer agent (planning mode) | |
| Plan → Execution | dependency analysis wave plan | |
| Execution Wave → Next Wave | wave verification step | |
| Implementation → Review | code-review skill or planner-reviewer agent | |
| Review → Ship | production-readiness skill | |
| Ship → Done | verification skill | |
| Sprint → Next Sprint | self-improvement skill | |

Any gap = harness deficiency. Add missing gates.
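
A first pass at the "Present?" column can be automated for the gates that are named skills. This sketch assumes each skill lives in its own directory under skills/ with the names used above (plan-review, code-review, production-readiness, verification, self-improvement); `missing_gates` is a hypothetical helper. A directory existing is only a proxy: it does not show that the gate is actually invoked at the transition.

```shell
# Print the name of every gate skill that has no directory under the root.
missing_gates() {
  root="${1:-skills}"
  for gate in plan-review code-review production-readiness verification self-improvement; do
    [ -d "$root/$gate" ] || echo "$gate"
  done
}

# Usage:
#   missing_gates skills
```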

Phase 5: Feedback Loop Audit

A healthy harness converts agent failures into harness improvements:

Agent fails → Signal captured → Harness updated → Agent retries → Improvement
     ↑                                                                   |
     └───────────────────────────────────────────────────────────────────┘

Check the current feedback paths:

5a. Error → Harness Signal

When an agent fails a task repeatedly (3+ attempts), is there a defined process to:

[ ] Record the failure pattern (systematic-debugging skill)?
[ ] Identify the harness gap causing the failure?
[ ] Update the relevant skill, doc, or constraint?

5b. Performance → Improvement

Does the self-improvement skill output get consumed?

ls docs/superomni/improvements/ 2>/dev/null | head -5

[ ] Improvement reports exist?
[ ] Action items from last report applied to current sprint?
[ ] workflow skill reads improvement reports at sprint start?

5c. Documentation Garbage Collection

Is there a regular cadence for cleaning up:

[ ] Stale skill instructions that no longer match agent behavior?
[ ] Outdated docs that contradict current implementation?
[ ] Dead commands that are registered but never invoked?
[ ] Agent definitions whose Iron Laws conflict with updated skills?

Recommended: Schedule a harness GC pass after every 5 sprints.

Phase 6: Harness Health Score

Score the harness on each dimension (1-5):

| Dimension | Score | Key Finding |
|-----------|-------|------------|
| Context efficiency | /5 | |
| Tool space minimalism | /5 | |
| Evaluation gate coverage | /5 | |
| Feedback loop completeness | /5 | |
| Documentation freshness | /5 | |

Total: __ / 25

Scoring guide:

21-25 — World-class harness. Focus on new capabilities.
16-20 — Good harness. Address identified gaps.
11-15 — Fair harness. Significant drift or missing gates.
< 11 — Harness needs major refactor before next sprint.
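
The total and its band follow directly from the five scores. A small sketch; the function name `harness_rating` and the short rating labels are assumptions, while the band boundaries are those of the scoring guide above:

```shell
# Sum five dimension scores (1-5 each) and map the total to a band.
harness_rating() {
  if [ "$#" -ne 5 ]; then
    echo "usage: harness_rating <context> <tools> <gates> <feedback> <docs>" >&2
    return 2
  fi
  total=0
  for score in "$@"; do
    case "$score" in
      [1-5]) total=$((total + score)) ;;
      *) echo "score out of range: $score" >&2; return 2 ;;
    esac
  done
  if   [ "$total" -ge 21 ]; then rating="World-class"
  elif [ "$total" -ge 16 ]; then rating="Good"
  elif [ "$total" -ge 11 ]; then rating="Fair"
  else                           rating="Needs major refactor"
  fi
  echo "$total/25 ($rating)"
}

# Usage:
#   harness_rating 4 3 5 2 3
```

The output matches the "Health score" line of the report format, so it can be pasted in as-is.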

Phase 7: Harness Improvement Plan

For each dimension scored below 4 in Phase 6, write up the underlying findings from Phases 2-5:

HARNESS IMPROVEMENT [N]: [TITLE]
Dimension:   [context | tools | evaluation | feedback | docs]
Finding:     [specific issue identified]
Impact:      [how this degrades agent performance]
Fix:         [concrete change to harness — specific file, section, or process]
Priority:    [P0 — blocks agent / P1 — degrades quality / P2 — nice to have]

Generate a prioritized backlog. P0 items must be fixed before the next sprint.

Phase 8: Save Harness Audit Report

HARNESS_DIR="docs/superomni/harness-audits"
mkdir -p "$HARNESS_DIR"
BRANCH=$(git branch --show-current 2>/dev/null | tr '/' '-')
BRANCH=${BRANCH:-main}  # fall back here: a pipeline returns tr's exit status, so "|| echo" would never run
TIMESTAMP=$(date +%Y-%m-%d-%H%M%S)
REPORT_FILE="$HARNESS_DIR/harness-audit-${BRANCH}-${TIMESTAMP}.md"
echo "Saving harness audit to $REPORT_FILE"

Save the full audit report including all scores, findings, and improvement backlog.

Report

HARNESS AUDIT REPORT
════════════════════════════════════════
Branch:             [branch]
Date:               [date]
Skills / Agents:    [N] skills, [N] agents, [N] commands
Preamble size:      [N] lines ([OK / BLOATED])
Validation:         [PASS / FAIL]
Health score:       [N]/25 ([rating])
Top finding:        [single most important issue]
P0 improvements:    [N]
P1 improvements:    [N]
P2 improvements:    [N]
Report saved:       [docs/superomni/harness-audits/...]
Status: DONE | DONE_WITH_CONCERNS | BLOCKED
════════════════════════════════════════

