Preamble (Core)

Status protocol — end every session with one of: DONE (evidence provided) · DONE_WITH_CONCERNS (list each) · BLOCKED (state what blocks you) · NEEDS_CONTEXT (state what you need).

Auto-advance — pipeline: THINK → PLAN → REVIEW → BUILD → VERIFY → RELEASE. Only human gate is spec approval at THINK. On DONE at other stages, print [STAGE] DONE -> advancing to [NEXT-STAGE] and invoke the next skill. On any non-DONE status at any stage, STOP.
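The auto-advance rule above can be sketched as a small shell helper. This is a minimal illustration, not the framework's actual implementation: `next_stage` and `advance` are hypothetical names, the bracketed words in the printed line are assumed to be placeholders, and every message other than the "advancing to" line is invented for the sketch.

```shell
# Sketch only: next_stage and advance are hypothetical helper names.

# Print the stage that follows $1 (empty for RELEASE or an unknown stage).
next_stage() {
  case "$1" in
    THINK)  echo "PLAN" ;;
    PLAN)   echo "REVIEW" ;;
    REVIEW) echo "BUILD" ;;
    BUILD)  echo "VERIFY" ;;
    VERIFY) echo "RELEASE" ;;
    *)      echo "" ;;
  esac
}

# advance STAGE STATUS
# Returns 0 when the pipeline may continue, 1 when it must stop.
advance() {
  adv_stage="$1"
  adv_status="$2"
  if [ "$adv_status" != "DONE" ]; then
    # Any non-DONE status stops the pipeline at any stage.
    echo "$adv_stage $adv_status -> STOP"
    return 1
  fi
  if [ "$adv_stage" = "THINK" ]; then
    # The only human gate: wait for spec approval instead of advancing.
    echo "THINK DONE -> awaiting spec approval"
    return 1
  fi
  if [ "$adv_stage" = "RELEASE" ]; then
    echo "RELEASE DONE -> pipeline complete"
    return 0
  fi
  adv_next=$(next_stage "$adv_stage")
  if [ -z "$adv_next" ]; then
    echo "$adv_stage -> unknown stage, STOP"
    return 1
  fi
  echo "$adv_stage DONE -> advancing to $adv_next"
}
```

Calling `advance BUILD DONE` prints the advance line and returns success, while `advance PLAN BLOCKED` returns non-zero so a calling script can halt.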

Output directory — all artifacts go in docs/superomni/<kind>/<kind>-[branch]-[session]-[date].md. See CLAUDE.md for the full directory map.
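As a rough sketch of the naming convention, the helper below builds an artifact path from its parts. `artifact_path` is a hypothetical name. The plural directory (`specs/`, `plans/`, `improvements/`) is an assumption inferred from paths used elsewhere in this document, and replacing `/` in branch names with `-` mirrors what Phase 7 does. CLAUDE.md remains the authority on the directory map.

```shell
# Sketch only: artifact_path is a hypothetical helper.
# Usage: artifact_path KIND BRANCH SESSION DATE
artifact_path() {
  ap_kind="$1"
  # Branch names such as feature/login would otherwise create subdirectories.
  ap_branch=$(printf '%s' "$2" | tr '/' '-')
  ap_session="$3"
  ap_date="$4"
  # Assumption: the directory is the plural of the kind (spec -> specs).
  printf 'docs/superomni/%ss/%s-%s-%s-%s.md\n' \
    "$ap_kind" "$ap_kind" "$ap_branch" "$ap_session" "$ap_date"
}
```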

TACIT-DENSE — before high-tacit decisions, classify D1 (domain expertise) · D2 (user-facing UX) · D3 (team culture) · D4 (novel pattern). On hit, output TACIT-DENSE [D#]: [question] — My default: [recommendation]. See reference for actions.

Anti-sycophancy — take a position on every significant question. Name flaws directly. No filler ("that's interesting", "you might consider", "that could work").

Telemetry (local only) — at session end, run bin/analytics-log to record the session. Nothing leaves the machine.

See preamble-ref.md for detailed protocols.

Self-Improvement — First-Principles Performance Review

Goal: Close the feedback loop on every sprint by systematically evaluating process adherence, agent behavior, and skill effectiveness — then produce concrete improvement actions for the next session.

Consolidated Modes

self-improvement is the canonical reflection skill with three scopes:

process (default): workflow/skill/agent execution quality.
retro: delivery-focused retrospective based on commits and output artifacts.
harness: harness and gate effectiveness quality.

Retro Scope (Merged from retro)

When running with retro scope, include these delivery metrics before Phase 1 analysis:

# Default retrospective window
SINCE="7 days ago"

AUTHOR_EMAIL=$(git config user.email)
git log --oneline --since="${SINCE}" --author="${AUTHOR_EMAIL}" 2>/dev/null | head -100
git log --since="${SINCE}" --author="${AUTHOR_EMAIL}" --pretty=tformat: --numstat 2>/dev/null | head -200

Retro artifact location:

Standalone retro files were deprecated in v0.5.8. The release skill now writes retrospective data inside the release artifact (docs/superomni/releases/release-*.md, ## Retrospective section). When running in retro scope, write retro content into the release artifact if it exists; otherwise add it as a section of docs/superomni/improvements/improvement-[branch]-[session]-[date].md. Do NOT create standalone docs/superomni/retros/ files.

Retro output must include:

Commit count and active-day cadence.
Net LOC and major files touched.
Ship-of-period highlight and a delivery risk note.
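The first two required metrics can be derived from `git log` output. The sketch below is one possible way to do it, not the skill's prescribed method: `count_active_days` and `net_loc` are hypothetical helper names, and both read from stdin so they can be tried without a repository.

```shell
# Sketch only: hypothetical helpers that summarize git log output.

# Reads one commit date (YYYY-MM-DD) per line; prints the number of distinct days.
count_active_days() {
  sort -u | awk 'NF { n++ } END { print n + 0 }'
}

# Reads `git log --numstat` lines (added<TAB>deleted<TAB>path).
# Binary files report "-" for both counts, which awk treats as 0.
net_loc() {
  awk 'NF >= 3 { added += $1; deleted += $2 }
       END { printf "+%d -%d net %d\n", added, deleted, added - deleted }'
}

# Example wiring inside a repository, reusing SINCE and AUTHOR_EMAIL from above:
#   git rev-list --count --since="$SINCE" --author="$AUTHOR_EMAIL" HEAD
#   git log --since="$SINCE" --author="$AUTHOR_EMAIL" \
#     --pretty=tformat:%ad --date=short | count_active_days
#   git log --since="$SINCE" --author="$AUTHOR_EMAIL" \
#     --pretty=tformat: --numstat | net_loc
```

Dividing the commit count by the active-day count gives a simple cadence figure. The ship-of-period highlight and the delivery risk note still need human judgment.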

Iron Law

A FRAMEWORK THAT CANNOT MEASURE ITS OWN PERFORMANCE CANNOT IMPROVE.

Every sprint cycle must end with a self-evaluation. A session without reflection is a missed learning opportunity.

First-Principles Foundation

Performance problems in AI-assisted development reduce to three root causes:

Process drift — the right process was available but not followed
Evidence gaps — claims were made without verification
Scope creep — work expanded beyond what was planned

Every metric in this skill traces back to one of these three root causes.

Phase 0: Tacit Gap Mining

Before evaluating the current session, mine execution history for tacit knowledge gaps.

Signal Sources

# 1. Recurring review comments (3+ occurrences = uncodified standard)
echo "=== Review comment patterns ==="
for review in docs/superomni/reviews/review-*.md; do
  [ -f "$review" ] && grep -h "^- " "$review" 2>/dev/null
done | sort | uniq -c | sort -rn | head -10

# 2. Execution deviation records (manual overrides = unmatched preferences)
echo "=== Execution deviations ==="
for exec in docs/superomni/executions/execution-*.md; do
  [ -f "$exec" ] && grep -h -E -A1 "CONCERN|DEVIATION|override|manual" "$exec" 2>/dev/null
done | head -10

# 3. Skill override frequency
echo "=== Skill overrides ==="
grep -E "override|rejected|skipped" ~/.omni-skills/analytics/usage.jsonl 2>/dev/null | tail -5

Mining Questions

Answer each with evidence from the sources above:

[ ] In the last 5 executions, which Agent suggestions were rejected by the user?
[ ] In code reviews, which comment types appeared 3+ times?
[ ] In which scenarios did the user manually modify Agent output?

Analysis Logic

User rejects Agent suggestion = Agent lacks a tacit preference at this point
Recurring review comment = Standard not yet captured by an Iron Law
Manual output modification = Style mismatch between Agent and user
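The Signal Sources script lists the ten most frequent review comments but does not enforce the 3+ threshold itself. A minimal sketch of that filter follows. `recurring_lines` is a hypothetical helper, and it only catches comments repeated with identical wording, so near-duplicates still need a manual pass.

```shell
# Sketch only: recurring_lines is a hypothetical helper.
# Reads lines on stdin and prints those occurring at least $1 times
# (default 3), each prefixed by its count, most frequent first.
recurring_lines() {
  rl_threshold="${1:-3}"
  sort | uniq -c | sort -rn | awk -v t="$rl_threshold" '$1 >= t'
}

# Example wiring against the review artifacts:
#   grep -h "^- " docs/superomni/reviews/review-*.md 2>/dev/null | recurring_lines 3
```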

Tacit Gap Output

If any gaps are found, generate docs/superomni/improvements/tacit-gaps-[date].md. Use the template in reference/phase-templates.md § Phase 0.

If no gaps found, note: "No tacit gaps detected in available history — continue to Phase 1."

Phase 1: Gather Session Evidence

Collect objective data about what happened in this session:

# What was built/changed
git log --oneline -10
git diff --stat HEAD~3 2>/dev/null | tail -5

# What artifacts were produced
ls docs/superomni/specs/spec-*.md docs/superomni/plans/plan-*.md 2>/dev/null
ls docs/superomni/ .superomni/ 2>/dev/null

# Read the latest evaluation report (from verification skill)
LATEST_EVAL=$(find docs/superomni/evaluations -name "*.md" -type f 2>/dev/null | sort | tail -1)
if [ -n "$LATEST_EVAL" ]; then
  echo "Latest verification evaluation:"
  head -40 "$LATEST_EVAL"
fi

# Skill telemetry for this session
tail -10 ~/.omni-skills/analytics/usage.jsonl 2>/dev/null || echo "(no telemetry)"

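# Current test status. Pick the suite by what exists, and keep stderr,
# so a failing run is not mistaken for a missing test suite.
if [ -f package.json ]; then
  npm test
elif [ -f lib/validate-skills.sh ]; then
  bash lib/validate-skills.sh
else
  echo "(no test suite found)"
fi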

Document the raw facts:

Which skills were invoked?
What artifacts were produced?
What tests ran, and what was the outcome?
What did the latest evaluation report say?

Phase 2: Process Adherence Evaluation

Answer each question with YES / PARTIAL / NO + reason:

Workflow Adherence

| Question | Answer | Evidence |
|----------|--------|----------|
| Did each major task follow the THINK→PLAN→REVIEW→BUILD→VERIFY→RELEASE cycle? | | |
| Was a spec or plan artifact created before implementation? | | |
| Were skills invoked for their intended triggers (not bypassed)? | | |
| Did the session end with a status report (DONE/BLOCKED/etc.)? | | |

Iron Law Compliance

| Law | Followed? | Notes |
|-----|-----------|-------|
| No fixes without root cause investigation | | |
| One change at a time during debugging | | |
| 3-strike escalation rule respected | | |
| Blast radius flagged when >5 files touched | | |
| Tests written before claiming done | | |

Evidence Quality

Was every "it works" claim backed by test output or command results? YES / NO
Were all PR review comments addressed with commit hashes? YES / NO
Was the final status report (DONE/BLOCKED/etc.) accurate? YES / NO

Phase 3: Agent Behavior Evaluation

Evaluate the AI agent's performance on three 1-5 dimensions: Scope Management, Instruction Following, Escalation Behavior. Use the scoring rubrics in reference/phase-templates.md § Phase 3 and record Score __/5 — Evidence: ___ for each. Total: __ / 15.

Phase 4: Skill Effectiveness Evaluation

For each skill invoked in this session, rate its effectiveness:

| Skill | Was it the right skill? | Phases completed? | Output quality | Score (1-5) |
|-------|------------------------|-------------------|---------------|-------------|
| [skill-1] | YES/NO | 100% / 80% / <50% | clear/partial/missing | |
| [skill-2] | YES/NO | 100% / 80% / <50% | clear/partial/missing | |

Questions to answer for each skill:

Was this the right skill for the situation, or should a different one have been used?
Were all defined phases completed, or were some skipped?
Was the output complete: report block, status, "What's next" line?
Did the skill produce value, or was it ceremonial?

Phase 5: First-Principles Gap Analysis

Trace every deviation found back to a root cause category:

| Deviation observed | Root cause | Principle violated |
|--------------------|-----------|-------------------|
| [example: skipped plan review] | Process drift — time pressure | "Plan Lean" — even lean plans need review |
| [example: claimed done without tests] | Evidence gap | "Evidence over Claims" |

The 6 Decision Principles check:

[ ] Choose completeness — were edge cases covered?
[ ] Boil lakes — were related issues in blast radius fixed?
[ ] Pragmatic — were choices clean and minimal?
[ ] DRY — was any code/logic duplicated?
[ ] Explicit over clever — was anything unnecessarily abstract?
[ ] Bias toward action — did concerns block progress unnecessarily?

Phase 6: Improvement Actions

Generate exactly 3 concrete improvement actions for the next sprint, using the format and worked example in reference/phase-templates.md § Phase 6.

Phase 6.5: Loop Back to Next Plan

Before closing, reference any unresolved P0/P1 action items from the previous improvement report, so they are not forgotten:

# Find the most recent prior improvement report (not the one just written)
PREV_IMPROVE=$(find docs/superomni/improvements -name "improvement-*.md" -type f 2>/dev/null | sort | tail -2 | head -1)
if [ -n "$PREV_IMPROVE" ]; then
  echo "=== Prior improvement actions ==="
  grep -A4 "^### ACTION" "$PREV_IMPROVE" 2>/dev/null | head -30
fi

For each prior P0/P1 action:

If applied this session: note ✓ resolved in the Gap Analysis (Phase 5).
If still open: carry it forward as ACTION [N] in the current report (do not silently drop it).

This creates an explicit improvement loop: prior actions appear in the next plan until resolved.

Phase 7: Save Improvement Report

IMPROVE_DIR="docs/superomni/improvements"
mkdir -p "$IMPROVE_DIR"
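BRANCH=$(git branch --show-current 2>/dev/null | tr '/' '-')
# The pipeline's exit status is tr's, so "|| echo main" would never fire.
# Fall back explicitly when HEAD is detached or this is not a git repo.
BRANCH=${BRANCH:-main}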
TIMESTAMP=$(date +%Y-%m-%d-%H%M%S)
REPORT_FILE="$IMPROVE_DIR/improvement-${BRANCH}-${TIMESTAMP}.md"

Save the full evaluation report to $REPORT_FILE. All scores and tables from Phases 1–6 must be included — not just the action items. Use the canonical structure in reference/phase-templates.md § Phase 7.

echo "Improvement report saved to $REPORT_FILE"

This report is the canonical record of agent and skill performance for this session. The workflow skill reads it at the next sprint start to apply the action items.

Report

End the session with the SELF-IMPROVEMENT REPORT block defined in reference/phase-templates.md § Final SELF-IMPROVEMENT REPORT Block.
