codex/skills/tune/SKILL.md
Diagnose and optimize Codex skills from orthogonal evidence. Trigger for `$tune`, in-flight or historical skill usage analysis, intended-vs-observed behavior, missed/false/partial activations, `$seq` session evidence, skill-gap classification, `$refine` briefs, or applying a validated edit. Stop at audit/proposal for analysis asks; apply, commit, and push only when explicitly asked.
Use tune to improve an existing Codex skill by comparing the skill's intended contract with observed evidence.
tune is an orchestration skill:
- $tune owns diagnosis, evidence interpretation, gap classification, and $refine brief writing.
- $seq is the default historical-session evidence adapter, not the only possible evidence source.
- $refine owns in-place skill edits once the gap and success criteria are known.
- $ship and $land remain responsible for PR and merge workflows.
- $ms remains responsible for new skills and direct skill surgery when no usage-backed diagnosis is needed.

Core question:
Given what this skill is intended to do and what the selected evidence sources show,
is it working as intended, and what should change?
Use $tune when the user asks to:
- $seq and $refine together.
- $refine.

Example prompts:
- "$tune on the pdf skill."

Do not use $tune to:
- $refine or $ms as appropriate.
- $seq.
- $ms.
- $refine.
- $ship.
- $land.

Choose exactly one mode before mining evidence. Mode is independent of evidence source.
audit-only:
Use when the user asks whether a skill is working, how it is being used, or what the evidence shows.
Output:
Do not edit the skill.
proposal-only:
Use when the user asks what should change, asks for deep analysis, asks for optimization guidance, or wants a refinement plan.
Output:
- $refine brief

Do not edit the skill unless the user explicitly asks to apply the proposal.
apply-with-refine:
Use only when the user explicitly asks to apply, edit, patch, update, or change skill files now.
Output:
- $refine brief

Resolve mode in this order:

1. apply, edit the files, patch now, make the change, or update the skill: apply-with-refine if the apply gate passes.
2. audit, analyze, deep analysis, is it working, what should change, recommend, proposal, or make it optimal: audit-only or proposal-only, even when the prompt also says improve, fix, optimize, or upgrade.
3. proposal-only.
4. $refine or run only the $refine phase.
5. proposal-only unless the user explicitly asks to apply file changes.

Choose audit-only for status/evidence questions. Choose proposal-only for deep analysis or change planning. Choose apply-with-refine only after the apply gate passes.
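The resolution order can be sketched as a small keyword resolver. This is a minimal illustration that assumes plain substring matching on the prompt. `resolve_mode`, the phrase tuples, and the `proposal-only` fallback are hypothetical and are not part of the skill.

```python
# Hypothetical phrase lists drawn from the mode descriptions; a real resolver
# should read intent, not just substrings.
APPLY_PHRASES = ("apply", "edit the files", "patch now", "make the change", "update the skill")
PLANNING_PHRASES = ("deep analysis", "what should change", "recommend", "proposal",
                    "make it optimal", "optimization guidance", "refinement plan")
STATUS_PHRASES = ("audit", "is it working", "how it is being used", "what the evidence shows")


def resolve_mode(prompt: str, apply_gate_passes: bool) -> str:
    """Pick exactly one tune mode for a request (heuristic sketch)."""
    text = prompt.lower()
    # Explicit apply wording wins, but only when the apply gate passes.
    if apply_gate_passes and any(p in text for p in APPLY_PHRASES):
        return "apply-with-refine"
    # "improve", "fix", "optimize", and "upgrade" are deliberately absent,
    # so they never push a request toward editing.
    if any(p in text for p in PLANNING_PHRASES):
        return "proposal-only"
    if any(p in text for p in STATUS_PHRASES):
        return "audit-only"
    # Assumed fallback: without a recognized cue, plan but do not edit.
    return "proposal-only"
```

Planning phrases are checked before status phrases. As a result, a prompt that asks both "is it working" and "what should change" lands in proposal-only. That ordering is a judgment call, not a rule the text states.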
Evidence is orthogonal to $tune. $tune consumes evidence from declared sources; it must not hard-code the data plane into the skill.
Supported source kinds:
- in-flight: current conversation evidence, current user feedback, current tool/validation output, and the workflow that is unfolding now.
- history: prior Codex sessions, usually through $seq; scope may be recent, arbitrary, explicit session ids, workdir-bound, or repo-bound.
- provided: user-supplied briefs, logs, transcripts, validation output, diffs, or artifacts.
- worktree: current repo state, changed files, validation output, Git status, and metadata files.
- mixed: any combination of the above.

Source descriptor:
Evidence source:
- Kind: in-flight | history | provided | worktree | mixed
- Locator: current conversation | sessions root | session id | workdir | repo | file/artifact | validation output
- Scope: current turn | current conversation | recent window | arbitrary history | explicit sessions | supplied evidence
- Window: <duration/date range/all/none>
- Access method: current context | $seq command | file read | tool output | user-provided text
- Privacy constraint: summarize only | raw excerpts allowed if safe | no raw transcript
- Limitation: <what this source cannot prove>
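One way to keep descriptors honest is to make them structured data that refuses to exist without a stated limitation. The sketch assumes Python 3.8+. The `EvidenceSource` class and its field names are hypothetical mappings of the template fields, not an API that `$tune` or `$seq` provides.

```python
from dataclasses import dataclass

SOURCE_KINDS = {"in-flight", "history", "provided", "worktree", "mixed"}
PRIVACY_CONSTRAINTS = {"summarize only", "raw excerpts allowed if safe", "no raw transcript"}


@dataclass(frozen=True)
class EvidenceSource:
    """Hypothetical in-memory form of the source descriptor template."""
    kind: str
    locator: str
    scope: str
    window: str
    access_method: str
    privacy: str
    limitation: str

    def __post_init__(self) -> None:
        if self.kind not in SOURCE_KINDS:
            raise ValueError(f"unknown source kind: {self.kind!r}")
        if self.privacy not in PRIVACY_CONSTRAINTS:
            raise ValueError(f"unknown privacy constraint: {self.privacy!r}")
        # Every source must say what it cannot prove before conclusions are drawn.
        if not self.limitation.strip():
            raise ValueError("limitation must state what this source cannot prove")

    def render(self) -> str:
        """Render the descriptor in the same field order as the template."""
        return "\n".join([
            "Evidence source:",
            f"- Kind: {self.kind}",
            f"- Locator: {self.locator}",
            f"- Scope: {self.scope}",
            f"- Window: {self.window}",
            f"- Access method: {self.access_method}",
            f"- Privacy constraint: {self.privacy}",
            f"- Limitation: {self.limitation}",
        ])


current = EvidenceSource(
    kind="in-flight",
    locator="current conversation",
    scope="current turn",
    window="none",
    access_method="current context",
    privacy="summarize only",
    limitation="cannot show recurrence across past sessions",
)
print(current.render())
```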
Do not make a historical $seq run mandatory for in-flight tuning. Do not let current-turn feedback justify broad historical claims. Use mixed sources when the user asks to connect current behavior to past recurrence.
- this conversation, in flight, right now, or points to the current behavior, use in-flight first.
- recent, last N days, or gives no scope for general usage tuning, use history with a recent default window of 90 days.
- arbitrary, all history, go back as far as needed, or gives explicit session ids, use history with an explicit arbitrary scope. Do not silently collapse it to 90 days.
- provided evidence and record its limits.
- worktree evidence for validation and publishing checks.

All conditions must hold before editing:
If any condition fails, stop at audit-only or proposal-only.
In apply-with-refine, publish each validated atomic change unless the user explicitly says not to commit or push.
Rules:
- $refine brief. Do not stage unrelated worktree changes.
- Tune <skill>: <short gap/fix summary>.
- git push when an upstream exists; if no upstream exists, use git push -u origin HEAD only when origin is configured and the current branch is appropriate.
- $ship and $land own those workflows.

Publishing sequence after validation passes:
git status --short
git add -- <scoped changed files>
git diff --cached --check
git commit -m "Tune <skill>: <short gap/fix summary>"
git rev-parse --abbrev-ref --symbolic-full-name @{u} >/dev/null 2>&1 && git push
# If no upstream exists and origin is configured, use: git push -u origin HEAD
Record the commit SHA and push result in the final report.
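The push decision in the rules can be separated from the git calls so that it is easy to check. This is a minimal sketch that assumes the caller decides whether the current branch is appropriate, since the skill leaves that as a judgment. `commit_message`, `git_ok`, and `push_command` are hypothetical helpers. `git rev-parse --abbrev-ref --symbolic-full-name @{u}` and `git remote get-url origin` are standard git commands that exit non-zero when no upstream or no `origin` remote exists.

```python
import subprocess
from typing import List, Optional


def commit_message(skill: str, summary: str) -> str:
    # Matches the "Tune <skill>: <short gap/fix summary>" convention.
    return f"Tune {skill}: {summary}"


def git_ok(*args: str) -> bool:
    """Return True when a git command exits 0 (output is discarded)."""
    result = subprocess.run(["git", *args], capture_output=True, text=True)
    return result.returncode == 0


def push_command(has_upstream: bool, origin_configured: bool,
                 branch_appropriate: bool) -> Optional[List[str]]:
    """Choose the push invocation, or None when publishing must be reported as blocked."""
    if has_upstream:
        return ["git", "push"]
    if origin_configured and branch_appropriate:
        return ["git", "push", "-u", "origin", "HEAD"]
    return None


# Live checks inside a repository (not run here):
# has_upstream = git_ok("rev-parse", "--abbrev-ref", "--symbolic-full-name", "@{u}")
# origin_configured = git_ok("remote", "get-url", "origin")
```

A `None` result is the case the final report records as a blocked push.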
Identify these before mining evidence:
- codex/skills.
- ~/.codex/sessions.

Default only when unspecified: general historical tuning uses history over the last 90 days. In-flight tuning uses current conversation evidence first.
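The default window and the scope cues listed earlier reduce to a small heuristic. This sketch assumes cue phrases are matched as plain substrings. `default_scope` and the cue tuples are hypothetical, and explicit session ids or workdir scopes are not handled.

```python
from datetime import date, timedelta
from typing import Dict

DEFAULT_HISTORY_DAYS = 90  # default window for general historical tuning

IN_FLIGHT_CUES = ("this conversation", "in flight", "right now")
ARBITRARY_CUES = ("arbitrary", "all history", "go back as far as needed")


def default_scope(request: str, today: date) -> Dict[str, str]:
    """Pick a source kind and window from request wording (heuristic sketch)."""
    text = request.lower()
    if any(cue in text for cue in IN_FLIGHT_CUES):
        # In-flight tuning starts from current conversation evidence; no history window.
        return {"kind": "in-flight", "scope": "current conversation", "window": "none"}
    if any(cue in text for cue in ARBITRARY_CUES):
        # Arbitrary scope must never be silently collapsed to 90 days.
        return {"kind": "history", "scope": "arbitrary history", "window": "all"}
    start = today - timedelta(days=DEFAULT_HISTORY_DAYS)
    return {"kind": "history", "scope": "recent window",
            "window": f"{start.isoformat()}..{today.isoformat()}"}
```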
Protected skills require extra care:
- seq
- tune
- refine
- cron
- ship
- land
- .system/*

For protected skills:
When the target skill is tune, default to proposal-only unless the user explicitly asks to apply file changes. Preserve mode-selection, evidence-source orthogonality, evidence-strength, and companion-skill boundaries unless direct evidence justifies changing them.
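The protected-skill list and the self-tuning default can be expressed as two checks. This is a sketch only. `is_protected` and `default_mode_for` are hypothetical names, and a `None` return is assumed to mean "fall back to normal mode resolution".

```python
from typing import Optional

PROTECTED_SKILLS = {"seq", "tune", "refine", "cron", "ship", "land"}


def is_protected(skill: str) -> bool:
    # ".system/*" covers every skill under the .system directory.
    return skill in PROTECTED_SKILLS or skill.startswith(".system/")


def default_mode_for(skill: str, explicit_apply_request: bool) -> Optional[str]:
    """Return a forced default mode, or None to use normal mode resolution."""
    if skill == "tune" and not explicit_apply_request:
        return "proposal-only"
    return None
```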
- $seq owns historical session, memory, artifact, workflow, and tool-trace mining.
- $tune owns source selection, intended-vs-observed diagnosis, evidence interpretation, gap classification, and $refine brief writing.
- $refine owns in-place edits once the gap and success criteria are known.
- $ms owns new skill creation and direct skill surgery when no usage-backed tuning diagnosis is needed.
- $ship and $land own PR and merge workflows.
- $tune may commit and push validated apply-with-refine changes, but it does not create PRs, merge branches, or clean up branches.

If the user asks to discover whether a change is needed, $tune owns the turn. If the user provides a complete evidence-backed brief and asks only for edits, hand off to $refine. If the user asks for both diagnosis and edits, diagnose first, write the brief, then invoke $refine only if the apply gate passes.
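The ownership paragraph is effectively a routing rule. The sketch below encodes it with hypothetical boolean inputs and a hypothetical `plan_turn` helper. One case is an assumption because the text does not cover it: an edit-only request with an incomplete brief is treated as needing diagnosis first.

```python
from typing import List


def plan_turn(asks_diagnosis: bool, asks_edits: bool,
              brief_is_complete: bool, apply_gate_passes: bool) -> List[str]:
    """Order the phases for one request (sketch of the ownership rules)."""
    # A complete, evidence-backed brief plus an edit-only ask goes straight to $refine.
    if asks_edits and brief_is_complete and not asks_diagnosis:
        return ["$refine: edit"]
    steps = ["$tune: diagnose"]
    if asks_edits:
        # Diagnosis and edits: diagnose, write the brief, then edit only if the gate passes.
        steps.append("$tune: write $refine brief")
        if apply_gate_passes:
            steps.append("$refine: edit")
    return steps
```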
Write:
Goal: Tune <skill> so that <intended behavior> better matches <observed or suspected usage pattern>.
Mode: audit-only | proposal-only | apply-with-refine
Evidence source: <source descriptor>
Apply gate: pass | blocked: <reason>
Read the target skill before judging usage:
<skill-root>/<skill>/SKILL.md
<skill-root>/<skill>/agents/openai.yaml
<skill-root>/<skill>/scripts/
<skill-root>/<skill>/references/
<skill-root>/<skill>/assets/
Only read resources relevant to the target skill or suspected gap.
Summarize primary purpose, triggers, anti-triggers, expected inputs/outputs, required workflow, resources, validation expectations, companion handoffs, and upgrade boundaries.
Use references/evidence-source-model.md to choose the data source. Declare what source can and cannot prove before drawing conclusions.
For in-flight, use current conversation evidence, explicit user feedback, current tool output, validation output, and visible workflow behavior. Summarize; do not include raw transcript unless explicitly allowed and safe.
For history, use $seq and references/seq-evidence-playbook.md. Respect the requested root, time window, session ids, workdir, or arbitrary scope.
For provided, inspect only the supplied evidence needed for the claim and record its provenance limits.
For mixed, keep separate ledger entries by source kind before synthesizing.
Use the least invasive source that can answer the question. Do not run broad historical mining when current-turn feedback is sufficient for a narrow edit. Do not infer recurrence from in-flight evidence alone.
Sanitize user-facing reports and briefs. Do not include raw transcript text, raw memory text, secrets, credentials, private personal details, sensitive local paths, private path fragments, or unnecessarily long command output.
For each important source, record:
- Source kind:
- Command, locator, or source:
- Why this source was chosen:
- What it proves:
- What it does not prove:
- Evidence class:
- Confidence: high | medium | low
- Scope/window:
- Recurrence:
- Counterevidence:
- Sanitization note:
Do not treat running a command as proof. Explain what the result actually establishes.
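A ledger entry maps naturally onto a record type, and grouping on source kind keeps mixed sources separate. This is a hypothetical structure (`LedgerEntry`, `by_source_kind`) that assumes Python 3.8+. It enforces only the confidence values and a non-empty "what it proves" field.

```python
from dataclasses import dataclass
from typing import Dict, List

CONFIDENCE_LEVELS = {"high", "medium", "low"}


@dataclass
class LedgerEntry:
    source_kind: str
    locator: str
    why_chosen: str
    proves: str
    does_not_prove: str
    evidence_class: str
    confidence: str
    scope_window: str
    recurrence: str
    counterevidence: str
    sanitization_note: str

    def __post_init__(self) -> None:
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"confidence must be high, medium, or low: {self.confidence!r}")
        # Running a command is not proof: the entry must say what the result establishes.
        if not self.proves.strip():
            raise ValueError("entry must state what the result actually establishes")


def by_source_kind(entries: List[LedgerEntry]) -> Dict[str, List[LedgerEntry]]:
    """Keep mixed evidence in separate per-source groups before synthesis."""
    grouped: Dict[str, List[LedgerEntry]] = {}
    for entry in entries:
        grouped.setdefault(entry.source_kind, []).append(entry)
    return grouped
```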
Strong evidence:
- explicit_current_turn_feedback
- in_flight_validation_failure
- provided_artifact_evidence
- repeated_session_evidence
- clear_validation_failure
- clear_routing_failure
- repeated_manual_workaround
- stale_or_contradictory_metadata
- historical_skill_regression

Weak evidence:
- thin_usage_signal
- single_ambiguous_session
- low_activation_count
- possible_trigger_overlap
- style_preference_only
- missing_examples_without_failure
- unbounded_history_without_sampling_plan

Default thresholds:
- repeated_session_evidence: same gap appears in at least 3 relevant historical sessions, or at least 2 independent sessions plus explicit user feedback.
- repeated_manual_workaround: substantially similar workaround appears in at least 3 sessions or across at least 2 target skills.
- low_activation_count: fewer than 3 relevant activations in the selected history scope.
- single_ambiguous_session: one session with unclear intent, incomplete trace, or no explicit outcome signal.
- clear_routing_failure: user asked for the skill's owned work and a different skill or no skill handled the core workflow.
- historical_skill_regression: behavior changed after a material skill-block, trigger, metadata, or workflow update.

Explicit current-turn feedback can be strong evidence for a narrow local change. It must not justify broad claims about historical recurrence without historical evidence.
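The default thresholds are simple counting rules. The sketch assumes the caller has already counted relevant sessions and activations within the selected history scope. `history_classes`, `workaround_is_repeated`, and `strength` are hypothetical helpers; the class names are copied from the strong and weak lists.

```python
from typing import Set

STRONG_CLASSES = {
    "explicit_current_turn_feedback", "in_flight_validation_failure",
    "provided_artifact_evidence", "repeated_session_evidence",
    "clear_validation_failure", "clear_routing_failure",
    "repeated_manual_workaround", "stale_or_contradictory_metadata",
    "historical_skill_regression",
}
WEAK_CLASSES = {
    "thin_usage_signal", "single_ambiguous_session", "low_activation_count",
    "possible_trigger_overlap", "style_preference_only",
    "missing_examples_without_failure", "unbounded_history_without_sampling_plan",
}


def history_classes(sessions_with_gap: int, independent_sessions_with_gap: int,
                    explicit_user_feedback: bool, relevant_activations: int) -> Set[str]:
    """Apply the default session-count thresholds to one history scope."""
    classes: Set[str] = set()
    if sessions_with_gap >= 3 or (independent_sessions_with_gap >= 2 and explicit_user_feedback):
        classes.add("repeated_session_evidence")
    if relevant_activations < 3:
        classes.add("low_activation_count")
    return classes


def workaround_is_repeated(sessions: int, target_skills: int) -> bool:
    # Same workaround in at least 3 sessions, or across at least 2 target skills.
    return sessions >= 3 or target_skills >= 2


def strength(evidence_class: str) -> str:
    if evidence_class in STRONG_CLASSES:
        return "strong"
    if evidence_class in WEAK_CLASSES:
        return "weak"
    return "unclassified"
```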
Classify the mismatch using references/gap-taxonomy.md:
Identify the smallest sufficient fix. Prefer no edit when evidence is thin, source scope is ambiguous, or the likely change crosses skill boundaries.
$refine Brief
Use references/refinement-brief-template.md. Include source descriptors, the evidence ledger, counterevidence, risk, success criteria, must-not-change constraints, and validation plan.
$refine, If Allowed
Use $refine only after the apply gate passes:
Use `$refine` on <skill> with this tuning brief. Make the smallest sufficient edit that closes the diagnosed <gap_type> gap. Preserve unrelated behavior. Validate with quick_validate. Update agents/openai.yaml only if metadata changed. Add scripts/references/assets only if the brief explicitly justifies them.
If the runtime cannot literally invoke another skill, perform the edit by following $refine rules and report that the $refine phase was executed manually.
For any applied edit, run:
uv run --with pyyaml -- python3 codex/skills/.system/skill-creator/scripts/quick_validate.py <skill-root>/<skill>
If shared assumptions or multiple skills changed, run:
codex/skills/tune/scripts/validate-changed-skills
If scripts changed, run at least one representative sample command.
If codex/skills/.system/skill-creator/scripts/quick_validate.py is unavailable, report validation as blocked and do not claim validation passed.
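The blocked-versus-passed rule for validation can be made explicit in a wrapper. This is a minimal sketch that assumes it runs from the repository root and that `quick_validate.py` exits non-zero on failure. `validate_skill` is a hypothetical helper; the `uv run --with pyyaml -- python3 ...` invocation is the command given in the validation step.

```python
import subprocess
from pathlib import Path

QUICK_VALIDATE = "codex/skills/.system/skill-creator/scripts/quick_validate.py"


def validate_skill(skill_dir: str, repo_root: str = ".") -> str:
    """Run quick_validate for one skill; report 'blocked' rather than a false pass."""
    if not (Path(repo_root) / QUICK_VALIDATE).is_file():
        return "blocked: quick_validate.py is unavailable"
    cmd = ["uv", "run", "--with", "pyyaml", "--", "python3", QUICK_VALIDATE, skill_dir]
    try:
        result = subprocess.run(cmd, cwd=repo_root, capture_output=True, text=True)
    except FileNotFoundError:
        return "blocked: uv is not installed"
    # Assumption: the validator signals failure with a non-zero exit code.
    return "pass" if result.returncode == 0 else f"fail: exit code {result.returncode}"
```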
When mode is apply-with-refine and validation passes, commit and push the scoped files for that atomic change before starting another unrelated change.
Before committing, run git status --short, stage only brief-justified files, run git diff --cached --check, commit, and push. If any publishing step is blocked, report the blocked step and leave the repository in the safest available state.
Use this result shape:
Tuned: <skill>
Mode:
- <audit-only | proposal-only | apply-with-refine>
Evidence sources:
- <source descriptor summary>
Apply gate:
- <pass | blocked: reason>
Evidence:
- <evidence_class>: <sanitized evidence summary>
Diagnosis:
- Intended: <summary>
- Observed: <summary>
- Gap: <gap type and explanation>
Refinement:
- <files changed, proposed files, or no edit>
- <why the changes close the gap>
Validation:
- <command>: <pass/fail/blocked/not run and why>
Publishing:
- Commit: <sha | blocked/not run and why>
- Push: <remote/branch result | blocked/not run and why>
- Scoped files: <files staged and committed>
Remaining uncertainty:
- <anything not proven by the evidence>
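The result shape can be rendered mechanically so that no section is silently dropped. `render_report` and `REPORT_SECTIONS` are hypothetical. The sketch flattens each section to a list of bullet strings and leaves sub-labels such as Intended or Commit to the caller.

```python
from typing import Dict, List

REPORT_SECTIONS = ["Mode", "Evidence sources", "Apply gate", "Evidence", "Diagnosis",
                   "Refinement", "Validation", "Publishing", "Remaining uncertainty"]


def render_report(skill: str, sections: Dict[str, List[str]]) -> str:
    """Render the result shape; every section needs at least one bullet."""
    missing = [name for name in REPORT_SECTIONS if not sections.get(name)]
    if missing:
        raise ValueError(f"missing report sections: {', '.join(missing)}")
    lines = [f"Tuned: {skill}"]
    for name in REPORT_SECTIONS:
        lines.append(f"{name}:")
        lines.extend(f"- {item}" for item in sections[name])
    return "\n".join(lines)
```

In audit-only runs, Validation and Publishing still get a bullet such as `not run: audit-only`. This matches the template's "blocked/not run and why" wording.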
tune may recommend or drive upgrades when the current skill cannot reliably satisfy its intended contract under selected evidence.
Allowed upgrade types: trigger, workflow, capability within existing purpose, tooling, resource, metadata, validation, and handoff/boundary upgrades.
Do not recommend a capability upgrade that changes the skill's core purpose. That should be a new skill or an explicit user-directed redesign.
A good $tune run:
- $seq for historical evidence, with explicit scope for arbitrary history.
- $refine a precise brief.
- apply-with-refine change.

A bad $tune run:
- ~/.codex/sessions when the user asked about in-flight behavior.