tkersey/saddle-up

Name: saddle-up
Author: tkersey

codex/skills/saddle-up/SKILL.md

npx skillsauth add tkersey/dotfiles saddle-up

Saddle Up

Overview

Run an explicit-trigger continuous loop that updates a target harness, evaluates it against a fixed suite, and promotes passing changes through a dedicated branch and PR flow. Recent session-mining updates bias the loop toward Gemini 2.5 Pro failure modes: exact-output envelopes, proof honesty, failure-recovery wording, workdir discipline, and immediate reporting of external blockers.

Quick Start

Use an explicit model on every run, and prefer a clean target repo or a docs-only worktree.

uv run --with pyyaml codex/skills/saddle-up/scripts/saddle_up.py run \
  --repo /path/to/target-repo \
  --harness-path AGENTS.md \
  --model google/gemini-2.5-pro

For a bounded debugging pass that cannot hang forever:

uv run --with pyyaml codex/skills/saddle-up/scripts/saddle_up.py run \
  --repo /path/to/target-repo \
  --harness-path AGENTS.md \
  --model google/gemini-2.5-pro \
  --no-commit \
  --max-cycles 1 \
  --opencode-timeout-seconds 180

If the improver path is the problem and you want to evaluate the current harness as-is:

uv run --with pyyaml codex/skills/saddle-up/scripts/saddle_up.py run \
  --repo /path/to/target-repo \
  --harness-path AGENTS.md \
  --model google/gemini-2.5-pro \
  --skip-improve \
  --no-commit \
  --max-cycles 1 \
  --opencode-timeout-seconds 180

If you want a fast proof of the curated harness only, without replay cases:

uv run --with pyyaml codex/skills/saddle-up/scripts/saddle_up.py run \
  --repo /path/to/target-repo \
  --harness-path AGENTS.md \
  --model google/gemini-2.5-pro \
  --skip-improve \
  --case-source curated \
  --case-parallelism 4 \
  --no-commit \
  --max-cycles 1 \
  --opencode-timeout-seconds 600

Refresh a Gemini-tuned suite before the next run:

uv run --with pyyaml codex/skills/saddle-up/scripts/saddle_up.py replay-refresh \
  --repo /path/to/target-repo \
  --harness-path AGENTS.md \
  --model google/gemini-2.5-pro \
  --refresh-curated

Stop gracefully from another shell:

touch /path/to/target-repo/.saddle-up/STOP
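The STOP file works as a cooperative signal: the loop checks for it rather than being killed. The sketch below shows one way a loop can honor both a stop file and an optional cycle cap (0 meaning unbounded). The function names, the run_cycle stub, and the choice to check the file before each cycle are assumptions for illustration, not saddle-up's actual implementation.

```shell
# Hypothetical sketch: honor a stop file and an optional cycle cap.
# run_cycle is a stand-in for one improve+eval cycle, not the real script.
run_cycle() {
  echo "cycle $1"
}

loop_until_stopped() {
  stop_file=$1   # e.g. /path/to/target-repo/.saddle-up/STOP
  max_cycles=$2  # 0 means unbounded, mirroring "unbounded unless set"
  cycle=0
  while :; do
    # Check the stop file between cycles so a running cycle is never cut off.
    if [ -e "$stop_file" ]; then
      echo "stopped: stop file present after $cycle cycle(s)"
      return 0
    fi
    # Enforce the cycle cap only when one was set.
    if [ "$max_cycles" -gt 0 ] && [ "$cycle" -ge "$max_cycles" ]; then
      echo "stopped: reached max cycles ($max_cycles)"
      return 0
    fi
    cycle=$((cycle + 1))
    run_cycle "$cycle"
  done
}
```

Checking between cycles is what makes the stop "graceful": the current cycle finishes and its state is written before the loop exits.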

Inspect state:

uv run --with pyyaml codex/skills/saddle-up/scripts/saddle_up.py status \
  --repo /path/to/target-repo

Refresh replay cases from OpenCode prompt history (seq opencode-prompts):

uv run --with pyyaml codex/skills/saddle-up/scripts/saddle_up.py replay-refresh \
  --repo /path/to/target-repo \
  --model google/gemini-2.5-pro

Workflow

Validate preflight: a git repo, the harness path, an explicit model, opencode availability, and the absence of pre-existing non-doc changes that would poison the docs-only gate.
Bootstrap .saddle-up/ files if missing, using model-aware defaults when the target is Gemini 2.5 Pro.
Start explicit-trigger continuous improve+eval cycles.
Use Gemini-oriented curated probes for exact-output blocks, local-evidence-first behavior, "not run" honesty, retry-path wording, workdir discipline, anti-drift, and external hard stops.
On Gemini 2.5 Pro runs, fail closed on improver-written AGENTS.md churn: rerun all curated cases immediately after the improver, keep the diff only if that curated gate stays green, and otherwise revert drift in the protected retry, workdir, external-blocker, and "not run" rules before mixed eval can count it.
Filter replay prompts toward harness-like OpenCode history instead of short, noisy chat fragments.
Enforce the pass gate (>= 80% by default) and the docs-scope write policy.
Auto-commit passing changes to the saddle-up/eval branch and open or update a PR.
On a regression below the gate, auto-revert the harness to the last passing commit; do not revert for external provider, quota, auth, or network blockers.
Stop automatically when reliability reaches 3 consecutive passing cycles, when an external blocker is detected, or when a manual stop or the cycle cap is reached.
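The pass gate and the stability window interact as a streak counter: every passing cycle extends the streak, and any failing cycle resets it and triggers a revert. The sketch below models that using the documented defaults (threshold 0.80, 3 consecutive passes). Integer percentages avoid floating-point comparisons in shell. The input format (one "passed total" pair per cycle) and the function names are hypothetical. The sketch also leaves out external blockers and the Gemini post-improver curated gate.

```shell
# Hypothetical sketch of the pass gate and stability window.
THRESHOLD_PCT=80     # threshold 0.80 expressed as an integer percentage
STABILITY_WINDOW=3   # consecutive passing cycles required to stop

# Succeeds when passed/total >= THRESHOLD_PCT, using integer math only.
cycle_passes() {
  passed=$1
  total=$2
  [ "$total" -gt 0 ] && [ $((passed * 100)) -ge $((THRESHOLD_PCT * total)) ]
}

# Reads "passed total" pairs (one cycle per line) and reports when to stop.
evaluate_cycles() {
  streak=0
  n=0
  while read -r passed total; do
    n=$((n + 1))
    if cycle_passes "$passed" "$total"; then
      streak=$((streak + 1))
      echo "cycle $n: pass (streak $streak)"
      if [ "$streak" -ge "$STABILITY_WINDOW" ]; then
        echo "stable after $n cycles"
        return 0
      fi
    else
      # A regression below the gate resets the streak.
      streak=0
      echo "cycle $n: fail, revert to last passing commit"
    fi
  done
  echo "not yet stable"
}
```

For example, `printf '8 10\n7 10\n9 10\n10 10\n8 10\n' | evaluate_cycles` fails cycle 2 (70%), then passes cycles 3 through 5 and reports stability after 5 cycles.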

Defaults and Gates

threshold: 0.80
stability_window: 3 consecutive passes
model-aware opencode_timeout_seconds default:
- 600 for google/gemini-2.5-pro
- 180 for other profiles
Gemini 2.5 Pro bootstrap mix: 80% curated / 20% replay
generic bootstrap mix: 60% curated / 40% replay
stop file: .saddle-up/STOP (override with --stop-file)
max_cycles: unbounded unless set
branch: saddle-up/eval
replay-refresh --refresh-curated reseeds the curated suite from the current model profile
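The model-aware defaults above can be read as a simple lookup with an explicit override. The helpers below sketch that lookup. They are hypothetical and assume an exact match on google/gemini-2.5-pro; how saddle-up matches other model IDs, such as openrouter-prefixed ones, is not documented here. They also assume the bootstrap mix is applied as case counts with integer rounding down, which may differ from the real script.

```shell
# Hypothetical helpers mirroring the documented model-aware defaults.
default_timeout_seconds() {
  case "$1" in
    google/gemini-2.5-pro) echo 600 ;;
    *) echo 180 ;;
  esac
}

# An explicit --opencode-timeout-seconds value (second argument) wins.
effective_timeout_seconds() {
  if [ -n "$2" ]; then
    echo "$2"
  else
    default_timeout_seconds "$1"
  fi
}

# Prints "curated replay" counts for a suite of N cases using the documented
# bootstrap mix: 80/20 for Gemini 2.5 Pro, 60/40 otherwise.
bootstrap_mix() {
  model=$1
  n=$2
  case "$model" in
    google/gemini-2.5-pro) curated_pct=80 ;;
    *) curated_pct=60 ;;
  esac
  curated=$(( n * curated_pct / 100 ))
  echo "$curated $(( n - curated ))"
}
```

This also explains the bounded debugging example in Quick Start: passing --opencode-timeout-seconds 180 to a Gemini 2.5 Pro run overrides its 600-second default.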

Repo Contract

The run and status commands read and write these files under the target repo:

.saddle-up/suite.yaml
.saddle-up/scoring.yaml
.saddle-up/state.yaml
.saddle-up/runs.jsonl

Schema details:

references/eval_suite_schema.md
references/opencode_runner_contract.md
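Before the first run, it can help to check which of the contract files already exist. Missing files are normal at that point because run bootstraps them. The helpers below are hypothetical and are not part of saddle-up. The file names come from the repo contract, and latest_run assumes runs.jsonl holds one JSON record per line. The record fields are defined in the referenced schema documents and are not assumed here.

```shell
# Hypothetical helper: report which documented state files exist.
check_state_files() {
  repo=$1
  missing=0
  for f in suite.yaml scoring.yaml state.yaml runs.jsonl; do
    if [ -f "$repo/.saddle-up/$f" ]; then
      echo "present: .saddle-up/$f"
    else
      echo "missing: .saddle-up/$f"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when all four files are present.
  [ "$missing" -eq 0 ]
}

# Print the most recent runs.jsonl record (assumes one record per line).
latest_run() {
  tail -n 1 "$1/.saddle-up/runs.jsonl"
}
```

Usage: `check_state_files /path/to/target-repo && latest_run /path/to/target-repo`.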

Guardrails

Limit mutations to harness and doc files plus the .saddle-up/* state files.
Fail fast when the repo already contains non-doc changes before the loop starts.
Mark the run as failed when non-doc file edits appear.
For Gemini 2.5 Pro, improver-generated AGENTS.md edits must clear a dedicated curated gate before mixed eval can count them.
Do not auto-merge PRs.
Require explicit model selection per invocation.
Stop and surface external quota/auth/provider/network blockers instead of treating them as harness regressions.
No scheduler/cron path in this version; start runs explicitly with run.
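The docs-scope guardrail reduces to classifying each changed path. The filter below sketches that check. Its scope patterns (any *.md file plus anything under .saddle-up/) are an assumption, because the exact rule saddle-up applies is not spelled out here. It reads paths one per line and prints every path outside that scope.

```shell
# Hypothetical docs-scope check. The scope patterns are assumptions.
is_docs_scope() {
  case "$1" in
    .saddle-up/*) return 0 ;;  # state files
    *.md) return 0 ;;          # harness and docs, e.g. AGENTS.md
    *) return 1 ;;
  esac
}

# Reads changed paths from stdin; fails if any path is outside docs scope.
check_docs_scope() {
  bad=0
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    if ! is_docs_scope "$path"; then
      echo "non-doc change: $path"
      bad=1
    fi
  done
  return "$bad"
}
```

Usage: `git -C /path/to/target-repo diff --name-only HEAD | check_docs_scope`. Note that `git diff --name-only HEAD` lists only tracked changes, so untracked files would need a separate check.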

Troubleshooting

If yaml import fails, run with uv run --with pyyaml ....
If a run appears stuck inside opencode run, first verify whether the model-aware default is simply too low for the current model; for Gemini 2.5 Pro the default is 600 seconds, and you can still override it explicitly with --opencode-timeout-seconds.
If the improver child is the part that hangs and you already trust the current harness edits, rerun with --skip-improve to evaluate the current harness without another rewrite attempt.
If Gemini improver edits keep being reverted by the post-improver curated gate, treat that as improver churn, not as proof that AGENTS.md needs more literal rewrites; inspect the reverted rule IDs in status/runs.jsonl first.
If replay is the slow or noisy part, rerun with --case-source curated to prove the curated harness independently before spending more cycles on replay prompts.
If curated probes are still too slow when run one at a time, raise --case-parallelism so the exact-output checks run concurrently against the same model.
If openrouter/google/gemini-2.5-pro hits credit or max_tokens failures, switch to direct google/gemini-2.5-pro before spending more harness cycles.
For one-cycle diagnosis without commits or PR side effects, use --no-commit --max-cycles 1.
If the loop stops with external_blocker, clear the provider/auth/network issue first; do not keep cycling a blocked harness.
If replay prompts feel noisy, rerun replay-refresh --model google/gemini-2.5-pro --refresh-curated to restore the Gemini-focused suite.
If you need to stop a running loop gracefully, create the stop file path (default .saddle-up/STOP) or interrupt with Ctrl+C.
If gh auth fails, run gh auth login before enabling PR automation.
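The key troubleshooting distinction is between an external blocker, which stops the loop, and a harness regression, which triggers a revert. The classifier below sketches that split with keyword matching. The keyword list is a hypothetical heuristic based on the quota, auth, provider, network, and credit failures named above. It is not saddle-up's actual detection logic.

```shell
# Hypothetical classifier: external blocker vs. harness regression.
# The keyword list is an assumption, not saddle-up's real detection rules.
classify_failure() {
  # Lowercase the message so matching is case-insensitive.
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *quota*|*"rate limit"*|*credit*|*unauthorized*|*401*|*403*|*429*|*network*|*"connection refused"*)
      echo external_blocker ;;
    *)
      echo harness_regression ;;
  esac
}
```

A message classified as external_blocker means the provider, auth, or network issue should be fixed first. Rerunning the loop before that only repeats the same blocked cycle.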

Continuously evaluate and improve AGENTS.md-style harness instructions through explicit-trigger OpenCode loops with an explicit model. Use when you want recurring harness reliability runs, especially for Gemini 2.5 Pro/OpenCode harness tuning, clean-repo eval cycles, curated exact-output probes, automatic eval-branch commits and PR updates for passing harness/doc changes, and external-blocker detection or regression auto-revert without scheduler/cron automation.

Stars: 49
Category: tools
Updated: Apr 17, 2026


Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: container and dependency vulnerability scanner; Clean, 95%
Semgrep: static code analysis for vulnerabilities; Clean, 95%
mcp-scan (Snyk): Model Context Protocol security validation; Clean, 95%
Snyk (dep): open source security scanning; Skipped, 50%
Socket.dev: supply chain security analysis; Skipped, 50%
VirusTotal: multi-engine malware detection; Skipped, 50%
CrowdStrike: advanced threat intelligence; Skipped, 50%
OSV-Scanner: Open Source Vulnerability database check; Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 17, 2026, 2:48 AM; 10.7s; 6 files scanned

Install manually

# Clone the repo
git clone https://github.com/tkersey/dotfiles.git

# Copy into Claude Code skills folder (global)
cp -r dotfiles/codex/skills/saddle-up ~/.claude/skills/

Repository: tkersey/dotfiles

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT