plugins/v1tamins/skills/v1-autoresearch-skill/SKILL.md
Autonomous optimization loop inspired by Karpathy's AutoResearch. Point it at any measurable target and it will modify, measure, keep improvements, and discard regressions. Triggers on "autoresearch", "optimize this", "run the loop", "karpathy loop", "improvement loop".
npx skillsauth add v1-io/v1tamins v1-autoresearch-skillInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Implements Karpathy's AutoResearch pattern for any measurable optimization target. You modify one thing, measure the result, keep it if improved, discard it if not, and repeat.
Typical invocations:
/v1-autoresearch-skill [metric_command]v1-autoresearch-skill from the skills menu or use $v1-autoresearch-skill [metric_command]Examples:
/v1-autoresearch-skill
/v1-autoresearch-skill <metric_command>
In Codex, the slash examples below map directly to $v1-autoresearch-skill ....
Three phases: Setup, Loop, Debrief.
If the user did not provide arguments, collect these inputs interactively using AskUserQuestion:
Required inputs:
tests/, src/prompts/system.txt, webpack.config.js)pytest tests/ 2>&1 | tail -1 (parse time from pytest output)du -sb dist/ | cut -f1 (bundle size in bytes)hyperfine --runs 3 'node index.js' --export-json /dev/stdout | jq '.results[0].mean'lower (lower is better, e.g., seconds, bytes, cost) or higher (higher is better, e.g., accuracy, pass rate)Optional inputs (with defaults):
pytest tests/ --tb=no -q (all tests must pass)501203Before starting the loop, verify all preconditions:
1. Confirm we are in a git repository: `git rev-parse --is-inside-work-tree`
2. Confirm working tree is clean: `git status --porcelain` should be empty
- If dirty, ask user whether to stash or abort
3. Confirm asset path exists
4. Confirm metric command runs successfully and produces a parseable number
5. If constraint command provided, confirm it passes now (exit 0)
If any validation fails, report the failure clearly and stop. Do not start the loop with a broken setup.
Run the metric command {metric_runs} times. Take the median value. This is the BASELINE and also the initial BEST_KNOWN.
Display:
Baseline established: {BASELINE} ({direction})
Metric command: {metric_command}
Constraint: {constraint_command or "none"}
Asset: {asset_path}
Cycles: {max_cycles} | Time limit: {time_limit}min
Create or append to autoresearch-results.tsv in the current working directory:
timestamp cycle change metric_before metric_after verdict
git tag -f autoresearch-baseline
Read the strategy reference file for this skill:
${SKILL_DIR}/references/strategies.md
Use the strategies relevant to the current optimization target as a library of approaches to draw from during the loop. Prioritize high-impact strategies first.
Repeat until a stop condition is met:
Make exactly ONE modification to the asset. This is critical -- do not combine multiple changes in a single cycle. Isolating the variable is what makes the loop work.
Good changes:
Bad changes (never do these):
git add -A
git commit -m "autoresearch cycle {N}: {brief description of change}"
This commit happens BEFORE we know if the change helps. This is intentional -- it makes reverts clean.
Run the constraint command. If it exits non-zero:
git revert HEAD --no-edit
Log the result:
{timestamp} {cycle} {description} {BEST_KNOWN} CONSTRAINT_FAIL DISCARDED
Skip to 2.8.
Run the metric command {metric_runs} times. Take the median. This is NEW_SCORE.
Compare NEW_SCORE to BEST_KNOWN respecting direction:
If improved:
{timestamp} {cycle} {description} {old_best} {NEW_SCORE} KEPTCycle {N}: KEPT -- {description} ({old_best} -> {NEW_SCORE}, {improvement}%)If NOT improved:
git revert HEAD --no-edit
{timestamp} {cycle} {description} {BEST_KNOWN} {NEW_SCORE} DISCARDEDCycle {N}: DISCARDED -- {description} ({BEST_KNOWN} -> {NEW_SCORE}, no improvement)After every 5 cycles, display a brief progress summary:
--- Progress (cycle {N}/{max_cycles}) ---
Baseline: {BASELINE}
Current best: {BEST_KNOWN} ({improvement from baseline}%)
Kept: {kept_count} | Discarded: {discarded_count}
Time elapsed: {elapsed}min / {time_limit}min
---
Stop the loop if ANY of these are true:
git tag -f autoresearch-final
========================================
AUTORESEARCH COMPLETE
========================================
Baseline: {BASELINE}
Final best: {BEST_KNOWN}
Improvement: {improvement_pct}%
Cycles run: {total_cycles}
Kept: {kept_count}
Discarded: {discarded_count}
Elapsed: {elapsed_minutes}min
Results log: autoresearch-results.tsv
KEPT CHANGES (in order):
1. {change_description} ({before} -> {after}, +{pct}%)
2. {change_description} ({before} -> {after}, +{pct}%)
...
Stop reason: {cycle limit | time limit | diminishing returns | user interrupt}
========================================
Based on results, suggest:
These are inviolable rules during the loop:
The metric command should:
Examples:
# Pytest execution time (seconds)
/usr/bin/time -p pytest tests/ --tb=no -q 2>&1 | grep real | awk '{print $2}'
# Bundle size (bytes)
npm run build 2>/dev/null && du -sb dist/ | cut -f1
# Lighthouse performance score
lighthouse http://localhost:3000 --output=json --quiet | jq '.categories.performance.score * 100'
# Python script execution time
/usr/bin/time -p python3 script.py 2>&1 | grep real | awk '{print $2}'
# Test count (if optimizing for fewer tests while maintaining coverage)
pytest --collect-only -q 2>/dev/null | tail -1 | grep -oP '\d+'
development
Run an extremely strict maintainability review for abstraction quality, giant files, and spaghetti-condition growth. Use for large prs, new features/architectures, a deep code quality audit, or especially harsh maintainability review.
testing
Commit, push, open, and land a pull request through CI handoff. Use when work is complete and the user wants an agent to create or update a PR, open it as a draft, monitor GitHub checks with `gh pr checks`, fix failed checks, retry up to three remediation pushes, mark the PR ready for review once green, and move a linked Linear ticket to Human Review when one exists. Trigger on requests like 'land this PR', 'open and monitor a PR', 'commit push and watch CI', 'get this ready for review', or 'finish the PR workflow'.
development
Use when reviewing a PR, reviewing the current branch, or posting code review feedback to GitHub. Triggers on "review this PR", "code review", "check this pull request", "review my branch", "review and fix".
development
Use when a plan, PRD, proposal, or implementation outline is overscoped, too ambitious for the immediate goal, or needs to be reduced to a bare-bones version. Triggers on "bare bones", "no damn whistles", "no bells and whistles", "strip this plan", "trim this plan", "scope creep", "descope this plan", "MVP only".