skills/hyperagent/SKILL.md
Run a self-referential self-improving agent loop where a meta-agent iteratively modifies a task-agent's code to optimize for any measurable target. Based on Facebook Research's Hyperagents paper (arXiv:2603.19461). Use when asked to "run hyperagent", "self-improve this", "optimize with self-modification", or "evolve this agent/script".
npx skillsauth add ckorhonen/claude-skills hyperagentInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
New to Hyperagent? Try these beginner-friendly tasks before the full setup.
1. Optimize a simple Python script to run faster
Say: "Use hyperagent to optimize this script for speed" and paste something like:
# slow_sort.py
def sort_numbers(nums):
result = []
while nums:
smallest = min(nums)
result.append(smallest)
nums.remove(smallest)
return result
Hyperagent will benchmark it, propose a faster implementation, and validate the improvement.
2. Improve a prompt to get better answers
Say: "Run hyperagent on this prompt and improve accuracy" with a prompt like:
Summarize this article in one sentence.
The meta-agent iterates on the prompt, measures quality, and keeps improvements that score higher.
3. Make a sorting function more efficient
Say: "Evolve this function with hyperagent" and paste any function. Hyperagent creates a benchmark, runs generations of improvements, and shows you the performance gain per generation.
4. Self-improve any script
Say: "Self-improve this agent/script" and point to any Python file. Hyperagent wraps it in an evaluation loop, proposes modifications, and tracks what works.
The simplest possible setup: create
task.shthat printsMETRIC score=0.5, then runpython3 scripts/init_session.py. From there the loop is fully automated.
Self-referential self-improvement: a meta-agent that modifies a task-agent (and itself) to optimize any measurable objective.
Inspired by Facebook Research's Hyperagents paper (arXiv:2603.19461), which demonstrated that agents combining a task-solver and a self-modifying meta-level into a single editable program can achieve open-ended, compounding improvements that transfer across domains.
A hyperagent is a system with two components in a single editable codebase:
The key insight from the paper: when the meta-level modification procedure is itself editable, the system can improve not just task performance but also the mechanism that generates future improvements — enabling compounding, transferable gains.
Self-referential modification
The meta-agent can modify the task-agent's code AND its own strategy. Both live in the same editable workspace. This enables metacognitive self-improvement: improving how you improve.
Population-based exploration (archive)
Don't just keep the best variant — maintain an archive of all successful variants as stepping stones. Parent selection favors high performers with unexplored potential.
Empirical evaluation gates everything
No change is accepted without measurement. Every candidate is evaluated against the task benchmark with repeated trials.
Persistent memory and performance tracking
The system maintains a structured history of all experiments, hypotheses, and outcomes. Later generations build on earlier insights — no rediscovering dead ends.
Transfer across domains
Meta-level improvements (performance tracking, evaluation strategies, hypothesis generation patterns) are domain-agnostic and can be transferred to new tasks.
scripts/common.py — shared utilities (archive management, metrics, reporting)scripts/init_session.py — initialize a hyperagent session, scaffold the workspacescripts/run_task.py — evaluate a task-agent variant and record metricsscripts/log_variant.py — log evaluated record, decide disposition, update archive and reportsscripts/render_report.py — generate HTML report of the full evolutionary historyscripts/select_parent.py — select a parent from the archive for the next generationNote: There is no
generate_variant.pyscript — the meta-agent role (hypothesis generation and code modification) is performed by the LLM agent itself, not by a script.
All scripts are non-interactive, expose --help, emit structured JSON on stdout, and keep diagnostics on stderr.
Initialize the session after defining the optimization target:
python3 scripts/init_session.py \
--goal "Improve prompt accuracy on math benchmark" \
--metric-name accuracy \
--unit pct \
--direction higher \
--task-command ./task.sh \
--checks-command ./checks.sh \
--scope src/agent.py \
--max-generations 50
Evaluate the baseline (generation 0):
python3 scripts/run_task.py \
--id gen-000 \
--hypothesis "Control: unmodified task agent" \
--change-summary "No modifications" \
--baseline \
--output .hyperagent/gen-000.json
python3 scripts/log_variant.py --input .hyperagent/gen-000.json
Selection → Modification → Evaluation loop:
# Select a parent from the archive
python3 scripts/select_parent.py --output .hyperagent/parent.json
# Generate a variant (meta-agent proposes modifications)
# This is where YOU (the LLM agent) act as the meta-agent:
# - Read the parent's code and performance history
# - Hypothesize an improvement
# - Apply code modifications
# - Record what you changed and why
# Evaluate the variant
python3 scripts/run_task.py \
--id gen-001 \
--hypothesis "Add chain-of-thought prompting to improve reasoning" \
--change-summary "Wrap task prompt in step-by-step reasoning template" \
--parent gen-000 \
--output .hyperagent/gen-001.json
python3 scripts/log_variant.py --input .hyperagent/gen-001.json
Render reports at any time:
python3 scripts/render_report.py
Before starting, gather or confirm:
METRIC name=value linesPrefer a dedicated worktree on a fresh branch:
git worktree add ../hyperagent-<goal>-<date> -b hyperagent/<goal>-<date>
Create:
hyperagent.md — checked in, durable session brief with full evolutionary historytask.sh — checked in, benchmark runner (emits METRIC name=value)checks.sh — checked in, correctness gates.hyperagent/ — local artifact directory, NOT checked inEnsure artifacts stay untracked:
rg -qxF '.hyperagent/' .git/info/exclude || printf '\n.hyperagent/\n' >> .git/info/exclude
You (the LLM) are the meta-agent. Your job each generation is:
scripts/select_parent.py or choose based on the archivehyperagent.md)scripts/run_task.py to measure the variantscripts/log_variant.py to record the result and update the archivehyperagent.md with what you learnedThe meta-agent can improve its own process by updating:
hyperagent.md (hypothesis generation patterns, evaluation heuristics).hyperagent/memory.jsonl (qualitative insights, correction plans)These meta-improvements compound across generations and transfer to new tasks.
hyperagent.mdThe durable contract and evolutionary history. A fresh agent can resume from this.
# Hyperagent: <goal>
## Objective
<What is being optimized and why.>
## Configuration
- Primary metric:
- Unit:
- Direction:
- Minimum improvement: X%
- Task command:
- Correctness gates:
- Generation budget:
## Scope
- Task agent files:
- Meta-agent can self-modify: yes/no
## Archive
`.hyperagent/archive.jsonl`
## Lineage
<Tree showing parent→child relationships and which variants were kept>
## Meta-Strategy
<Current approach to hypothesis generation — updated as the meta-agent learns>
## What We've Learned
<Key wins, dead ends, transferable insights>
## Performance Tracking
<Best variant, improvement trajectory, current plateau status>
task.shBash script that runs the task agent and emits METRIC name=value lines:
#!/bin/bash
set -euo pipefail
# Run the task agent
python3 src/agent.py --input data/test.json 2>/dev/null
# The agent script should emit: METRIC accuracy=0.85
The archive (.hyperagent/archive.jsonl) stores every variant ever evaluated:
{
"id": "gen-007",
"generation": 7,
"parent_id": "gen-003",
"timestamp": "2026-03-27T20:00:00Z",
"hypothesis": "Add few-shot examples to improve pattern recognition",
"change_summary": "Inserted 3 domain-specific examples into the task prompt",
"files_touched": ["src/agent.py"],
"metric_name": "accuracy",
"direction": "higher",
"warmup_trials": [0.82, 0.83],
"measured_trials": [0.85, 0.86, 0.84, 0.85, 0.87],
"summary": {"median": 0.85, "mean": 0.854, "min": 0.84, "max": 0.87},
"checks": "passed",
"disposition": "keep",
"children_count": 0,
"meta_modifications": ["Updated strategy notes with few-shot pattern"],
"reason": "Improved by 3.2% over parent gen-003 (0.824). Checks passed."
}
Selection probability for a parent is proportional to:
This balances exploitation (good variants) with exploration (understudied variants).
python3 scripts/select_parent.py
# Output: {"selected_parent": "gen-003", "score": 0.824, "children": 1, "reason": "High performer with few children"}
keep — variant beats current best by ≥ threshold, checks passdiscard — variant is worse, equal, or improvement below thresholdchecks_failed — metric improved but correctness gates failedcrash — variant could not be evaluatedTrack improvement velocity. Stop or pivot when:
Run autonomously until:
During the loop:
hyperagent.md after every generationSymptom: Meta-strategy becomes over-specialized to early successes Fix: Periodically review and broaden the strategy; try categorically different approaches
Symptom: Archive grows large, selection becomes slow Fix: Archive old generations after 50 variants; maintain a compact summary
Symptom: Meta-agent modifies evaluation or logging in ways that break the loop Fix: Keep outer-loop scripts (init, run, log, select) immutable. Only modify task code and strategy notes.
Symptom: Later generations retry earlier failed ideas
Fix: Always read .hyperagent/memory.jsonl before proposing. Explicitly check against dead ends.
To transfer meta-improvements to a new domain:
hyperagent.md "What We've Learned" section.hyperagent/memory.jsonl as starting knowledgepython3 scripts/render_report.py
Generates .hyperagent/report.html with:
documentation
Create or expand an Idea.md / IDEA.md file from a rough description, existing repo, conversation history, notes, or other early-stage product inputs. Use when the user asks to "write an Idea.md", "turn this into an idea file", "capture this product idea", "expand this concept", or wants a repo-grounded concept brief before validation, PRD, or implementation work.
development
Write structured implementation plans from specs or requirements before touching code. Use when given a spec, requirements doc, or feature description, when user says "plan this out", "write a plan for", "how should we implement", or before starting any multi-step coding task.
testing
Expert guidance for video editing with ffmpeg, encoding best practices, and quality optimization. Use when working with video files, transcoding, remuxing, encoding settings, color spaces, or troubleshooting video quality issues.
development
Opinionated constraints for building better interfaces with agents. Use when building UI components, implementing animations, designing layouts, reviewing frontend accessibility, or working with Tailwind CSS, motion/react, or accessible primitives like Radix/Base UI.