mcp/skills/autoresearch/SKILL.md
Set up and run autonomous experiment loops to optimize Simmer trading skills. Mutates skill code + config, measures P&L, keeps what works. Use when asked to "optimize a skill", "run autoresearch", or "improve my trading".
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

`npx skillsauth add spartanlabsxyz/simmer-sdk simmer-autoresearch`
Autonomous experiment loop for trading skill optimization: try ideas, keep what works, discard what doesn't, never stop.
Based on pi-autoresearch (MIT).
Tools:

- `init_experiment`: configure the session (name, skill_slug, metric, unit, direction). Call again to re-initialize with a new baseline.
- `run_experiment`: runs the skill command, times it, captures output.
- `log_experiment`: records the result with one of four statuses:
  - `keep`: use ONLY when the primary metric improved vs the baseline.
  - `discard`: worse or unchanged. Zero trades or metric=0 is no-signal: discard, never keep.
  - `crash`: the skill failed.
  - `checks_failed`: post-run validation failed.

  `keep` auto-commits via git; the others auto-revert. Always include the secondary metrics dict. State the before→after comparison in `description` (e.g., "entry_threshold 0.05→0.03; $12→$18 pnl, keep"). Optionally include `asi` (Actionable Side Information) for structured diagnostics.
- `backtest_experiment`: replay historical trades against a new config without live execution. Fast config tuning (seconds vs hours). Requires trades with signal_data (SDK 0.9.17+).

Setup:

1. `git checkout -b autoresearch/<skill>-<date>`
2. `autoresearch.md`: session spec with goal, metrics, how to run, constraints.
3. `autoresearch.sh`: single command that runs the skill and outputs results.
4. `init_experiment` → run baseline with `run_experiment` → `log_experiment` → start looping.

Template for `autoresearch.md`:

# Autoresearch: <goal>
## Objective
<What we're optimizing and the workload.>
## Metrics
- **Primary**: <name> (<unit>, lower/higher is better)
- **Secondary**: <name>, <name>, ...
## How to Run
`./autoresearch.sh` — runs the skill for one cycle.
## Constraints
- Only modify files in <skill directory>
- Do not change SDK core code
- Sim venue only (no real money)
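
The `run_experiment` tool is described as running the skill command, timing it, and capturing its output. The sketch below illustrates that behaviour only; it is not the tool's implementation. The helper name `run_once`, the `timeout_s` parameter, and the `pnl=18.0` output line are all assumptions made for illustration.

```python
import subprocess
import sys
import time


def run_once(command, timeout_s=3600):
    """Run one experiment command; return exit code, duration, and combined output."""
    start = time.monotonic()
    proc = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        timeout=timeout_s,    # hypothetical safety cap, not a documented setting
    )
    duration_s = time.monotonic() - start
    return {
        "returncode": proc.returncode,
        "duration_s": duration_s,
        "output": proc.stdout + proc.stderr,
    }


# Stand-in for ["./autoresearch.sh"]: a child process printing a made-up metric line.
result = run_once([sys.executable, "-c", "print('pnl=18.0')"])
```

In a real session the command would be `./autoresearch.sh`, and a non-zero `returncode` is the natural trigger for logging `crash`.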
Each iteration:

- `run_experiment`: execute the skill.
- `log_experiment`: compare the metric to the baseline. Improved → `keep`. Worse or equal → `discard`. Crashed → `crash`. Include the before→after comparison in `description`.

Code mutations > config tuning. Structural changes (new data sources, different models, alternative strategies) find bigger wins than parameter tweaks.
Use backtest_experiment for fast config exploration before committing to live runs.
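
To make the idea of replaying recorded trades against a new config concrete, here is a toy sketch. The source does not describe how `backtest_experiment` actually replays trades, so everything below is an assumption: the `replay` helper, the `edge` field inside `signal_data`, the `entry_threshold` semantics, and the trade data are all invented for illustration.

```python
def replay(trades, entry_threshold):
    """Sum the recorded P&L of trades whose stored signal clears the threshold."""
    taken = [t for t in trades if t["signal_data"]["edge"] >= entry_threshold]
    return {"trades": len(taken), "pnl": sum(t["pnl"] for t in taken)}


# Made-up historical trades; the "edge" field inside signal_data is hypothetical.
history = [
    {"signal_data": {"edge": 0.06}, "pnl": 5.0},
    {"signal_data": {"edge": 0.04}, "pnl": -2.0},
    {"signal_data": {"edge": 0.03}, "pnl": 4.0},
    {"signal_data": {"edge": 0.01}, "pnl": -3.0},
]

# Sweep candidate thresholds and pick the one with the best replayed P&L.
sweep = {th: replay(history, th) for th in (0.05, 0.03, 0.01)}
best = max(sweep, key=lambda th: sweep[th]["pnl"])
```

A sweep like this takes seconds, which is why it is a reasonable filter before spending a live run on a candidate config.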
- Improved → `keep`. Worse or equal → `discard`. Secondary metrics rarely override this: only discard a primary improvement if a secondary metric degraded catastrophically, and explain why in `description`.
- Zero trades or no signal → `discard`, never `keep`. If the experiment produced 0 trades, metric=0, or no measurable signal, this is a degenerate run: the skill stopped doing the thing you're trying to optimize. Discard so the next iteration tries something different. If you see this twice in a row, stop the loop and investigate the skill itself before mutating further. A dead loop produces meaningless commits and burns runs.
- Always state the comparison in `description`. Every `log_experiment` should make the before→after explicit (e.g., "reduced entry threshold 0.05→0.03; $12→$18 pnl, 4→6 trades, keep"). This is load-bearing for future iterations and for the dashboard to reason about your decisions.
- Simpler code for the same result → `keep`. Ugly complexity for tiny gain = probably `discard`.

NEVER STOP. The user may be away for hours. Keep the loop running until interrupted.
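
The keep/discard rules can be written down as a small decision function. This is a sketch under stated assumptions, not the MCP server's logic: the function names, the argument names, and the precedence of `crash` over `checks_failed` are my own choices, and the rare catastrophic-secondary-metric override is deliberately left out because it needs human judgment.

```python
def classify(baseline, metric, trades, direction="higher", crashed=False, checks_ok=True):
    """Map one run to a log_experiment status using the rules described above."""
    if crashed:
        return "crash"
    if not checks_ok:
        return "checks_failed"
    if trades == 0 or metric == 0:
        return "discard"  # no-signal run: never keep
    improved = metric > baseline if direction == "higher" else metric < baseline
    return "keep" if improved else "discard"  # an unchanged metric is a discard


def should_stop(history):
    """True when the last two (metric, trades) runs were both no-signal."""
    last_two = history[-2:]
    return len(last_two) == 2 and all(t == 0 or m == 0 for m, t in last_two)


def describe(change, before, after, status):
    """Build a before→after description in the style of the example above."""
    return f"{change}; ${before:g}→${after:g} pnl, {status}"
```

For example, `classify(12, 18, trades=6)` yields `keep`, and `describe("entry_threshold 0.05→0.03", 12, 18, "keep")` yields a string in the documented description format.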
- Call `init_experiment` to start fresh.
- Read `autoresearch.md` and `autoresearch.jsonl` to restore context.

Configuration is set via environment variables on the MCP server:

| Variable | Default | Purpose |
|----------|---------|---------|
| SIMMER_API_KEY | (required) | API key for dashboard sync and backtest |
| SIMMER_API_URL | https://api.simmer.markets | API base URL |
| AUTORESEARCH_MAX_EXPERIMENTS | 50 | Max experiments per session (0 = unlimited) |
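
As an illustration of how the table's defaults and the `0 = unlimited` convention might be applied, here is a minimal sketch. The variable names and defaults come from the table; the helpers `load_config` and `budget_left`, and the choice to represent "unlimited" as `None`, are assumptions and not the server's actual code.

```python
import os


def load_config(env=None):
    """Read the three documented variables, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("SIMMER_API_KEY")
    if not api_key:
        raise RuntimeError("SIMMER_API_KEY is required")
    max_experiments = int(env.get("AUTORESEARCH_MAX_EXPERIMENTS", "50"))
    return {
        "api_key": api_key,
        "api_url": env.get("SIMMER_API_URL", "https://api.simmer.markets"),
        # 0 means unlimited; represented here as None (an illustrative choice)
        "max_experiments": None if max_experiments == 0 else max_experiments,
    }


def budget_left(config, completed):
    """True while another experiment is allowed under the session cap."""
    cap = config["max_experiments"]
    return cap is None or completed < cap
```

Passing a plain dict as `env` makes the defaults easy to check without touching the real environment.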