internal/embed/skills/autoresearch/SKILL.md
Run autonomous LLM optimization experiments (autoresearch) and publish optimized models for paid inference via x402.
Autonomous LLM optimization: the agent iterates on train.py, runs 5-minute GPU experiments, measures validation bits-per-byte (val_bpb), and publishes the best checkpoint as a sellable Ollama model.
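For readers unfamiliar with the metric, here is a minimal sketch of how bits-per-byte is conventionally derived from a summed cross-entropy loss. It assumes the loss is accumulated in nats and that the byte count refers to the raw text behind the validation tokens. The source does not say how train.py computes val_bpb, so the function name and inputs are illustrative only.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) into bits per byte."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) turns nats into bits; dividing by bytes normalizes
    # the score so it does not depend on the tokenizer.
    return total_nll_nats / (math.log(2) * total_bytes)


# Hypothetical numbers: mean loss of 2.9 nats/token over 1,000 tokens
# that together cover 4,000 bytes of raw text.
example_bpb = bits_per_byte(2.9 * 1000, 4000)
print(f"{example_bpb:.3f}")
```

Because the score is normalized per byte rather than per token, runs that change the tokenizer or vocabulary remain comparable.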
Place your training and validation data in the autoresearch working directory:
autoresearch/
train.bin # training data (tokenized)
val.bin # validation data (tokenized)
train.py # training script (agent modifies this)
results.tsv # experiment log (appended by each run)
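The agent modifies train.py and runs experiments in a loop. Each experiment edits train.py, trains for up to 5 minutes, measures val_bpb on the validation set, and appends its result to results.tsv.
The results.tsv file is tab-separated with columns: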
commit_hash val_bpb status description
a1b2c3d 1.042 keep baseline transformer
e4f5g6h 1.038 keep added RMSNorm
i7j8k9l 1.051 discard unstable lr schedule
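A log in this shape can be read with Python's standard csv module. The sketch below parses the sample rows above and picks the kept experiment with the lowest val_bpb. It is an illustration only and not the actual publish.py code; the function name best_experiment is hypothetical.

```python
import csv
import io
from typing import Optional


def best_experiment(tsv_text: str) -> Optional[dict]:
    """Return the status=keep row with the lowest val_bpb, or None if none exist."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [row for row in reader if row["status"] == "keep"]
    if not kept:
        return None
    # Lower val_bpb is better, so the minimum wins.
    return min(kept, key=lambda row: float(row["val_bpb"]))


SAMPLE = (
    "commit_hash\tval_bpb\tstatus\tdescription\n"
    "a1b2c3d\t1.042\tkeep\tbaseline transformer\n"
    "e4f5g6h\t1.038\tkeep\tadded RMSNorm\n"
    "i7j8k9l\t1.051\tdiscard\tunstable lr schedule\n"
)

best = best_experiment(SAMPLE)
print(best["commit_hash"], best["val_bpb"])  # e4f5g6h 1.038
```

Returning None for a log with no kept rows lets the caller decide whether an empty result is an error.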
Once experiments are complete, use publish.py to find the best checkpoint, register it with Ollama, and optionally sell it:
# Publish to Ollama only
python3 scripts/publish.py /path/to/autoresearch
# Publish and sell via x402
python3 scripts/publish.py /path/to/autoresearch \
--sell \
--wallet 0xYourWalletAddress \
--price 0.002 \
--chain base-sepolia
| Command | Description |
|---------|-------------|
| publish.py <dir> | Find best experiment, create Ollama model, generate provenance |
| publish.py <dir> --sell --wallet <addr> --price <p> --chain <c> | Publish and sell via obol sell inference |
Experiment loop: The agent edits train.py, runs training for up to 5 minutes, measures val_bpb on the validation set, and commits the result with a keep/discard verdict.
Selection: publish.py reads results.tsv, filters for status=keep, and selects the experiment with the lowest val_bpb (lower is better — fewer bits per byte means better compression / prediction).
Provenance: A JSON provenance file is generated with:
- framework: training framework used
- metricName: metric identifier (val_bpb)
- metricValue: winning metric value as a string
- trainHash: SHA-256 hash of train.py at the winning commit, prefixed with sha256:
- paramCount: model parameter count as a string
- experimentId: git commit hash of the winning experiment
Ollama registration: A Modelfile is generated from the checkpoint and ollama create registers the model locally.
Sell (optional): If --sell is passed, runs obol sell inference with the --provenance-file flag pointing at the provenance JSON so buyers can verify optimization lineage.
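The provenance step described above can be sketched as follows. This is a hedged illustration, not the real publish.py: the function name build_provenance, the default framework value "pytorch", and the sample parameter count are assumptions. The field names and the string-typed metric and parameter values follow the list above.

```python
import hashlib
import json


def build_provenance(train_py: bytes, commit_hash: str, val_bpb: float,
                     param_count: int, framework: str = "pytorch") -> dict:
    """Assemble a provenance record from the winning experiment's data."""
    digest = hashlib.sha256(train_py).hexdigest()
    return {
        "framework": framework,               # assumed default, not from the source
        "metricName": "val_bpb",
        "metricValue": str(val_bpb),          # stored as a string
        "trainHash": f"sha256:{digest}",      # hash of train.py at the winning commit
        "paramCount": str(param_count),       # stored as a string
        "experimentId": commit_hash,
    }


# Hypothetical inputs: a stand-in train.py body and a 50M-parameter model.
record = build_provenance(b"print('train')\n", "e4f5g6h", 1.038, 50_000_000)
print(json.dumps(record, indent=2))
```

Hashing the exact bytes of train.py at the winning commit means a buyer who checks out that commit can recompute the digest and compare it with trainHash.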
Agent (autoresearch loop)
|
+-- edit train.py
+-- run experiment (5-min budget)
+-- measure val_bpb
+-- commit results.tsv
|
v
publish.py
|
+-- read results.tsv → best experiment
+-- git show <commit>:train.py → SHA-256 trainHash
+-- generate provenance.json
+-- generate Modelfile → ollama create
+-- (optional) obol sell inference --provenance-file
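The git and Ollama steps in the diagram can be sketched as plain command construction. The example assumes the checkpoint has already been converted to a GGUF file; the path ./model.gguf, the model name autoresearch-best, and all three helper names are hypothetical. The commands are only built here, not executed.

```python
from pathlib import Path


def git_show_cmd(commit_hash: str, path: str = "train.py") -> list[str]:
    """Command that prints a file's contents as of a given commit."""
    return ["git", "show", f"{commit_hash}:{path}"]


def modelfile_text(gguf_path: str) -> str:
    """Minimal Modelfile: a single FROM line pointing at a local GGUF file."""
    return f"FROM {gguf_path}\n"


def ollama_create_cmd(model_name: str, modelfile: Path) -> list[str]:
    """Command that registers a local model from a Modelfile."""
    return ["ollama", "create", model_name, "-f", str(modelfile)]


print(git_show_cmd("e4f5g6h"))
print(modelfile_text("./model.gguf"), end="")
print(ollama_create_cmd("autoresearch-best", Path("Modelfile")))
```

Each list can be passed to subprocess.run(cmd, check=True) once git and ollama are available on the PATH.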
- Convert checkpoints (.pt, .safetensors) with llama.cpp/convert_hf_to_gguf.py
- results.tsv columns: commit_hash, val_bpb, status, description
When registering an autoresearch-optimized model on-chain via ERC-8004:
- devops_mlops/model_versioning
- research_and_development/scientific_research
references/autoresearch-overview.md: val_bpb metric, time budget, and the train.py modification loop