
Use when user asks what something means, says "explain", "I don't understand", "teach me", "what is X", or asks about a term mid-session.
Use when user wants to test multiple angles on an idea, run parallel hypothesis lanes, or explore a gap from different entry points.
Use when user has a hypothesis and a repo and wants to run experiments. Triggered by "start experiments", "run the loop", "test this hypothesis".
# autoresearch skill

> Adapted from [karpathy/autoresearch](https://github.com/karpathy/autoresearch) program.md.
> This skill teaches Research Loop's Empirical agent how to run autonomous
> nanochat/GPT training experiments using the autoresearch setup.

## What this skill is for

You are the Empirical Agent operating on the `karpathy/autoresearch` codebase. Your job is to autonomously experiment with `train.py` to minimize `val_bpb` (validation bits per byte; lower is better).

## Repository
Use when experiments are running or just completed, or user shares results and wants to decide what to do next.
# SKILL: Getting Started

> The bootstrap entry point for the Research Loop framework.
> Every research session starts here. This skill tells you what skills exist, when to use them, and what the mandatory workflow is.

---

## You have skills. They give you superpowers.

<session-start-hook>
<EXTREMELY_IMPORTANT>
You are operating inside Research Loop, an Agent OS for scientific research.

RIGHT NOW, read this file fully before doing anything else.

**Core rules:**

1. You have skills. They g
Use when user wants to find gaps, evaluate ideas, or decide what's worth pursuing. Triggered by "what hasn't been tried", "is this a good idea", "find the gap".
Use when user asks "where are we", "show me the status", "what do we have", "what's pending", "what should I do next", or wants to see the research decision tree.
Use when user has a selected hypothesis or route and needs a concrete execution plan broken into tasks. Triggered by "what are the next steps", "how do I start", "plan this out".
Use when user mentions research, a topic, papers, experiments, gaps, or hypotheses. Entry point — load this first.
# SKILL: Review Preparation

> What to do after the paper is written: surviving rejection, positioning for awards, and persisting without losing the thread.
> Based on Carlini (2026), "How to win a best paper award."

---

## When to use this skill

Use this skill when a paper is complete and being prepared for submission, when a rejection arrives, and when evaluating whether a rejected paper is worth resubmitting or should be killed.

Invoke with: `/review-prep` or `@reviewer: prepare for subm
Publication-quality figure generation for research papers. Decision agent selects figure type (code plot vs architecture diagram). Generates Matplotlib/Seaborn code for quantitative figures with iterative improvement loop. Style-matches conference templates (NeurIPS, ICML, ICLR). Use when the paper-pipeline reaches the figure generation phase, or when a user requests figures for an existing draft.
Experiment sandbox execution for Research Loop. Supports four modes: local (venv), Docker (isolated containers), SSH remote (GPU compute on servers), and Colab (Google Drive bridge). Provides experiment harness templates, code validation, metric collection, deterministic seeding, and compute budget enforcement. Use before running experiments generated by the paper-pipeline.
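Deterministic seeding and compute budget enforcement can be illustrated with a minimal sketch. This is not Research Loop's actual harness: `run_with_budget`, `BudgetExceeded`, and all parameter names are hypothetical, and the sketch assumes a wall-clock budget checked between steps in a local process.

```python
import random
import time
from typing import Callable, List


class BudgetExceeded(RuntimeError):
    """Raised when the wall-clock budget runs out (hypothetical name)."""


def run_with_budget(
    step: Callable[[int, random.Random], object],
    max_steps: int,
    seed: int,
    max_seconds: float,
) -> List[object]:
    """Run `step` up to `max_steps` times with a seeded RNG and a time budget."""
    rng = random.Random(seed)  # private RNG: the same seed gives the same draws
    start = time.monotonic()   # monotonic clock is unaffected by system clock changes
    metrics = []
    for i in range(max_steps):
        # The budget is checked before each step, so a single long step can overrun it.
        if time.monotonic() - start > max_seconds:
            raise BudgetExceeded(f"budget of {max_seconds}s exhausted at step {i}")
        metrics.append(step(i, rng))
    return metrics


# Usage: two runs with the same seed collect identical metrics.
first = run_with_budget(lambda i, rng: rng.random(), max_steps=3, seed=7, max_seconds=1.0)
second = run_with_budget(lambda i, rng: rng.random(), max_steps=3, seed=7, max_seconds=1.0)
print(first == second)
```

Passing a dedicated `random.Random` instance into each step, rather than seeding the global generator, keeps runs reproducible even when other code draws random numbers in the same process.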
# SKILL: Writing Papers

> How to write a paper that people actually read: with singular focus, a story that lands, self-contained figures, and a conclusion that does not waste its moment.
> Based on Carlini (2026), "How to win a best paper award."

---

## When to use this skill

Use this skill when transitioning from the execution phase to drafting. Also use it when reviewing a draft in progress. For full end-to-end pipeline runs (topic to export), use `paper-pipeline` instead; this skill i
Use when user wants to explore a topic, find papers, map a field, or understand the research landscape. Not for explaining concepts — use learn for that.
Mandatory activation layer — loads on any conversation start. Establishes skill-loading protocol, Red Flags, priority rules, and HARD-GATE enforcement for all research-loop skills.
End-to-end paper generation pipeline ported from AutoResearchClaw (Aiming Lab). 14 phases covering topic initiation through export/publish, with human-in-the-loop gates and quality gating at each handoff. Use this when the user wants a full paper pipeline run, from topic to submission-ready manuscript. Delegates to researcher/reviewer/writer/verifier subagents for stage execution and to autonomous-iteration for experiment optimization loops.
Use when user mentions autonomous iteration, metric-driven optimization, $research-loop plan, $research-loop debug, $research-loop fix, $research-loop security, $research-loop ship, $research-loop scenario, $research-loop predict, $research-loop learn, $research-loop reason, $research-loop probe, or mentions "research-loop" with a goal/metric. Autonomous Goal-directed Iteration — apply Karpathy's autoresearch principles: modify, verify, keep/discard, repeat. Supports bounded mode via Iterations: N inline config.
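The modify, verify, keep/discard, repeat cycle with a bounded iteration count can be sketched as a simple greedy loop. This is an illustrative assumption about the shape of such a loop, not the skill's actual implementation: `iterate`, `propose`, and `score` are hypothetical names, and a lower score is assumed to be better.

```python
import random
from typing import Callable, Tuple


def iterate(
    candidate: float,
    propose: Callable[[float], float],
    score: Callable[[float], float],
    iterations: int,
) -> Tuple[float, float, int]:
    """Greedy keep/discard loop bounded by `iterations` (lower score is better)."""
    best, best_score, kept = candidate, score(candidate), 0
    for _ in range(iterations):
        trial = propose(best)         # modify: derive a change from the current best
        trial_score = score(trial)    # verify: measure the change against the metric
        if trial_score < best_score:  # keep: the change improved the metric
            best, best_score, kept = trial, trial_score, kept + 1
        # otherwise discard: `best` is left unchanged for the next round
    return best, best_score, kept


# Usage: minimize (x - 2)^2 starting from x = 5 with small seeded perturbations.
rng = random.Random(0)
best, best_score, kept = iterate(
    candidate=5.0,
    propose=lambda x: x + rng.uniform(-1.0, 1.0),
    score=lambda x: (x - 2.0) ** 2,
    iterations=50,
)
print(best_score <= 9.0)  # the kept score never gets worse than the starting score
```

Because a trial is kept only on strict improvement, the best score is non-increasing across iterations, which is what makes discarding a failed change safe.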
Run a thorough, multi-phase deep research investigation on a topic with subagent dispatch, provenance tracking, and integrity verification.
Run a structured literature review on a topic using parallel search, evidence tables with quality scoring, and primary-source synthesis.
Plan and execute a structured replication workflow for a paper, claim, or benchmark with environment selection and integrity checks.