
Generate model cards, reproducibility statements, and datasheet documentation for ML models and datasets. Use when releasing a model, completing venue-required artifact documentation, or writing a reproducibility/datasheet section for NeurIPS, ICLR, ICML, or artifact evaluation.
Check LaTeX academic papers before submission. Use for readiness, final mode, camera-ready preparation, source hygiene, and conference deadlines.
Produce a multi-paper comparison matrix across a literature corpus with tiered read depth. Use when multiple papers need to be compared side-by-side for method differences, performance gaps, closest-work ranking, or trend identification — distinct from per-paper source cards (reference-reading-summarizer) and single-paper project linking (reference-project-synthesizer).
Use when results are valid but surprising, negative, unstable, or ambiguous — to decide debug/rerun/ablate/revise/park. Not for engineering failures like NaN/OOM (use experiment-debugger). Not for confound or claim-drift audit before locking results into the paper (use research-results-auditor).
Run multi-round review-implement-recompile improvement cycles on a paper draft. Use when a draft needs iterative writing quality passes with reviewer independence (fresh context per review round), edit-whitelist gating, and crash-resumable state. Distinct from paper-reviewer-simulator (report only) and paper-draft-consistency-editor (single pass).
Plan and draft ML/AI introductions as venue-aware argument chains. Use for hook, gap, insight, method, result, contribution flow, and paragraph roles.
Decide what an ML/AI paper should strategically sell. Use for contribution choice, claim scope, paper archetype, novelty framing, audience, and claims to avoid.
Connect structured reference source cards to the active ML project. Use when papers, collaborator docs, Markdown notes, specs, scripts, BibTeX files, or source bundles should inform claims, risks, baselines, benchmarks, experiments, algorithm design, implementation, writing contracts, citations, collaborator actions, project initialization, or memory writeback.
Plan and report statistical rigor for ML experiment results. Use when significance testing, effect size reporting, confidence intervals, seed variance analysis, or multiple-comparison corrections are needed before including results in a paper or rebuttal.
Run artifact-driven sidecar agent tasks through one-shot Codex CLI sessions. Use when a main agent should delegate bounded scans, drafts, audits, pre-reviews, or mechanical repo tasks to a fast isolated sidecar model such as gpt-5.3-codex-spark while keeping final decisions with the main agent.
Review ML/AI result tables, LaTeX table files, captions, provenance, and paper table style. Use for benchmark, ablation, metric, model-spec, and compute tables.
Turn ML/AI tables, figures, ablations, and metrics into claim-aware results prose. Use for result paragraphs, figure/table narrative, and provisional metrics.
Manage project reference sources under reference/. Use when scanning, ingesting, indexing, deduplicating, monitoring, or tracking processing status for papers, PDFs, Word docs, Markdown notes, BibTeX files, scripts, specs, or source bundles without deeply reading them.
Audit private skills, memories, notes, or operational logs before turning them into public skills, templates, docs, or reusable patterns. Use when scanning personal/private memory for publishable knowledge, redaction needs, privacy risks, source-visibility leaks, or PR-ready public skill candidates.
Plan and write appendix or supplementary material for ML papers. Use when the appendix needs to be structured, main-paper claim boundaries need to be enforced, NeurIPS/ICLR reproducibility checklists need sections, or cross-references between paper and supplement need to be aligned.
Plan mid-project direction changes when consistent negative results or novelty challenges require scope narrowing, angle change, or kill decisions. Use after multiple result-diagnosis cycles fail to recover the original claim. Distinct from research-idea-validator (project start) and result-diagnosis (per-experiment).
Read and summarize project reference sources into structured source cards. Use for skimming papers, PDFs, Word docs, Markdown notes, BibTeX files, scripts, specs, collaborator feedback, or source bundles; extract writing patterns, methods, theory, benchmarks, baselines, implementation hints, risks, constraints, and project seeds without yet deciding project implications.
Run isolated code reviews for core algorithm or production code changes. Use when the user asks for a fresh-context reviewer, writer/reviewer separation, Spark pre-review, code review, implementation audit, review bundle, independent review, or review artifacts under `.agent/code-reviews/`.
Audit project token usage from local Codex, Codex sidecar, and Claude Code logs. Use when the user asks to measure token burn, token consumption, project attention, agent usage, Codex/Claude Code usage, sidecar usage, token efficiency, or lifecycle telemetry for a project.
Project-local router for ML research skill selection. Use inside an initialized ML research project, or while maintaining this skill repo, when the user describes an ML research/paper/experiment/discovery/ops/release workflow and may not know which skill applies; route to a domain router or high-signal leaf skill. Do not use for generic non-ML projects.
Route project operations tasks — git, memory, bootstrap, remote, workspace, code review, timeline, ops — to the correct skill. Use when the task involves commits, pushes, worktrees, project memory, enabling project-local skills, SSH/server coordination, sidecar runners, or audits. Do not solve the ops task directly.
Use when writing or revising actual paper prose — sections, result narratives, venue-aware style, provisional metrics. Not for planning the writing contract before drafting (use paper-writing-contract-planner). Not for tracking section status or edit-state across drafting sessions (use paper-writing-memory-manager).
Route ML/AI paper writing tasks to the correct skill — contract planning, prose drafting, section writing, consistency editing, review simulation, rebuttal, submission, or citation work. Use when the task involves writing, revising, reviewing, or submitting a paper instead of guessing between paper-writing-assistant, paper-writing-contract-planner, paper-reviewer-simulator, auto-paper-improvement-loop, or citation skills. Do not draft prose directly.
Bootstrap project-local ml-research-skills. Use from global installs when creating a new ML research project, enabling this collection in an existing ML research repo, or deciding whether to install the full bundle locally. Route to project-init for new projects; do not handle paper or experiment work directly.
Plan and draft ML/AI method sections. Use for notation flow, module ordering, algorithm boxes, overview figures, design rationale, and appendix boundaries.
Draft ML/AI limitations, scope, failure cases, ethics, and conclusion caveats. Use to control claim boundaries and reduce overclaiming.
Finalize accepted ML/AI papers for camera-ready submission. Use for de-anonymization, rebuttal promises, supplement updates, final LaTeX checks, and release handoff.
Maintain automatic personalization writeback from agent trajectories, logs, sidecar artifacts, and repeated user preferences. Use when a task produces reusable preferences, lessons, private user memory, project contracts, or candidate public skill rules without interrupting the user.
Write structured experiment reports from notes, configs, logs, metrics, tables, and figures. Use for result analysis, research updates, and presentation-ready summaries.
Review ML/AI result figures, captions, LaTeX wrappers, and visual style. Use for paper plots, figure screenshots, result narratives, and venue-ready figure polish.
Initialize a LaTeX academic paper project. Use for new conference or journal papers needing templates, macros, venue preambles, and writing guidance.
Build paper-facing tables and figures from CSV experiment outputs. Use to inventory evidence, aggregate seeds, select result slices, generate LaTeX assets, and record provenance.
Create repo-local LaTeX layout issue bundles from a PDF page, crop, source snippet, and compile log. Use when the user wants to avoid manual PDF screenshots, capture page-specific layout problems, or hand Codex/Claude Code a reproducible paper layout debugging artifact.
Edit ML/AI paper drafts for internal consistency. Use after sections exist to align claims, terminology, figures, tables, captions, limitations, and conclusion.
Perform Git operations safely with sandbox-aware failure handling. Use for commit, push, merge, rebase, stash, worktree, conflicts, lock files, permission errors, or Git state diagnosis.
Estimate GPU compute budget before running ML experiments. Use when planning how much compute an experiment, ablation matrix, or sweep will cost, sizing smoke tests, finding cheaper alternatives, or deciding whether a planned run fits available resources.
Turn inbound advisor, collaborator, or reviewer feedback into structured project updates. Use when meeting notes, emails, or review comments need to become claim updates, risk entries, action items, and experiment decisions — distinct from rebuttal writing for formal reviews.
Draft ML/AI paper titles, abstracts, and contribution lists. Use for title options, abstract structure, contribution bullets, and claim-strength calibration.
Route research discovery tasks — idea validation, literature review, reference reading, corpus comparison, or project synthesis — to the correct skill. Use when the task involves exploring ideas, surveying literature, reading papers, comparing multiple papers, or connecting references to the project. Do not perform the review or synthesis directly.
Initialize or enhance a Python/ML project. Use for new repos or forks needing production structure, uv environment setup, and research evidence docs.
Route ML experiment planning, execution, debugging, result interpretation, and evidence packaging tasks to the correct skill. Use this when the task involves experiments, compute, results, or evidence — instead of guessing between run-experiment, run-status-monitor, experiment-debugger, result-diagnosis, research-results-auditor, statistical-analysis-planner, or paper packaging skills. Do not solve the task directly.
Use to track nonlinear drafting state — section status, claim-text dependencies, stale prose, style decisions, and edit impact across sessions. Not for writing prose (use paper-writing-assistant). Not for planning the initial writing contract (use paper-writing-contract-planner).
Use when training has engineering failures — NaN/gradient issues, GPU OOM, slow data loading, wrong metrics, reproducibility failures. Not for checking job queue/status (use run-status-monitor). Not for valid-but-surprising scientific results (use result-diagnosis). Not for confound or claim audit before writing (use research-results-auditor).
Create Git branches or worktrees for research code and paper versions. Use for experiments, baselines, rebuttal fixes, arXiv/camera-ready branches, and worktree memory.
Use when auditing completed results for confounds, claim-drift, protocol integrity, or attribution before locking claims into the paper. Not for deciding what to do after a surprising result (use result-diagnosis). Not for significance tests or effect sizes (use statistical-analysis-planner). Not for engineering failures (use experiment-debugger).
Coordinate local, Git remote, and SSH/HPC/RunAI research projects. Use for server state, sync safety, job submission, interactive sessions, logs, artifact lookup, context recovery, raw SSH commands, remote shell one-liners, SSH quoting issues, remote-cmd, remote-bash, or avoiding local shell expansion of remote variables.
Maintain hierarchical ML research project memory. Use for claim, evidence, provenance, risk, action, handoff, worktree, phase, source-visibility, paper/code/slides, review, and rebuttal state.
Use when launching or preparing a new ML experiment job — local, SLURM, or RunAI. Not for checking existing job status (use run-status-monitor). Not for NaN/OOM/crash debugging (use experiment-debugger). Not for computing costs before deciding to run (use compute-budget-planner).
Initialize an ML research project control root. Use for paper/code/slides repos, shared memory, GitHub Project alignment, agent guidance, worktree policy, and lifecycle handoffs.
Use before drafting starts to lock venue, archetype, section order, paragraph roles, evidence slots, and forbidden claims. Not for writing actual prose (use paper-writing-assistant). Not for tracking section status during drafting (use paper-writing-memory-manager).
Use when probing the status of an existing job — queued, stuck, running, or finished — across local, SLURM, RunAI, or SSH. Not for launching new jobs (use run-experiment). Not for debugging NaN/OOM/engineering failures (use experiment-debugger). Not for interpreting valid but surprising results (use result-diagnosis).
Audit a skill collection for consistency, lifecycle coverage, routing, documentation drift, memory writeback, stale references, helper paths, and validation readiness.
Manage ML dataset pipelines before training. Use when the user needs to acquire, preprocess, split, or version datasets, design train/val/test protocols, audit data quality, check for train/test contamination, or make data decisions that affect experimental validity and reviewer trust.
Build retrospective or forward-looking work timelines from git history, docs, notes, or chat records. Use for progress summaries, mentor reports, and phase planning.
Mine existing results for paper evidence gaps before new compute. Use when claims lack support, CSVs may already contain evidence, or tables/figures can be derived.
Audit ML/AI experimental baselines for necessity, fairness, currency, and reviewer risk. Use when choosing baselines or checking SOTA comparisons.
Sync verified code-side experiment results into paper memory. Use when logs, reports, run docs, or user-confirmed metrics should become paper-facing evidence.
Draft ML/AI related work as novelty-boundary writing. Use for closest-work grouping, citation roles, paragraph plans, boundary statements, and safe novelty wording.
Simulate target-conference reviewers for an ML/AI paper. Use for reviewer critique, predicted scores, reject risks, meta-review, and pre-submission risk audit.
Prepare research code repositories for public release. Use for open-source cleanup, README/LICENSE/CITATION, GitHub releases, tags, and reproducibility packages.
Plan and write ML/AI rebuttals after real reviews arrive. Use for reviewer intent, response strategy, follow-up experiments, point-by-point replies, and revision promises.
Validate rough CS/AI research ideas with the FIVE+C framework. Use to decide pursue, revise, park, or kill based on novelty, feasibility, evidence, and risks.
Refresh project documentation after code changes. Use after implementing features, changing behavior, or preparing a milestone commit.
Design and write reusable research slide decks. Use for advisor updates, lab talks, reading reports, proposals, conference talks, Slidev content, and slide structure.
Create annotated Git milestone tags. Use when completing a phase, releasing a version, marking a research checkpoint, or generating a milestone summary from git history before tagging.
Turn an ML/AI research idea into a concrete method design. Use for objectives, architecture, inference, assumptions, ablations, and implementation handoff.
Write advisor, mentor, lab-meeting, or collaborator updates. Use for weekly updates, progress memos, decision requests, blocker summaries, and next actions.
Prepare research artifact packages for evaluation or public release. Use for reproduction commands, environment checks, data packaging, and artifact forms.
Audit whether a paper cites necessary classic, closest, benchmark, and recent work. Use for missing-citation checks and related-work coverage before submission.
Audit LaTeX citations and references before submission. Use for BibTeX metadata, unresolved keys, invalid labels, DOI/arXiv data, and citation-claim support.
Design hypothesis-driven ML/AI experiments before running. Use for ablations, baselines, metrics, controls, seeds, logging, and claim-evidence matrices.
Adapt ML/AI paper writing to a target venue. Use for venue style, structure, positioning, reviewer-friendly prose, and section or paragraph guidance.
Run a focused CS/AI literature review sprint. Use to survey a topic, map related work, check novelty, rank papers, assess closest-work risk, and derive next actions.
Maintain a paper-facing evidence board. Use to align claims, experiments, figures, tables, sections, reviewer risks, evidence gaps, and next actions during writing.