Workflow 1: Idea Discovery Pipeline

Orchestrate a complete idea discovery workflow for: $ARGUMENTS

Overview

This skill chains sub-skills into a single automated pipeline:

/research-lit → /idea-creator → /novelty-check → /research-review → /research-refine-pipeline
  (survey)      (brainstorm)    (verify novel)    (critical feedback)  (refine method + plan experiments)

Each phase builds on the previous one's output. The final deliverables are a validated idea-stage/IDEA_REPORT.md with ranked ideas, plus a refined proposal (refine-logs/FINAL_PROPOSAL.md) and experiment plan (refine-logs/EXPERIMENT_PLAN.md) for the top idea.

Constants

PILOT_MAX_HOURS = 2 — Skip any pilot experiment estimated to take > 2 hours per GPU. Flag as "needs manual pilot" in the report.
PILOT_TIMEOUT_HOURS = 3 — Hard timeout: kill any running pilot that exceeds 3 hours. Collect partial results if available.
MAX_PILOT_IDEAS = 3 — Run pilots for at most 3 top ideas in parallel. Additional ideas are validated on paper only.
MAX_TOTAL_GPU_HOURS = 8 — Total GPU budget across all pilots. If exceeded, skip remaining pilots and note in report.
AUTO_PROCEED = true — If user doesn't respond at a checkpoint, automatically proceed with the best option after presenting results. Set to false to always wait for explicit user confirmation.
REVIEWER_MODEL = gpt-5.6-sol — Model used via Codex MCP. Must be an OpenAI model (e.g., gpt-5.6-sol, o3, gpt-4o). Passed to sub-skills.
OUTPUT_DIR = idea-stage/ — All idea-stage outputs go here. Create the directory if it doesn't exist.
ARXIV_DOWNLOAD = false — When true, /research-lit downloads the top relevant arXiv PDFs during Phase 1. When false (default), only fetches metadata. Passed through to /research-lit.
COMPACT = false — When true, generate compact summary files for short-context models and session recovery. Writes idea-stage/IDEA_CANDIDATES.md (top 3-5 ideas only) at the end of this workflow. Downstream skills read this instead of the full idea-stage/IDEA_REPORT.md.
RENDER_HTML = true — When true (default), auto-render idea-stage/IDEA_REPORT.md to HTML at workflow end via /render-html. Uses --no-review (the source MD already went through novelty verification in Phase 3 and cross-model review in Phase 4). Set false to skip, or pass — render html: false.
REF_PAPER = false — Reference paper to base ideas on. Accepts: local PDF path, arXiv URL, or any paper URL. When set, the paper is summarized first (idea-stage/REF_PAPER_SUMMARY.md), then idea generation uses it as context. Combine with base repo for "improve this paper with this codebase" workflows.
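
A minimal sketch of how the pilot constants above could gate which pilots run. The `PilotCandidate` fields and the `plan_pilots` helper are hypothetical names introduced for illustration. It also assumes that an idea skipped for being too long does not count toward MAX_PILOT_IDEAS, which the constants do not spell out.

```python
from dataclasses import dataclass

PILOT_MAX_HOURS = 2
MAX_PILOT_IDEAS = 3
MAX_TOTAL_GPU_HOURS = 8


@dataclass
class PilotCandidate:
    name: str
    est_hours_per_gpu: float  # hypothetical field: estimated hours on one GPU
    gpus: int = 1             # hypothetical field: GPUs the pilot would occupy


def plan_pilots(ranked):
    """Split ranked candidates into pilots to run and ideas skipped with a reason."""
    run, skipped = [], {}
    used = 0.0
    budget_hit = False
    for cand in ranked:
        cost = cand.est_hours_per_gpu * cand.gpus
        if cand.est_hours_per_gpu > PILOT_MAX_HOURS:
            skipped[cand.name] = "needs manual pilot"
        elif len(run) >= MAX_PILOT_IDEAS:
            skipped[cand.name] = "validated on paper only"
        elif budget_hit or used + cost > MAX_TOTAL_GPU_HOURS:
            # Once the total budget would be exceeded, all remaining pilots are skipped.
            budget_hit = True
            skipped[cand.name] = "GPU budget exceeded"
        else:
            run.append(cand.name)
            used += cost
    return run, skipped, used
```

PILOT_TIMEOUT_HOURS is not modeled here because it applies to pilots that are already running, not to planning.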

💡 These are defaults. Override by telling the skill, e.g., /idea-discovery "topic" — ref paper: https://arxiv.org/abs/2406.04329 or /idea-discovery "topic" — compact: true.
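
One way to read such overrides is sketched below. It assumes directives are separated by " — " and written as `key: value`, as in the examples above. `parse_invocation` is a hypothetical helper, not part of the skill, and a topic that itself contains " — " would be split incorrectly.

```python
def parse_invocation(raw):
    """Split '"topic" — key: value — key: value' into (topic, overrides)."""
    head, *directives = raw.split(" — ")
    topic = head.strip().strip('"')
    overrides = {}
    for directive in directives:
        # Split on the first colon only, so URLs in the value stay intact.
        key, _, value = directive.partition(":")
        overrides[key.strip().lower().replace(" ", "_")] = value.strip()
    return topic, overrides


topic, overrides = parse_invocation(
    '"topic" — ref paper: https://arxiv.org/abs/2406.04329 — compact: true'
)
```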

Pipeline

Phase 0: Load Research Brief (if available)

Before starting any other phase, check for a detailed research brief in the project:

Look for RESEARCH_BRIEF.md in the project root (or path passed as $ARGUMENTS)
If found, read it and extract:
- Problem statement and context
- Constraints (compute, data, timeline, venue)
- What the user already tried / what didn't work
- Domain knowledge and non-goals
- Existing results (if any)
Use this as the primary context for all subsequent phases — it replaces the one-line prompt
If both RESEARCH_BRIEF.md and a one-line $ARGUMENTS exist, merge them (brief takes priority for details, argument sets the direction)

If no brief exists, proceed normally with $ARGUMENTS as the research direction.
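
The lookup order above could be sketched roughly as follows. `resolve_context` and its return shape are illustrative assumptions. The actual merge of brief and direction is left to the agent and is not implemented here.

```python
from pathlib import Path


def resolve_context(arguments, project_root="."):
    """Return the research direction and the brief text (if any) to use as context."""
    arg = arguments.strip()
    # Case 1: $ARGUMENTS is itself a path to a brief file.
    if arg.endswith(".md") and Path(arg).is_file():
        return {"direction": None, "brief": Path(arg).read_text(encoding="utf-8")}
    # Case 2: a brief sits in the project root; the argument only sets the direction.
    brief_path = Path(project_root) / "RESEARCH_BRIEF.md"
    if brief_path.is_file():
        return {"direction": arg or None,
                "brief": brief_path.read_text(encoding="utf-8")}
    # Case 3: no brief; proceed with the one-line direction alone.
    return {"direction": arg, "brief": None}
```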

💡 Create a brief from the template: cp templates/RESEARCH_BRIEF_TEMPLATE.md RESEARCH_BRIEF.md — keep it to ~1-2 pages (4-8k chars); long material goes in separate files referenced by path.

Phase 0.5: Reference Paper Summary (when REF_PAPER is set)

Skip entirely if REF_PAPER is false.

Summarize the reference paper before searching the literature:

If arXiv URL (e.g., https://arxiv.org/abs/2406.04329):
- Invoke /arxiv "ARXIV_ID" — download to fetch the PDF
- Read the first 5 pages (title, abstract, intro, method overview)
If local PDF path (e.g., papers/reference.pdf):
- Read the PDF directly (first 5 pages)
If other URL:
- Fetch and extract content via WebFetch
Generate idea-stage/REF_PAPER_SUMMARY.md:

# Reference Paper Summary

**Title**: [paper title]
**Authors**: [authors]
**Venue**: [venue, year]

## What They Did
[2-3 sentences: core method and contribution]

## Key Results
[Main quantitative findings]

## Limitations & Open Questions
[What the paper didn't solve, acknowledged weaknesses, future work suggestions]

## Potential Improvement Directions
[Based on the limitations, what could be improved or extended?]

## Codebase
[If `base repo` is also set: link to the repo and note which parts correspond to the paper]

🚦 Checkpoint: Present the summary to the user:

📄 Reference paper summarized:
- Title: [title]
- Key limitation: [main gap]
- Improvement directions: [2-3 bullets]

Proceeding to literature survey with this as context.

Phase 1 and Phase 2 will use idea-stage/REF_PAPER_SUMMARY.md as additional context — /research-lit searches for related and competing work, /idea-creator generates ideas that build on or improve the reference paper.
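
The three-way dispatch on REF_PAPER described in this phase can be sketched as below. `classify_ref_paper` and the returned labels are hypothetical. The regular expression assumes new-style arXiv identifiers such as 2406.04329, so old-style IDs would fall through to the generic URL branch.

```python
import re

# Assumption: new-style arXiv IDs (YYMM.NNNNN), optionally followed by a version suffix.
ARXIV_RE = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)


def classify_ref_paper(ref_paper):
    """Return (kind, value) where kind is 'skip', 'arxiv', 'url', or 'local_pdf'."""
    if not ref_paper:  # REF_PAPER = false: the whole phase is skipped
        return ("skip", None)
    match = ARXIV_RE.search(ref_paper)
    if match:
        return ("arxiv", match.group(1))       # pass the ID to /arxiv
    if ref_paper.startswith(("http://", "https://")):
        return ("url", ref_paper)              # fetch via WebFetch
    return ("local_pdf", ref_paper)            # read the PDF directly
```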

Phase 1: Literature Survey

Invoke /research-lit to map the research landscape. Idea discovery is exactly the place where Gemini's AI-driven broad coverage adds value, so include gemini as a source by default unless the user already specified an explicit — sources: directive in their idea-discovery invocation:

# If $ARGUMENTS already contains "— sources:", pass through unchanged
# (the user is in control of source selection):
/research-lit "$ARGUMENTS" — composed: idea-stage/IDEA_REPORT.md

# Otherwise (the common case), include gemini explicitly for broader discovery:
/research-lit "$ARGUMENTS" — sources: all, gemini — composed: idea-stage/IDEA_REPORT.md

— composed: idea-stage/IDEA_REPORT.md puts /research-lit in composed mode (see Output hygiene below): it returns the landscape for folding into the report instead of writing a standalone landscape file. The report doesn't exist yet at Phase 1; the directive names the forthcoming canonical doc, and /idea-creator creates it in Phase 2.

If gemini-cli is not installed, /research-lit skips the Gemini source with a warning and the pipeline continues. Users who want to force-disable Gemini in idea-discovery can pass /idea-discovery "topic" — sources: all explicitly (which becomes the literal source list, with no auto-injection).
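
The source-injection rule above amounts to a small string decision, sketched here. `build_research_lit_command` is a hypothetical helper. It only mirrors the two invocation shapes shown earlier and does not check whether gemini-cli is installed.

```python
def build_research_lit_command(arguments, report="idea-stage/IDEA_REPORT.md"):
    """Build the /research-lit invocation, adding gemini unless sources were given."""
    parts = [f'/research-lit "{arguments}"']
    if "— sources:" not in arguments:
        # Common case: no explicit source list, so include gemini for broader discovery.
        parts.append("— sources: all, gemini")
    parts.append(f"— composed: {report}")
    return " ".join(parts)
```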

What this does:

Search arXiv, Google Scholar, Semantic Scholar for recent papers
Plus Gemini-driven broad discovery (sub-problem decomposition, naming variants, alias coverage) when gemini-cli is available
Build a landscape map: sub-directions, approaches, open problems
Identify structural gaps and recurring limitations
Output a literature summary (saved to working notes)

🚦 Checkpoint: Present the landscape summary to the user. Ask:

📚 Literature survey complete. Here's what I found:
- [key findings, gaps, open problems]

Does this match your understanding? Should I adjust the scope before generating ideas?
(If no response, I'll proceed with the top-ranked direction.)

User approves (or no response + AUTO_PROCEED=true) → proceed to Phase 2 with best direction.
User requests changes (e.g., "focus more on X", "ignore Y", "too broad") → refine the search with updated queries, re-run /research-lit with adjusted scope, and present again. Repeat until the user is satisfied.

Phase 2: Idea Generation + Filtering + Pilots

Invoke /idea-creator with the landscape context (and idea-stage/REF_PAPER_SUMMARY.md if available):

/idea-creator "$ARGUMENTS" — composed: idea-stage/IDEA_REPORT.md

/idea-creator owns idea-stage/IDEA_REPORT.md as the canonical deliverable; the — composed: directive tells it to fold the survey/novelty findings in rather than emitting LIT_LANDSCAPE.md / RESEARCH_REVIEW.md / MANIFEST.md alongside.

What this does:

If idea-stage/REF_PAPER_SUMMARY.md exists, include it as context — ideas should build on, improve, or extend the reference paper
Brainstorm 8-12 concrete ideas via GPT-5.6-Sol xhigh
Filter by feasibility, compute cost, quick novelty search
Deep validate top ideas (full novelty check + devil's advocate)
Run parallel pilot experiments on available GPUs (top 2-3 ideas)
Rank by empirical signal
Output idea-stage/IDEA_REPORT.md

🚦 Checkpoint: Present idea-stage/IDEA_REPORT.md ranked ideas to the user. Ask:

💡 Generated X ideas, filtered to Y, piloted Z. Top results:

1. [Idea 1] — Pilot: POSITIVE (+X%)
2. [Idea 2] — Pilot: WEAK POSITIVE (+Y%)
3. [Idea 3] — Pilot: NEGATIVE, eliminated

Which ideas should I validate further? Or should I regenerate with different constraints?
(If no response, I'll proceed with the top-ranked ideas.)

User picks ideas (or no response + AUTO_PROCEED=true) → proceed to Phase 3 with top-ranked ideas.
User unhappy with all ideas → collect feedback ("what's missing?", "what direction do you prefer?"), update the prompt with user's constraints, and re-run Phase 2 (idea generation). Before regenerating, read the already-tried directions (research-wiki Failed Ideas + any .aris/runs/<run_id>.iterations.jsonl) and forbid a candidate too close to one already tried — enforced direction diversity; when an overnight heartbeat drives the run, record each chosen direction via iteration_log.py note ... --direction "<frame>" so later ticks can reject near-duplicates (see shared-references/external-cadence.md → Stall detection & forced structural pivot). Repeat until the user selects at least 1 idea.
User wants to adjust scope → go back to Phase 1 with refined direction.
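
A rough sketch of the direction-diversity check used before regenerating ideas. The JSONL schema (a `direction` key per record), the 0.8 threshold, and both helper names are assumptions made for illustration. The real format of .aris/runs/<run_id>.iterations.jsonl is not specified here, and character-level similarity from `difflib` stands in for whatever closeness judgment the agent actually applies.

```python
import json
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # assumption: tune to taste


def load_tried_directions(jsonl_text):
    """Collect 'direction' values from JSONL text; records without one are ignored."""
    tried = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "direction" in record:
            tried.append(record["direction"])
    return tried


def is_near_duplicate(candidate, tried, threshold=SIMILARITY_THRESHOLD):
    """True if the candidate is textually close to any already-tried direction."""
    cand = candidate.lower()
    return any(
        SequenceMatcher(None, cand, prior.lower()).ratio() >= threshold
        for prior in tried
    )
```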

Phase 3: Deep Novelty Verification

For each top idea (positive pilot signal), run a thorough novelty check:

/novelty-check "[top idea 1 description]"
/novelty-check "[top idea 2 description]"

What this does:

Multi-source literature search (arXiv, Scholar, Semantic Scholar)
Cross-verify with GPT-5.6-Sol xhigh
Check for concurrent work (last 3-6 months)
Identify closest existing work and differentiation points

Update idea-stage/IDEA_REPORT.md with deep novelty results. Eliminate any idea that turns out to be already published.

Phase 4: External Critical Review

For the surviving top idea(s), get brutal feedback:

/research-review "[top idea with hypothesis + pilot results]" — composed: idea-stage/IDEA_REPORT.md

In composed mode /research-review folds its conclusions into idea-stage/IDEA_REPORT.md and cites the .aris/traces/… path instead of writing a standalone review .md in the project root.

What this does:

GPT-5.6-Sol xhigh acts as a senior reviewer (NeurIPS/ICML level)
Scores the idea, identifies weaknesses, suggests minimum viable improvements
Provides concrete feedback on experimental design

Update idea-stage/IDEA_REPORT.md with reviewer feedback and revised plan.

Phase 4.5: Method Refinement + Experiment Planning

After review, refine the top idea into a concrete proposal and plan experiments:

/research-refine-pipeline "[top idea description + pilot results + reviewer feedback]"

What this does:

Freeze a Problem Anchor to prevent scope drift
Iteratively refine the method via GPT-5.6-Sol review (up to 5 rounds, until score ≥ 9)
Generate a claim-driven experiment roadmap with ablations, budgets, and run order
Output: refine-logs/FINAL_PROPOSAL.md, refine-logs/EXPERIMENT_PLAN.md, refine-logs/EXPERIMENT_TRACKER.md

🚦 Checkpoint: Present the refined proposal summary:

🔬 Method refined and experiment plan ready:
- Problem anchor: [anchored problem]
- Method thesis: [one sentence]
- Dominant contribution: [what's new]
- Must-run experiments: [N blocks]
- First 3 runs to launch: [list]

Proceed to implementation? Or adjust the proposal?

User approves (or no response + AUTO_PROCEED=true) → proceed to Final Report.
User requests changes → pass feedback to /research-refine for another round.
Lite mode: If reviewer score < 6 or pilot was weak, run /research-refine only (skip /experiment-plan) and note remaining risks in the report.
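
The lite-mode rule can be expressed as a one-line decision, sketched below. `choose_refinement_command` is a hypothetical helper. It assumes "pilot was weak" corresponds to the WEAK POSITIVE label used in the Phase 2 checkpoint.

```python
def choose_refinement_command(reviewer_score, pilot_signal):
    """Pick the full refine pipeline, or /research-refine only in lite mode."""
    # Assumption: a "weak" pilot is one labeled WEAK POSITIVE.
    lite = reviewer_score < 6 or pilot_signal == "WEAK POSITIVE"
    return "/research-refine" if lite else "/research-refine-pipeline"
```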

Phase 5: Final Report

Finalize idea-stage/IDEA_REPORT.md with all accumulated information:

# Idea Discovery Report

**Direction**: $ARGUMENTS
**Date**: [today]
**Pipeline**: research-lit → idea-creator → novelty-check → research-review → research-refine-pipeline

## Executive Summary
[2-3 sentences: best idea, key evidence, recommended next step]

## Literature Landscape
[from Phase 1]

## Ranked Ideas
[from Phase 2, updated with Phase 3-4 results]

### 🏆 Idea 1: [title] — RECOMMENDED
- Pilot: POSITIVE (+X%)
- Novelty: CONFIRMED (closest: [paper], differentiation: [what's different])
- Reviewer score: X/10
- Next step: implement full experiment → /auto-review-loop

### Idea 2: [title] — BACKUP
...

## Eliminated Ideas
[ideas killed at each phase, with reasons]

## Refined Proposal
- Proposal: `refine-logs/FINAL_PROPOSAL.md`
- Experiment plan: `refine-logs/EXPERIMENT_PLAN.md`
- Tracker: `refine-logs/EXPERIMENT_TRACKER.md`

## Next Steps
- [ ] /run-experiment to deploy experiments from the plan
- [ ] /auto-review-loop to iterate until submission-ready
- [ ] Or invoke /research-pipeline for the complete end-to-end flow

Phase 5.5: Write Compact Files (when COMPACT = true)

Skip entirely if COMPACT is false.

Write idea-stage/IDEA_CANDIDATES.md — a lean summary of the top 3-5 surviving ideas:

# Idea Candidates

| # | Idea | Pilot Signal | Novelty | Reviewer Score | Status |
|---|------|-------------|---------|---------------|--------|
| 1 | [title] | +X% | Confirmed | X/10 | RECOMMENDED |
| 2 | [title] | +Y% | Confirmed | X/10 | BACKUP |
| 3 | [title] | Negative | — | — | ELIMINATED |

## Active Idea: #1 — [title]
- Hypothesis: [one sentence]
- Key evidence: [pilot result]
- Next step: /experiment-bridge or /research-refine

This file is intentionally small (~30 lines) so downstream skills and session recovery can read it without loading the full idea-stage/IDEA_REPORT.md (~200+ lines).
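
As an illustration, the table portion of this file could be rendered from a list of idea records. The record keys (`title`, `pilot`, `novelty`, `score`, `status`) and the `render_candidates` helper are assumptions. Missing novelty or score values are written as "—", mirroring the template above.

```python
def render_candidates(ideas, limit=5):
    """Render the top ideas as the Idea Candidates table (at most `limit` rows)."""
    rows = [
        "# Idea Candidates",
        "",
        "| # | Idea | Pilot Signal | Novelty | Reviewer Score | Status |",
        "|---|------|--------------|---------|----------------|--------|",
    ]
    for rank, idea in enumerate(ideas[:limit], start=1):
        score = idea.get("score")
        score_cell = f"{score}/10" if score is not None else "—"
        novelty_cell = idea.get("novelty") or "—"
        rows.append(
            f"| {rank} | {idea['title']} | {idea['pilot']} | "
            f"{novelty_cell} | {score_cell} | {idea['status']} |"
        )
    return "\n".join(rows) + "\n"
```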

Phase 5.6: Instantiate the Research Contract (always — NOT gated on COMPACT)

When Phase 5 ends with a RECOMMENDED idea, create idea-stage/docs/research_contract.md from templates/RESEARCH_CONTRACT_TEMPLATE.md (resolve the template from the repo root or $ARIS_REPO/templates/), filling in: the selected idea + selection rationale, core claims, minimum convincing evidence, and the next-step pointer. Skip only when the run produced no RECOMMENDED idea.

This file is the focused working contract for the W1 → W1.5 handoff: /experiment-bridge implements against it, and /result-to-claim + /ablation-planner read it as the claims source. It is also the #2 session-recovery file (docs/SESSION_RECOVERY_GUIDE.md) — a crashed session reloads the ACTIVE idea from this contract instead of the full idea pool.

Output Protocols

Follow these shared protocols for all output files:

Output Composition Protocol — ONE canonical deliverable per pipeline; fold sub-skill findings in, don't scatter overlapping .md files

Output Versioning Protocol — write timestamped file first, then copy to fixed name

Output Manifest Protocol — maintain MANIFEST.md only above the 15-artifact threshold (not "log every output")

Output Language Protocol — respect the project's language setting
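
A minimal sketch of the Output Versioning Protocol (timestamped file first, then copy to the fixed name). The `_YYYYMMDD_HHMMSS` suffix and the `write_versioned` helper are assumptions. The shared protocol document may prescribe a different naming scheme.

```python
import shutil
from datetime import datetime
from pathlib import Path


def write_versioned(path, content, now=None):
    """Write a timestamped copy first, then copy it to the fixed file name."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    versioned = target.with_name(f"{target.stem}_{stamp}{target.suffix}")
    versioned.write_text(content, encoding="utf-8")
    shutil.copyfile(versioned, target)  # the fixed name always holds the latest version
    return versioned
```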

Output hygiene — ONE canonical doc, no duplicate MDs (REQUIRED)

This pipeline runs its sub-skills in composed mode (see output-composition.md): it owns a single canonical deliverable and folds every sub-skill's findings into it rather than letting each emit its own overlapping file. Concretely, for this workflow:

idea-stage/IDEA_REPORT.md is the single canonical deliverable. Sub-skills' intermediate findings (literature landscape, novelty notes, external review) are folded into it as sections/appendices — they do NOT become standalone files just because a sub-skill could emit one. If a sub-skill writes a scratch file, inline its unique content into the report and delete the scratch when the phase closes.
Pass — composed: idea-stage/IDEA_REPORT.md to every sub-skill (/research-lit, /idea-creator, /research-review) so they fold instead of scatter. This is the explicit signal; without it a sub-skill stays standalone by design.
Refined-method outputs stay in refine-logs/ (FINAL_PROPOSAL.md / EXPERIMENT_PLAN.md / EXPERIMENT_TRACKER.md). Do NOT also restate them as separate files under idea-stage/; the report links to them, it does not copy them.
No MANIFEST.md for a handful of files — only above the 15-artifact threshold in output-manifest.md.
Pilot scratch is disposable: keep the pilot script (reusable) + one results file (pilot_results.jsonl or a small summary). Delete launcher logs, smoke files, and redundant *_summary.json once the numbers are in the report.
Cross-model review traces belong in .aris/traces/… (the audit trail); do not ALSO keep a human-facing copy under idea-stage/ — cite the trace path from the report.
Before finishing, the idea-stage/ top level should be roughly: IDEA_REPORT.md (+ .html), the pilot script + results, and the refine-logs/ dir. Nothing else unless it carries content not in the report.

Render HTML view (auto, when `RENDER_HTML = true`)

After Phase 5 finalizes idea-stage/IDEA_REPORT.md (and Phase 5.5 writes the optional IDEA_CANDIDATES.md), invoke /render-html on the report so the user has a single-file HTML view for tablet / phone reading:

/render-html "idea-stage/IDEA_REPORT.md" --no-review

--no-review is intentional: source MD already passed this skill's own novelty + cross-model review. HTML render is a structural conversion, not a new claim-audit gate. Output lands at idea-stage/IDEA_REPORT.html with embedded source SHA256 + render timestamp.

Non-blocking: if /render-html fails (helper missing, Codex MCP unavailable, file write error), log the failure and continue. The HTML view is a convenience artifact, not a prerequisite for completing the pipeline.

Skip this step if RENDER_HTML = false.
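
The source fingerprint and the non-blocking behavior described above can be sketched as follows. How /render-html actually embeds the hash and timestamp is not specified here, so `source_fingerprint`, `render_non_blocking`, and the dictionary keys are illustrative assumptions.

```python
import hashlib
from datetime import datetime, timezone


def source_fingerprint(markdown_text, now=None):
    """Compute the SHA256 of the source Markdown plus a render timestamp."""
    digest = hashlib.sha256(markdown_text.encode("utf-8")).hexdigest()
    rendered_at = (now or datetime.now(timezone.utc)).isoformat()
    return {"source_sha256": digest, "rendered_at": rendered_at}


def render_non_blocking(render, source_path, log):
    """Run a render callable; on failure, record the error and keep going."""
    try:
        render(source_path)
        return True
    except Exception as exc:  # any failure is logged, never raised to the pipeline
        log.append(f"render-html failed for {source_path}: {exc}")
        return False
```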

Key Rules

Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission — just do it silently.
Don't skip phases. Each phase filters and validates — skipping leads to wasted effort later.
Checkpoint between phases. Briefly summarize what was found before moving on.
Kill ideas early. It's better to kill 10 bad ideas in Phase 3 than to implement one and fail.
Empirical signal > theoretical appeal. An idea with a positive pilot outranks a "sounds great" idea without evidence.
Document everything — inside the one report, not in scattered files. Dead ends and eliminated ideas are valuable, so record them as sections of idea-stage/IDEA_REPORT.md (see Output hygiene above). Do not spawn a separate .md per phase.
Be honest with the reviewer. Include negative results and failed pilots in the review prompt.
Feishu notifications are optional. If ~/.claude/feishu.json exists, send checkpoint at each phase transition and pipeline_done at final report. If absent/off, skip silently.

Composing with Workflow 2

After this pipeline produces a validated top idea:

/idea-discovery "direction"         ← you are here (Workflow 1, includes method refinement + experiment planning)
/run-experiment                     ← deploy experiments from the plan
/auto-review-loop "top idea"        ← Workflow 2: iterate until submission-ready

Or use /research-pipeline for the full end-to-end flow.

Workflow 1: Idea Discovery Pipeline

Orchestrate a complete idea discovery workflow for: $ARGUMENTS

Overview

This skill chains sub-skills into a single automated pipeline:

/research-lit → /idea-creator → /novelty-check → /research-review → /research-refine-pipeline
  (survey)      (brainstorm)    (verify novel)    (critical feedback)  (refine method + plan experiments)

Constants

PILOT_MAX_HOURS = 2 — Skip any pilot experiment estimated to take > 2 hours per GPU. Flag as "needs manual pilot" in the report.
PILOT_TIMEOUT_HOURS = 3 — Hard timeout: kill any running pilot that exceeds 3 hours. Collect partial results if available.
MAX_PILOT_IDEAS = 3 — Run pilots for at most 3 top ideas in parallel. Additional ideas are validated on paper only.
MAX_TOTAL_GPU_HOURS = 8 — Total GPU budget across all pilots. If exceeded, skip remaining pilots and note in report.
AUTO_PROCEED = true — If user doesn't respond at a checkpoint, automatically proceed with the best option after presenting results. Set to false to always wait for explicit user confirmation.
REVIEWER_MODEL = gpt-5.6-sol — Model used via Codex MCP. Must be an OpenAI model (e.g., gpt-5.6-sol, o3, gpt-4o). Passed to sub-skills.
OUTPUT_DIR = idea-stage/ — All idea-stage outputs go here. Create the directory if it doesn't exist.
ARXIV_DOWNLOAD = false — When true, /research-lit downloads the top relevant arXiv PDFs during Phase 1. When false (default), only fetches metadata. Passed through to /research-lit.
COMPACT = false — When true, generate compact summary files for short-context models and session recovery. Writes idea-stage/IDEA_CANDIDATES.md (top 3-5 ideas only) at the end of this workflow. Downstream skills read this instead of the full idea-stage/IDEA_REPORT.md.
RENDER_HTML = true — When true (default), auto-render idea-stage/IDEA_REPORT.md to HTML at workflow end via /render-html. Uses --no-review (the source MD already went through novelty + cross-model review during Phase 3). Set false to skip, or pass — render html: false.
REF_PAPER = false — Reference paper to base ideas on. Accepts: local PDF path, arXiv URL, or any paper URL. When set, the paper is summarized first (idea-stage/REF_PAPER_SUMMARY.md), then idea generation uses it as context. Combine with base repo for "improve this paper with this codebase" workflows.

💡 These are defaults. Override by telling the skill, e.g., /idea-discovery "topic" — ref paper: https://arxiv.org/abs/2406.04329 or /idea-discovery "topic" — compact: true.

Pipeline

Phase 0: Load Research Brief (if available)

Before starting any other phase, check for a detailed research brief in the project:

Look for RESEARCH_BRIEF.md in the project root (or path passed as $ARGUMENTS)
If found, read it and extract:
- Problem statement and context
- Constraints (compute, data, timeline, venue)
- What the user already tried / what didn't work
- Domain knowledge and non-goals
- Existing results (if any)
Use this as the primary context for all subsequent phases — it replaces the one-line prompt
If both RESEARCH_BRIEF.md and a one-line $ARGUMENTS exist, merge them (brief takes priority for details, argument sets the direction)

If no brief exists, proceed normally with $ARGUMENTS as the research direction.

💡 Create a brief from the template: cp templates/RESEARCH_BRIEF_TEMPLATE.md RESEARCH_BRIEF.md — keep it to ~1-2 pages (4-8k chars); long material goes in separate files referenced by path.

Phase 0.5: Reference Paper Summary (when REF_PAPER is set)

Skip entirely if REF_PAPER is false.

Summarize the reference paper before searching the literature:

If arXiv URL (e.g., https://arxiv.org/abs/2406.04329):
- Invoke /arxiv "ARXIV_ID" — download to fetch the PDF
- Read the first 5 pages (title, abstract, intro, method overview)
If local PDF path (e.g., papers/reference.pdf):
- Read the PDF directly (first 5 pages)
If other URL:
- Fetch and extract content via WebFetch
Generate idea-stage/REF_PAPER_SUMMARY.md:

# Reference Paper Summary

**Title**: [paper title]
**Authors**: [authors]
**Venue**: [venue, year]

## What They Did
[2-3 sentences: core method and contribution]

## Key Results
[Main quantitative findings]

## Limitations & Open Questions
[What the paper didn't solve, acknowledged weaknesses, future work suggestions]

## Potential Improvement Directions
[Based on the limitations, what could be improved or extended?]

## Codebase
[If `base repo` is also set: link to the repo and note which parts correspond to the paper]

🚦 Checkpoint: Present the summary to the user:

📄 Reference paper summarized:
- Title: [title]
- Key limitation: [main gap]
- Improvement directions: [2-3 bullets]

Proceeding to literature survey with this as context.

Phase 1: Literature Survey

# If $ARGUMENTS already contains "— sources:", pass through unchanged
# (the user is in control of source selection):
/research-lit "$ARGUMENTS" — composed: idea-stage/IDEA_REPORT.md

# Otherwise (the common case), include gemini explicitly for broader discovery:
/research-lit "$ARGUMENTS" — sources: all, gemini — composed: idea-stage/IDEA_REPORT.md

What this does:

Search arXiv, Google Scholar, Semantic Scholar for recent papers
Plus Gemini-driven broad discovery (sub-problem decomposition, naming variants, alias coverage) when gemini-cli is available
Build a landscape map: sub-directions, approaches, open problems
Identify structural gaps and recurring limitations
Output a literature summary (saved to working notes)

🚦 Checkpoint: Present the landscape summary to the user. Ask:

📚 Literature survey complete. Here's what I found:
- [key findings, gaps, open problems]

Does this match your understanding? Should I adjust the scope before generating ideas?
(If no response, I'll proceed with the top-ranked direction.)

User approves (or no response + AUTO_PROCEED=true) → proceed to Phase 2 with best direction.
User requests changes (e.g., "focus more on X", "ignore Y", "too broad") → refine the search with updated queries, re-run /research-lit with adjusted scope, and present again. Repeat until the user is satisfied.

Phase 2: Idea Generation + Filtering + Pilots

Invoke /idea-creator with the landscape context (and idea-stage/REF_PAPER_SUMMARY.md if available):

/idea-creator "$ARGUMENTS" — composed: idea-stage/IDEA_REPORT.md

What this does:

If idea-stage/REF_PAPER_SUMMARY.md exists, include it as context — ideas should build on, improve, or extend the reference paper
Brainstorm 8-12 concrete ideas via GPT-5.6-Sol xhigh
Filter by feasibility, compute cost, quick novelty search
Deep validate top ideas (full novelty check + devil's advocate)
Run parallel pilot experiments on available GPUs (top 2-3 ideas)
Rank by empirical signal
Output idea-stage/IDEA_REPORT.md

🚦 Checkpoint: Present idea-stage/IDEA_REPORT.md ranked ideas to the user. Ask:

💡 Generated X ideas, filtered to Y, piloted Z. Top results:

1. [Idea 1] — Pilot: POSITIVE (+X%)
2. [Idea 2] — Pilot: WEAK POSITIVE (+Y%)
3. [Idea 3] — Pilot: NEGATIVE, eliminated

Which ideas should I validate further? Or should I regenerate with different constraints?
(If no response, I'll proceed with the top-ranked ideas.)

User picks ideas (or no response + AUTO_PROCEED=true) → proceed to Phase 3 with top-ranked ideas.
User unhappy with all ideas → collect feedback ("what's missing?", "what direction do you prefer?"), update the prompt with user's constraints, and re-run Phase 2 (idea generation). Before regenerating, read the already-tried directions (research-wiki Failed Ideas + any .aris/runs/<run_id>.iterations.jsonl) and forbid a candidate too close to one already tried — enforced direction diversity; when an overnight heartbeat drives the run, record each chosen direction via iteration_log.py note ... --direction "<frame>" so later ticks can reject near-duplicates (see shared-references/external-cadence.md → Stall detection & forced structural pivot). Repeat until the user selects at least 1 idea.
User wants to adjust scope → go back to Phase 1 with refined direction.

Phase 3: Deep Novelty Verification

For each top idea (positive pilot signal), run a thorough novelty check:

/novelty-check "[top idea 1 description]"
/novelty-check "[top idea 2 description]"

What this does:

Multi-source literature search (arXiv, Scholar, Semantic Scholar)
Cross-verify with GPT-5.6-Sol xhigh
Check for concurrent work (last 3-6 months)
Identify closest existing work and differentiation points

Update idea-stage/IDEA_REPORT.md with deep novelty results. Eliminate any idea that turns out to be already published.

Phase 4: External Critical Review

For the surviving top idea(s), get brutal feedback:

/research-review "[top idea with hypothesis + pilot results]" — composed: idea-stage/IDEA_REPORT.md

In composed mode /research-review folds its conclusions into idea-stage/IDEA_REPORT.md and cites the .aris/traces/… path instead of writing a standalone review .md in the project root.

What this does:

GPT-5.6-Sol xhigh acts as a senior reviewer (NeurIPS/ICML level)
Scores the idea, identifies weaknesses, suggests minimum viable improvements
Provides concrete feedback on experimental design

Update idea-stage/IDEA_REPORT.md with reviewer feedback and revised plan.

Phase 4.5: Method Refinement + Experiment Planning

After review, refine the top idea into a concrete proposal and plan experiments:

/research-refine-pipeline "[top idea description + pilot results + reviewer feedback]"

What this does:

Freeze a Problem Anchor to prevent scope drift
Iteratively refine the method via GPT-5.6-Sol review (up to 5 rounds, until score ≥ 9)
Generate a claim-driven experiment roadmap with ablations, budgets, and run order
Output: refine-logs/FINAL_PROPOSAL.md, refine-logs/EXPERIMENT_PLAN.md, refine-logs/EXPERIMENT_TRACKER.md

🚦 Checkpoint: Present the refined proposal summary:

🔬 Method refined and experiment plan ready:
- Problem anchor: [anchored problem]
- Method thesis: [one sentence]
- Dominant contribution: [what's new]
- Must-run experiments: [N blocks]
- First 3 runs to launch: [list]

Proceed to implementation? Or adjust the proposal?

User approves (or AUTO_PROCEED=true) → proceed to Final Report.
User requests changes → pass feedback to /research-refine for another round.
Lite mode: If reviewer score < 6 or pilot was weak, run /research-refine only (skip /experiment-plan) and note remaining risks in the report.

Phase 5: Final Report

Finalize idea-stage/IDEA_REPORT.md with all accumulated information:

# Idea Discovery Report

**Direction**: $ARGUMENTS
**Date**: [today]
**Pipeline**: research-lit → idea-creator → novelty-check → research-review → research-refine-pipeline

## Executive Summary
[2-3 sentences: best idea, key evidence, recommended next step]

## Literature Landscape
[from Phase 1]

## Ranked Ideas
[from Phase 2, updated with Phase 3-4 results]

### 🏆 Idea 1: [title] — RECOMMENDED
- Pilot: POSITIVE (+X%)
- Novelty: CONFIRMED (closest: [paper], differentiation: [what's different])
- Reviewer score: X/10
- Next step: implement full experiment → /auto-review-loop

### Idea 2: [title] — BACKUP
...

## Eliminated Ideas
[ideas killed at each phase, with reasons]

## Refined Proposal
- Proposal: `refine-logs/FINAL_PROPOSAL.md`
- Experiment plan: `refine-logs/EXPERIMENT_PLAN.md`
- Tracker: `refine-logs/EXPERIMENT_TRACKER.md`

## Next Steps
- [ ] /run-experiment to deploy experiments from the plan
- [ ] /auto-review-loop to iterate until submission-ready
- [ ] Or invoke /research-pipeline for the complete end-to-end flow

Phase 5.5: Write Compact Files (when COMPACT = true)

Skip entirely if COMPACT is false.

Write idea-stage/IDEA_CANDIDATES.md — a lean summary of the top 3-5 surviving ideas:

# Idea Candidates

| # | Idea | Pilot Signal | Novelty | Reviewer Score | Status |
|---|------|-------------|---------|---------------|--------|
| 1 | [title] | +X% | Confirmed | X/10 | RECOMMENDED |
| 2 | [title] | +Y% | Confirmed | X/10 | BACKUP |
| 3 | [title] | Negative | — | — | ELIMINATED |

## Active Idea: #1 — [title]
- Hypothesis: [one sentence]
- Key evidence: [pilot result]
- Next step: /experiment-bridge or /research-refine

This file is intentionally small (~30 lines) so downstream skills and session recovery can read it without loading the full idea-stage/IDEA_REPORT.md (~200+ lines).

Phase 5.6: Instantiate the Research Contract (always — NOT gated on COMPACT)

Output Protocols

Follow these shared protocols for all output files:

Output Composition Protocol — ONE canonical deliverable per pipeline; fold sub-skill findings in, don't scatter overlapping .md files

Output Versioning Protocol — write timestamped file first, then copy to fixed name

Output Manifest Protocol — maintain MANIFEST.md only above the 15-artifact threshold (not "log every output")

Output Language Protocol — respect the project's language setting
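
As an illustration of the Output Versioning Protocol (timestamped file first, then a copy under the fixed name), here is a minimal sketch. The helper name `write_versioned` and the `_YYYYMMDD_HHMMSS` suffix format are assumptions made for this example; the shared protocol file may prescribe a different naming scheme.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def write_versioned(content: str, fixed_path: Path, now: Optional[datetime] = None) -> Path:
    """Write a timestamped copy first, then copy it to the fixed name.

    The timestamp suffix format is an assumption, not taken from the protocol.
    """
    fixed_path.parent.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    versioned = fixed_path.with_name(f"{fixed_path.stem}_{stamp}{fixed_path.suffix}")
    # Step 1: the timestamped file is the durable record of this run.
    versioned.write_text(content, encoding="utf-8")
    # Step 2: the fixed name always points at the latest content.
    shutil.copyfile(versioned, fixed_path)
    return versioned
```

Writing the timestamped file before the fixed-name copy means an interrupted run leaves the previous fixed-name file untouched.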

Output hygiene — ONE canonical doc, no duplicate MDs (REQUIRED)

idea-stage/IDEA_REPORT.md is the single canonical deliverable. Sub-skills' intermediate findings (literature landscape, novelty notes, external review) are folded into it as sections/appendices — they do NOT become standalone files just because a sub-skill could emit one. If a sub-skill writes a scratch file, inline its unique content into the report and delete the scratch when the phase closes.
Pass `— composed: idea-stage/IDEA_REPORT.md` to every sub-skill (/research-lit, /idea-creator, /research-review) so they fold their findings into the report instead of scattering files. This is the explicit signal; without it, a sub-skill stays standalone by design.
Refined-method outputs stay in refine-logs/ (FINAL_PROPOSAL.md / EXPERIMENT_PLAN.md / EXPERIMENT_TRACKER.md). Do NOT also restate them as separate files under idea-stage/; the report links to them, it does not copy them.
No MANIFEST.md for a handful of files; create one only above the 15-artifact threshold defined in output-manifest.md.
Pilot scratch is disposable: keep the pilot script (reusable) + one results file (pilot_results.jsonl or a small summary). Delete launcher logs, smoke files, and redundant *_summary.json once the numbers are in the report.
Cross-model review traces belong in .aris/traces/… (the audit trail); do not ALSO keep a human-facing copy under idea-stage/ — cite the trace path from the report.
Before finishing, the idea-stage/ top level should be roughly: IDEA_REPORT.md (+ .html), the pilot script + results, and the refine-logs/ dir. Nothing else unless it carries content not in the report.
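
The final top-level check described above could be automated along these lines. This is a rough sketch under stated assumptions: the `EXPECTED` set and the `unexpected_entries` helper are hypothetical, and because the pilot script's filename is not fixed by this document, it has to be passed in explicitly.

```python
from pathlib import Path
from typing import Iterable, List

# Names this document says may remain at the idea-stage/ top level.
# IDEA_CANDIDATES.md only exists when COMPACT = true.
EXPECTED = {
    "IDEA_REPORT.md",
    "IDEA_REPORT.html",
    "IDEA_CANDIDATES.md",
    "pilot_results.jsonl",
    "refine-logs",
}


def unexpected_entries(stage_dir: Path, extra_allowed: Iterable[str] = ()) -> List[str]:
    """Return top-level names in stage_dir that are not on the allow-list.

    extra_allowed carries project-specific names such as the pilot script.
    """
    allowed = EXPECTED | set(extra_allowed)
    return sorted(p.name for p in stage_dir.iterdir() if p.name not in allowed)
```

Anything the function returns is a candidate for inlining into the report and deleting, unless it carries content the report does not.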

Render HTML view (auto, when `RENDER_HTML = true`)

After Phase 5 finalizes idea-stage/IDEA_REPORT.md (and Phase 5.5 writes the optional IDEA_CANDIDATES.md), invoke /render-html on the report so the user has a single-file HTML view for tablet or phone reading:

/render-html "idea-stage/IDEA_REPORT.md" --no-review

Skip this step if RENDER_HTML = false.

Key Rules

Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission — just do it silently.
Don't skip phases. Each phase filters and validates — skipping leads to wasted effort later.
Checkpoint between phases. Briefly summarize what was found before moving on.
Kill ideas early. It's better to kill 10 bad ideas in Phase 3 than to implement one and fail.
Empirical signal > theoretical appeal. An idea with a positive pilot outranks a "sounds great" idea without evidence.
Document everything — inside the one report, not in scattered files. Dead ends and eliminated ideas are valuable, so record them as sections of idea-stage/IDEA_REPORT.md (see Output hygiene above). Do not spawn a separate .md per phase.
Be honest with the reviewer. Include negative results and failed pilots in the review prompt.
Feishu notifications are optional. If ~/.claude/feishu.json exists, send a checkpoint notification at each phase transition and a pipeline_done notification when the final report is written. If it is absent or turned off, skip silently.

Composing with Workflow 2

After this pipeline produces a validated top idea:

/idea-discovery "direction"         ← you are here (Workflow 1, includes method refinement + experiment planning)
/run-experiment                     ← deploy experiments from the plan
/auto-review-loop "top idea"        ← Workflow 2: iterate until submission-ready

Or use /research-pipeline for the full end-to-end flow.
