skills/42-wanshuiyin-ARIS/skills/skills-codex/experiment-bridge/SKILL.md
Workflow 1.5: Bridge between idea discovery and auto review. Reads EXPERIMENT_PLAN.md, implements experiment code, deploys to GPU, collects initial results. Use when user says "实现实验", "implement experiments", "bridge", "从计划到跑实验", "deploy the plan", or has an experiment plan ready to execute.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research experiment-bridge
Implement and deploy experiments from plan: $ARGUMENTS
This skill bridges Workflow 1 (idea discovery + method refinement) and Workflow 2 (auto review loop). It takes the experiment plan and turns it into running experiments with initial results.
Workflow 1 output:                    This skill:                      Workflow 2 input:
refine-logs/EXPERIMENT_PLAN.md     →  implement → deploy → collect  →  initial results ready
refine-logs/EXPERIMENT_TRACKER.md     code        /run-experiment      for /auto-review-loop
refine-logs/FINAL_PROPOSAL.md
- AUTO_DEPLOY: set to false to review code before deploying.
- COMPACT: when true, prefer IDEA_CANDIDATES.md over the full IDEA_REPORT.md, and append completed runs to EXPERIMENT_LOG.md.
Override:
/experiment-bridge "EXPERIMENT_PLAN.md" — compact: true, base repo: https://github.com/org/project
This skill expects one or more of:
- refine-logs/EXPERIMENT_PLAN.md (best): claim-driven experiment roadmap from /experiment-plan
- refine-logs/EXPERIMENT_TRACKER.md: run-by-run execution table
- refine-logs/FINAL_PROPOSAL.md: method description for implementation context
- IDEA_CANDIDATES.md: compact idea summary (preferred when COMPACT = true)
- IDEA_REPORT.md: fallback if refine-logs don't exist
If none exist, ask the user what experiments to implement.
Read EXPERIMENT_PLAN.md and extract:
- FINAL_PROPOSAL.md: what exactly to implement
Present a brief summary:
📋 Experiment plan loaded:
- Milestones: [N] (sanity → baseline → main → ablation)
- Must-run experiments: [N]
- Nice-to-have: [N]
- Estimated GPU-hours: [X]
Proceeding to implementation.
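The summary above can be produced mechanically once the plan has been read into memory. The following is a minimal sketch, not the skill's actual implementation: the `PlannedRun` record and the `summarize_plan` helper are hypothetical names, and the real EXPERIMENT_PLAN.md layout may not map onto these fields directly.

```python
from dataclasses import dataclass


@dataclass
class PlannedRun:
    """Hypothetical record for one planned run; the real plan schema may differ."""
    run_id: str
    milestone: str    # e.g. "sanity", "baseline", "main", "ablation"
    must_run: bool    # False means nice-to-have
    gpu_hours: float  # estimated cost of this run


def summarize_plan(runs):
    """Build the brief plan summary from a list of PlannedRun records."""
    # Keep milestones in the order they first appear in the plan.
    milestones = []
    for run in runs:
        if run.milestone not in milestones:
            milestones.append(run.milestone)

    must_run = sum(1 for run in runs if run.must_run)
    nice_to_have = len(runs) - must_run
    total_hours = sum(run.gpu_hours for run in runs)

    return "\n".join([
        "Experiment plan loaded:",
        f"- Milestones: {len(milestones)} ({' → '.join(milestones)})",
        f"- Must-run experiments: {must_run}",
        f"- Nice-to-have: {nice_to_have}",
        f"- Estimated GPU-hours: {total_hours:g}",
    ])
```

Calling `summarize_plan` on the parsed runs yields the text to present before moving on to implementation; how the plan file is parsed into `PlannedRun` records is left open here.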
If BASE_REPO is set — clone the repo first:
git clone <BASE_REPO> base_repo/
For each milestone (in order), write the experiment scripts:
Check existing code — scan the project (or cloned base_repo/) for existing experiment scripts, model code, and data loaders. Reuse as much as possible.
Implement missing pieces:
Follow the plan's run order — implement sanity-stage experiments first, then baselines, then main method, then ablations.
Self-review before deploying:
Before deploying the full experiment suite, run the sanity-stage experiment:
/run-experiment [sanity experiment command]
Wait for completion. Verify:
If sanity fails → fix the code, re-run. Do not proceed to full deployment with broken code.
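The gate described above (no full deployment until the sanity run succeeds) can be sketched as a small local check. This is an illustration under assumptions: the skill itself launches runs through /run-experiment, whereas this sketch runs a command as a local subprocess; `sanity_gate` and its timeout default are hypothetical, and it only checks the exit code, not any plan-specific verification criteria.

```python
import subprocess
import sys


def sanity_gate(command, timeout_s=600.0):
    """Run the sanity command; return True only if it exits with code 0.

    `command` is a list of arguments, e.g. ["python", "train.py", "--sanity"]
    (a hypothetical invocation; substitute the plan's own sanity command).
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # A hung sanity run counts as a failure.
        return False

    if result.returncode != 0:
        # Show the tail of stderr so the failure can be fixed before re-running.
        print(result.stderr[-2000:], file=sys.stderr)
        return False
    return True
```

A caller would proceed to the full deployment only when `sanity_gate(...)` returns True, and otherwise fix the code and re-run the gate.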
Deploy experiments following the plan's milestone order:
/run-experiment [experiment commands]
For each milestone:
- /monitor-experiment to track progress
🚦 Checkpoint (if AUTO_DEPLOY = false):
🔧 Code implementation complete. Ready to deploy:
Milestone 0 (sanity): [status — passed/pending]
Milestone 1 (baseline): [N experiments, ~X GPU-hours]
Milestone 2 (main method): [N experiments, ~X GPU-hours]
Milestone 3 (ablations): [N experiments, ~X GPU-hours]
Total estimated: ~X GPU-hours on [N] GPUs
Deploy now? Or review the code first?
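The per-milestone counts and GPU-hour totals in the checkpoint message can be aggregated from the planned runs. A minimal sketch follows, assuming each run is a `(stage, gpu_hours)` tuple and that stages follow the sanity → baseline → main → ablation order; `STAGE_ORDER` and `checkpoint_lines` are hypothetical names introduced for illustration.

```python
from collections import defaultdict

# Milestone order taken from the plan summary: sanity → baseline → main → ablation.
STAGE_ORDER = ["sanity", "baseline", "main", "ablation"]


def checkpoint_lines(runs, num_gpus):
    """Aggregate (stage, gpu_hours) tuples into checkpoint summary lines."""
    counts = defaultdict(int)
    hours = defaultdict(float)
    for stage, gpu_hours in runs:
        counts[stage] += 1
        hours[stage] += gpu_hours

    lines = []
    for index, stage in enumerate(STAGE_ORDER):
        if stage not in counts:
            continue  # nothing planned for this milestone
        lines.append(
            f"Milestone {index} ({stage}): "
            f"{counts[stage]} experiments, ~{hours[stage]:g} GPU-hours"
        )

    total = sum(hours.values())
    lines.append(f"Total estimated: ~{total:g} GPU-hours on {num_gpus} GPUs")
    return lines
```

Because the lines are emitted in `STAGE_ORDER`, the same helper also documents the order in which milestones would be deployed.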
As experiments complete:
- /training-check to detect NaN, loss divergence, plateaus, or overfitting. If W&B is not configured, skip silently.
- refine-logs/EXPERIMENT_TRACKER.md: fill in Status and Notes columns
# Initial Experiment Results
**Date**: [today]
**Plan**: refine-logs/EXPERIMENT_PLAN.md
## Results by Milestone
### M0: Sanity — PASSED
- [result]
### M1: Baselines
| Run | System | Key Metric | Status |
|-----|--------|-----------|--------|
| R001 | baseline_1 | X.XX | DONE |
### M2: Main Method
| Run | System | Key Metric | Status |
|-----|--------|-----------|--------|
| R003 | our_method | X.XX | DONE |
### M3: Ablations
...
## Summary
- [X/Y] must-run experiments completed
- Main result: [positive/negative/inconclusive]
- Ready for /auto-review-loop: [YES/NO]
## Next Step
→ /auto-review-loop "[topic]"
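The per-milestone tables in the results template share one column layout, so they can be rendered from collected run records. The sketch below is illustrative only: `render_results_table` and the dictionary keys (`run`, `system`, `metric`, `status`) are assumptions, not part of the skill.

```python
def render_results_table(rows):
    """Render run records as a Markdown table matching the results template.

    Each row is a dict with the hypothetical keys
    "run", "system", "metric" (a float), and "status".
    """
    header = "| Run | System | Key Metric | Status |"
    separator = "|-----|--------|-----------|--------|"
    body = [
        f"| {row['run']} | {row['system']} | {row['metric']:.2f} | {row['status']} |"
        for row in rows
    ]
    return "\n".join([header, separator, *body])
```

One call per milestone (baselines, main method, ablations) gives the tables to place under the matching headings.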
Skip entirely if COMPACT is false.
Append each completed experiment to EXPERIMENT_LOG.md:
## [Run ID] — [timestamp]
- **System**: [method name]
- **Config**: [key hyperparameters]
- **Result**: [primary metric = X.XX]
- **Verdict**: [positive / negative / inconclusive]
- **Reproduce**: `python train.py --config configs/run_id.yaml --seed 42`
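Appending an entry in the format above is a plain file append, skipped when COMPACT is false. The following is a minimal sketch under assumptions: `append_log_entry` and its parameters are hypothetical, the timestamp format is an arbitrary choice, and the caller supplies the already-formatted result string.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_log_entry(log_path, run_id, system, config, result, verdict,
                     reproduce_cmd, compact=True):
    """Append one completed run to the log; do nothing when compact is False."""
    if not compact:
        return False

    # Timestamp format is an assumption; the template only says [timestamp].
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = (
        f"## {run_id} \u2014 {timestamp}\n"  # \u2014 is the dash used in the template heading
        f"- **System**: {system}\n"
        f"- **Config**: {config}\n"
        f"- **Result**: {result}\n"
        f"- **Verdict**: {verdict}\n"
        f"- **Reproduce**: `{reproduce_cmd}`\n\n"
    )
    # Mode "a" creates the file if missing and never overwrites earlier entries.
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return True
```

Opening the file in append mode keeps earlier entries intact, which matches the instruction to append each completed experiment rather than rewrite the log.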
After main experiments (M2) complete with positive results, invoke /ablation-planner to design ablation studies:
- refine-logs/EXPERIMENT_PLAN.md and refine-logs/EXPERIMENT_TRACKER.md
If /ablation-planner is unavailable, skip silently.
Present final status:
🔬 Experiment bridge complete:
- Implemented: [N] experiment scripts
- Deployed: [N] experiments on [M] GPUs
- Completed: [X/Y] must-run, [A/B] nice-to-have
- Main result: [one sentence]
Results: refine-logs/EXPERIMENT_RESULTS.md
Tracker: refine-logs/EXPERIMENT_TRACKER.md
Ready for Workflow 2:
→ /auto-review-loop "[topic]"
- cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission; just do it silently.
- EXPERIMENT_TRACKER.md should reflect real status after each run completes.
/idea-discovery "direction"           ← Workflow 1: find + refine + plan
/experiment-bridge                    ← you are here (Workflow 1.5: implement + deploy)
/auto-review-loop "topic"             ← Workflow 2: review + iterate
/paper-writing "NARRATIVE_REPORT.md"  ← Workflow 3: write the paper
Or use /research-pipeline for the full end-to-end flow (includes this bridge).