queelius/experiment

Name: experiment
Author: queelius

papermill/skills/experiment/SKILL.md

npx skillsauth add queelius/claude-anvil experiment


Experiment Design

Help the researcher design rigorous experiments or computational studies. Good experiments are hypothesis-driven, reproducible, and have clear success criteria before they are run.

Step 1: Read Context

Read .papermill/state.md (Read tool) for:

Thesis: The claim the experiments should support or test.
Existing experiments: Any previously registered experiments.
Format and tools: What languages/tools are in the repo (R, Python, C++, etc.).

If .papermill/state.md does not exist, ask the user what claim the experiments should test. Experiment design can proceed without the state file — suggest running /papermill:init afterward to register experiments persistently.

Scan the repository for existing code (Glob tool) in research/, code/, scripts/, experiments/, or analysis/ directories.
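Below is a rough Python equivalent of this scan, for readers who want to run it outside the agent. It is an illustration only: the skill itself uses the Glob tool, and both the function name find_existing_code and the set of file extensions are assumptions made for the example.

```python
from pathlib import Path

# Directories that Step 1 lists as likely homes for experiment code.
CANDIDATE_DIRS = ["research", "code", "scripts", "experiments", "analysis"]

# Assumed set of code file extensions; adjust to the languages in your repo.
CODE_SUFFIXES = {".R", ".r", ".py", ".cpp", ".cc", ".h", ".hpp", ".jl"}


def find_existing_code(repo_root: str = ".") -> dict:
    """Map each candidate directory to the code files found anywhere beneath it."""
    root = Path(repo_root)
    found = {}
    for name in CANDIDATE_DIRS:
        directory = root / name
        if not directory.is_dir():
            continue  # skip candidates that do not exist in this repo
        files = sorted(
            str(path.relative_to(root))
            for path in directory.rglob("*")
            if path.is_file() and path.suffix in CODE_SUFFIXES
        )
        if files:
            found[name] = files
    return found
```

Calling find_existing_code(".") from the repository root returns entries only for the candidate directories that actually contain code.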

Step 2: Identify What Needs Testing

Ask the user: "What specific claim or aspect of your thesis do these experiments need to support?"

Different contribution types need different experimental approaches:

| Contribution | Experimental approach |
|-------------|----------------------|
| Theorem/proof | Numerical validation of theoretical predictions |
| Algorithm | Runtime/accuracy benchmarks against baselines |
| Statistical method | Monte Carlo simulations with known ground truth |
| Empirical finding | Controlled experiments with statistical tests |
| Framework/model | Case studies demonstrating applicability |
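The following minimal Python sketch makes the "Monte Carlo simulations with known ground truth" row concrete. It uses a stand-in "method" (a normal-approximation 95% confidence interval for a mean) and checks it against data whose true mean is fixed in advance. The function name and every default parameter are hypothetical.

```python
import math
import random
import statistics


def ci_coverage(true_mean: float = 2.0, sd: float = 1.0, n: int = 50,
                reps: int = 2000, seed: int = 12345) -> float:
    """Fraction of simulated 95% intervals that contain the known true mean."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    hits = 0
    for _ in range(reps):
        # Draw a sample from a distribution whose mean we know exactly.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        half_width = z * statistics.stdev(sample) / math.sqrt(n)
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / reps
```

Because the truth is known, the estimated coverage can be compared directly with the nominal 0.95. With n = 50, the normal-approximation interval tends to fall slightly below nominal, which is exactly the kind of discrepancy this design is meant to expose.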

Step 3: Design the Experiment

For each experiment, specify:

Hypothesis

State the expected outcome in falsifiable terms. "We expect X to be Y under conditions Z" -- not "we want to show our method works."

Variables

Independent variables: What you manipulate (parameters, dataset size, algorithm choice).
Dependent variables: What you measure (accuracy, runtime, error rate).
Control variables: What you hold constant (hardware, random seeds, data preprocessing).
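One way to keep the three kinds of variables explicit is to write them into a small specification object and generate the run conditions from it. The sketch below assumes a full-factorial sweep over the independent variables. ExperimentSpec and the example levels are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass(frozen=True)
class ExperimentSpec:
    # Independent variables: each name maps to the levels to sweep.
    independent: dict[str, list]
    # Dependent variables: the metrics recorded for every run.
    dependent: list[str]
    # Control variables: held fixed across every run.
    controls: dict[str, object] = field(default_factory=dict)

    def conditions(self) -> list[dict]:
        """Full factorial grid over the independent variables, controls attached."""
        names = list(self.independent)
        grid = []
        for levels in product(*(self.independent[name] for name in names)):
            condition = dict(zip(names, levels))
            condition.update(self.controls)
            grid.append(condition)
        return grid


spec = ExperimentSpec(
    independent={"n": [100, 1000], "algorithm": ["ours", "baseline"]},
    dependent=["accuracy", "runtime_s"],
    controls={"seed": 2024, "preprocessing": "standardize"},
)
```

Here spec.conditions() yields four conditions (two sizes times two algorithms), and each one carries the same seed and preprocessing. Sweeping every combination instead of hand-picking a few also addresses the cherry-picking concern in Step 4.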

Methodology

Data generation or collection procedure
Algorithm/method configuration
Number of replications or samples
Statistical tests to apply (t-test, bootstrap CI, etc.)
Baseline comparisons
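As an example of the "bootstrap CI" option, the following Python sketch computes a percentile bootstrap interval for the difference in mean score between a method and its baseline. It uses only the standard library and assumes the two samples are independent. The function name and defaults are illustrative, not part of the skill.

```python
import random
import statistics


def bootstrap_diff_ci(method: list[float], baseline: list[float],
                      n_boot: int = 5000, alpha: float = 0.05,
                      seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap CI for mean(method) - mean(baseline).

    Each group is resampled independently with replacement, which assumes
    the two samples are independent of each other.
    """
    if not method or not baseline:
        raise ValueError("both samples must be non-empty")
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        m = rng.choices(method, k=len(method))
        b = rng.choices(baseline, k=len(baseline))
        diffs.append(statistics.fmean(m) - statistics.fmean(b))
    diffs.sort()
    # Read the alpha/2 and 1 - alpha/2 quantiles off the sorted replicates.
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

If the interval excludes zero, the difference is unlikely to be resampling noise at the chosen alpha. When method and baseline run on the same seeds or datasets, bootstrap the per-pair differences instead, because the independent resampling used here would ignore that pairing.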

Success Criteria

Define before running what constitutes support for the hypothesis. This prevents post-hoc rationalization.
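A lightweight way to commit to success criteria before running anything is to record them as an immutable object and evaluate the results against it afterwards. In this hypothetical Python sketch, the criterion is a minimum improvement over the baseline, and the verdict comes from a confidence interval for that improvement (for example, one produced by a bootstrap).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriterion:
    """Written down before any run; frozen so its fields cannot be reassigned."""
    description: str
    min_improvement: float  # smallest effect that counts as support

    def evaluate(self, ci_low: float, ci_high: float) -> str:
        # Supported: the whole interval clears the pre-registered threshold.
        if ci_low >= self.min_improvement:
            return "supported"
        # Not supported: the interval rules out an effect of the required size.
        if ci_high < self.min_improvement:
            return "not supported"
        # Otherwise the interval straddles the threshold.
        return "inconclusive"


criterion = SuccessCriterion(
    description="Accuracy gain over baseline of at least 0.02 (95% CI)",
    min_improvement=0.02,
)
```

The three-way verdict is one reasonable convention. It reports an interval that straddles the threshold as inconclusive instead of quietly counting it as support.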

Reproducibility

Random seed strategy
Hardware/software environment
Data availability
Script that runs the full experiment end-to-end
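For the random seed strategy, a common pattern is to fix one master seed and derive a separate, reproducible seed for each replication. The Python sketch below derives these seeds by hashing (master seed, replication index) with SHA-256 and also records basic environment details. The hashing scheme, the function names, and the MASTER_SEED value are assumptions made for illustration.

```python
import hashlib
import platform
import random
import sys


def replication_seed(master_seed: int, replication: int) -> int:
    """Derive a reproducible 64-bit seed for one replication from the master seed."""
    digest = hashlib.sha256(f"{master_seed}:{replication}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def environment_record() -> dict[str, str]:
    """Capture the software/hardware context to store next to the results."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }


# Each replication gets its own generator, so any single replication can be
# rerun exactly, regardless of run order or parallelism.
MASTER_SEED = 20240601  # hypothetical project-wide seed
rngs = [random.Random(replication_seed(MASTER_SEED, r)) for r in range(3)]
```

Projects that already use NumPy can get the same effect with numpy.random.SeedSequence(master_seed).spawn(n), which is designed for this purpose.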

Step 4: Address Common Pitfalls

Check for and warn about:

Cherry-picking: Are you testing one configuration or sweeping parameters fairly?
Multiple comparisons: If running many tests, apply a correction (Bonferroni for the family-wise error rate, or an FDR procedure such as Benjamini-Hochberg).
Overfitting to test data: Is there a held-out validation set?
Computational budget: Is this feasible given available hardware and time?
Missing baselines: Every method needs comparison to something. Even "no method" is a baseline.
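The two corrections named under "Multiple comparisons" are short enough to write out. The Python sketch below implements Bonferroni (family-wise error rate) and the Benjamini-Hochberg step-up procedure (false discovery rate) over a list of p-values and returns which hypotheses are rejected. It is an illustration, not a replacement for a vetted statistics library.

```python
def bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff = rank
    # Reject every hypothesis ranked 1..k, even ones above their own threshold.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

For p-values [0.01, 0.04, 0.03, 0.005] at alpha = 0.05, Bonferroni rejects only 0.01 and 0.005 (threshold 0.0125), while Benjamini-Hochberg rejects all four. In practice, statsmodels.stats.multitest.multipletests provides both corrections (method="bonferroni" or method="fdr_bh").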

Step 5: Register the Experiment

If .papermill/state.md exists, update it (Edit tool) by adding to the experiments list. If it does not exist, skip registration and suggest running /papermill:init to persist the experiment.

```yaml
experiments:
  - name: "descriptive-name"
    type: "simulation | benchmark | case-study | ablation"
    hypothesis: "Expected outcome in one sentence"
    status: "planned | running | completed | failed"
    script: "path/to/script.R"
    last_run: null
```

Append a timestamped note documenting the experiment design.
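If you want to script the registration step, the following Python sketch renders one entry with the same fields and indentation as the experiments list above, plus a timestamped design note. This document does not specify the layout of the rest of .papermill/state.md, so the sketch only produces text. The note format and helper names are hypothetical, and inside the skill the Edit tool performs the actual update.

```python
from datetime import datetime, timezone
from typing import Optional

ALLOWED_TYPES = {"simulation", "benchmark", "case-study", "ablation"}
ALLOWED_STATUSES = {"planned", "running", "completed", "failed"}


def _quote(value: str) -> str:
    # Escape backslashes and double quotes for a YAML double-quoted scalar.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def experiment_entry(name: str, type_: str, hypothesis: str, script: str,
                     status: str = "planned") -> str:
    """Render one item of the experiments list, indented to sit under 'experiments:'."""
    if type_ not in ALLOWED_TYPES:
        raise ValueError(f"unknown experiment type: {type_}")
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status: {status}")
    return (
        f"  - name: {_quote(name)}\n"
        f"    type: {_quote(type_)}\n"
        f"    hypothesis: {_quote(hypothesis)}\n"
        f"    status: {_quote(status)}\n"
        f"    script: {_quote(script)}\n"
        "    last_run: null\n"
    )


def design_note(name: str, now: Optional[datetime] = None) -> str:
    """Hypothetical timestamped note format; 'now' should be a UTC datetime."""
    now = now or datetime.now(timezone.utc)
    return f"- {now.strftime('%Y-%m-%d %H:%M')} UTC: designed experiment {_quote(name)}"
```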

Step 6: Suggest Next Steps

Based on the experiment type, suggest the most relevant next step:

If this is a Monte Carlo study → "Use /papermill:simulation for detailed simulation design — it covers sample sizes, convergence diagnostics, and result presentation."
If the experiment tests a theoretical prediction → "Consider /papermill:proof to verify the theory before running experiments."
If results will need statistical analysis → "Implement the script, run it, then use /papermill:review once the results are written up."
For all experiments → "Start with a small pilot run to debug before the full experiment."
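The pilot-run advice pairs naturally with the computational-budget check from Step 4: time a handful of replications, then extrapolate to the full study. This Python sketch assumes each replication costs roughly the same. It takes a hypothetical run_one(replication_index) callable that your own experiment script would supply.

```python
import time
from typing import Callable, List, Tuple


def pilot_then_estimate(run_one: Callable[[int], float], pilot_reps: int,
                        full_reps: int) -> Tuple[List[float], float]:
    """Run a small pilot and extrapolate wall-clock seconds for the full study.

    Assumes every replication costs roughly the same, so the estimate is crude.
    """
    if pilot_reps < 1:
        raise ValueError("pilot_reps must be at least 1")
    start = time.perf_counter()
    results = [run_one(r) for r in range(pilot_reps)]  # surfaces bugs early
    elapsed = time.perf_counter() - start
    estimated_full_s = elapsed / pilot_reps * full_reps
    return results, estimated_full_s
```

A pilot that crashes or produces NaNs costs seconds instead of hours, and the estimate shows whether the full run fits the available time.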


Design rigorous experiments for a research paper: hypothesis formulation, variable identification, methodology selection, and success criteria. Produces a structured experiment plan with reproducibility in mind.

1 star

development

Updated Apr 22, 2026

Install this skill globally with the npx skillsauth add command shown at the top of this page. It works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean; the other 6 were skipped. Review each row below.

| Scanner | Description | Status |
|---------|-------------|--------|
| Trivy | Container and dependency vulnerability scanner | Clean |
| Semgrep | Static code analysis for vulnerabilities | Clean |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean |
| Snyk (dep) | Open source security scanning | Skipped |
| Socket.dev | Supply chain security analysis | Skipped |
| VirusTotal | Multi-engine malware detection | Skipped |
| CrowdStrike | Advanced threat intelligence | Skipped |
| OSV-Scanner | Open Source Vulnerability database check | Skipped |
| OWASP Dep-Check | | Skipped |

Last scanned: Apr 22, 2026, 3:09 PM · 17.8s · 1 file scanned


Related Skills

queelius/synthesize (development; Verified, Trusted, Community)

Force a research-agent run to conclude. Launches the researcher in synthesis mode: it reads state.md and log.md, writes .research/synthesis.md with outcome, key findings, failed approaches, open questions, and recommendations, then exits. Use when current results are good enough or the agent is stalling.

1 SKILL.md · Updated May 6, 2026

queelius/status (data-ai; Verified, Trusted, Community)

Show the current state of an in-flight research-agent run from .research/state.md, log.md, and attempts/. Read-only summary of cycles, sub-problems, hypothesis statuses, eval trend, and current focus.

1 SKILL.md · Updated May 6, 2026

queelius/resume (testing; Verified, Trusted, Community)

Resume an interrupted research-agent run. Re-launches the researcher with instructions to read .research/state.md and log.md, reorient, and continue from the documented current focus. Use after a context compression, session restart, or explicit pause.

1 SKILL.md · Updated May 6, 2026

queelius/workflows (tools; Verified, Trusted, Community)

When and how to use the repoindex plugin surface (MCP tools, agents, slash commands) for collection queries, release prep, activity summaries, and tag discipline. Use when users ask repoindex questions, mention their repo catalog, or want to know which repoindex tool fits their task.

1 SKILL.md · Updated Apr 25, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.


Install manually


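```bash
# Clone the repo
git clone https://github.com/queelius/claude-anvil.git

# Create the global Claude Code skills folder if it does not exist yet,
# then copy the skill into it (it lands at ~/.claude/skills/experiment)
mkdir -p ~/.claude/skills
cp -r claude-anvil/papermill/skills/experiment ~/.claude/skills/
```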

See the Claude Code Skills documentation for the official skills path.

Repository

queelius/claude-anvil

1 star

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT