moralespanitz/loop

Name: loop
Author: moralespanitz

internal/embed/claude/skills/loop/SKILL.md

npx skillsauth add moralespanitz/research-loop loop

<SUBAGENT-STOP>
If you were dispatched as a subagent to run a specific experiment or annotate a result, skip this skill. Execute the task and return structured results immediately.
</SUBAGENT-STOP>

<HARD-GATE>
Design ALL conditions upfront. Rank them. Run ONE at a time. Never propose the next experiment before logging what you learned from the last one.
</HARD-GATE>

Loop Skill

You are running a scientific iteration loop. Each experiment is a question. The answer shapes the next question. The ledger accumulates evidence.

Phase 1 — Design all conditions, rank by information value

Ask:

"What's the core hypothesis we're testing? One sentence."

Then list ALL planned conditions — you already know them from the discovery phase. Show them ranked by what teaches you the most first:

HYPOTHESIS: [one sentence]
FALSIFICATION: [what result kills it]

CONDITIONS (ranked by information value):
  1. [highest priority] — tests [what], teaches [what] if it fails/passes
  2. [second] — only run this if condition 1 passes/fails in [way]
  3. [third] — ...
  N. [memory transplant / killer test] — run last, most definitive

RANKING LOGIC: [one sentence explaining why this order]

Ask:

"Does this ranking make sense? Any condition you want to move up or down?"

Wait. Adjust if needed. Then create TodoWrite tasks — one per condition, in ranked order.

Write hypothesis.md:

# Hypothesis
[one sentence]

# Prediction
[what you expect to observe if hypothesis is correct]

# Falsification
[what result proves it wrong]

# Conditions (ranked)
1. [name]: [what changes] — Priority: [why first]
2. [name]: [what changes] — Priority: [why second]
...
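
As a rough illustration of the hypothesis.md layout above, the following Python sketch renders the file from a small data structure. The `Condition` and `Hypothesis` classes and the `render_hypothesis` function are hypothetical names introduced here for illustration; the skill itself only specifies the resulting Markdown, not how it is produced.

```python
from dataclasses import dataclass, field


@dataclass
class Condition:
    name: str
    change: str    # what this condition changes
    priority: str  # why it sits at this rank
    rank: int      # 1 = most informative, run first


@dataclass
class Hypothesis:
    statement: str
    prediction: str
    falsification: str
    conditions: list[Condition] = field(default_factory=list)


def render_hypothesis(h: Hypothesis) -> str:
    """Return hypothesis.md text with conditions listed in ranked order."""
    ranked = sorted(h.conditions, key=lambda c: c.rank)
    lines = [
        "# Hypothesis", h.statement, "",
        "# Prediction", h.prediction, "",
        "# Falsification", h.falsification, "",
        "# Conditions (ranked)",
    ]
    for i, c in enumerate(ranked, start=1):
        # \u2014 is the dash used as a separator in the template above.
        lines.append(f"{i}. {c.name}: {c.change} \u2014 Priority: {c.priority}")
    return "\n".join(lines) + "\n"
```

Sorting by an explicit `rank` field keeps the file order tied to the agreed ranking, so a later reprioritization only needs the ranks changed before re-rendering.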

Phase 2 — Run one experiment at a time

For each condition, in ranked order:

BEFORE running — state the question:

"Condition [N]: [name]. The question this answers: [one sentence]. Expected result if hypothesis holds: [specific]. Expected result if hypothesis fails: [specific]. Running now."

AFTER running — ask for the insight:

"Result: [metric]. Expected? What does this tell you — one sentence."

Wait for their answer. Then add your own causal read:

"My read: [mechanistic explanation]. This [confirms / challenges / is neutral toward] the hypothesis because [why]."

Log the insight immediately — append to insights.md:

## Insight [N] — [date]
Condition: [name]
Result: [metric]
Researcher interpretation: [their words]
Causal annotation: [mechanistic explanation]
Hypothesis status: [strengthened / weakened / unchanged / killed]
Next question this raises: [what you now want to know]
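
One way to keep ledger entries consistent with the template above is to build them from a single function. This is a minimal sketch under stated assumptions: `format_insight` and `append_insight` are hypothetical helpers, and the ISO date format is a choice made here, since the template only says `[date]`.

```python
from datetime import date
from pathlib import Path
from typing import Optional

# The four status values listed in the template.
STATUSES = {"strengthened", "weakened", "unchanged", "killed"}


def format_insight(n: int, condition: str, result: str, interpretation: str,
                   annotation: str, status: str, next_question: str,
                   when: Optional[date] = None) -> str:
    """Build one insights.md entry following the template."""
    if status not in STATUSES:
        raise ValueError(f"status must be one of {sorted(STATUSES)}")
    when = when or date.today()
    return "\n".join([
        # Assumption: ISO date; \u2014 is the dash used in the template heading.
        f"## Insight {n} \u2014 {when.isoformat()}",
        f"Condition: {condition}",
        f"Result: {result}",
        f"Researcher interpretation: {interpretation}",
        f"Causal annotation: {annotation}",
        f"Hypothesis status: {status}",
        f"Next question this raises: {next_question}",
    ]) + "\n"


def append_insight(ledger: Path, entry: str) -> None:
    """Append an entry to the ledger, creating parent directories if needed."""
    ledger.parent.mkdir(parents=True, exist_ok=True)
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(entry + "\n")  # blank line between entries
```

Rejecting unknown status values early keeps the ledger searchable by status later on.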

Mark the TodoWrite task complete. Then ask:

"Given this result, does the ranking still make sense? Or do you want to reprioritize?"

Only then propose the next condition.

Phase 3 — After each insight, update the ledger

After every experiment, run:

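# Append a separator to the session's insights ledger.
# SLUG is a placeholder variable for the session slug; set it before running.
echo "---" >> ".research-loop/sessions/${SLUG}/insights.md"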

Then update knowledge_graph.md — change the condition status from pending to done: [result].
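
The status change described above could be scripted roughly as follows. The skill does not define the layout of knowledge_graph.md, so this sketch assumes, purely for illustration, that each condition sits on its own line as `<name>: pending`, optionally preceded by a list marker. The function name `mark_done` is hypothetical.

```python
import re


def mark_done(graph_text: str, condition: str, result: str) -> str:
    """Flip one condition from 'pending' to 'done: <result>'.

    ASSUMPTION: one line per condition, shaped like '- <name>: pending'.
    The real knowledge_graph.md layout may differ.
    """
    pattern = re.compile(
        rf"^([ \t]*(?:[-*][ \t]+)?{re.escape(condition)}:[ \t]*)pending[ \t]*$",
        flags=re.MULTILINE,
    )
    # A function replacement avoids backslash surprises in `result`.
    updated, count = pattern.subn(
        lambda m: f"{m.group(1)}done: {result}", graph_text
    )
    if count != 1:
        raise ValueError(
            f"expected exactly one pending line for {condition!r}, found {count}"
        )
    return updated
```

Failing when zero or several lines match is deliberate: a silent no-op would leave the graph out of step with the ledger.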

Phase 4 — Kill or continue decision

After each run:

Does the result change what we think?
├── CONFIRMS hypothesis → continue ranked list
├── WEAKENS hypothesis → reprioritize — move falsification test up
├── KILLS hypothesis → stop, log finding, load execution skill
└── SURPRISING (neither confirms nor kills) → this is the most interesting result
    → pause, ask "what does this mean?", update hypothesis if needed

Surprising results are never failures. They are the most valuable signal.
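
The decision tree above can be sketched as a small dispatch function. The names `Outcome` and `next_step` are hypothetical, and reading "move falsification test up" as "move it to the front of the remaining list" is an interpretation made here, not something the skill states.

```python
from enum import Enum


class Outcome(Enum):
    CONFIRMS = "confirms"
    WEAKENS = "weakens"
    KILLS = "kills"
    SURPRISING = "surprising"


def next_step(outcome: Outcome, remaining: list[str],
              falsification_test: str) -> tuple[str, list[str]]:
    """Return (action, remaining conditions in the order to run them)."""
    if outcome is Outcome.CONFIRMS:
        return "continue", list(remaining)
    if outcome is Outcome.WEAKENS:
        # Assumption: "move up" means run the falsification test next.
        rest = [c for c in remaining if c != falsification_test]
        front = [falsification_test] if falsification_test in remaining else []
        return "reprioritize", front + rest
    if outcome is Outcome.KILLS:
        # Stop the loop; logging and the execution skill follow outside this function.
        return "stop", []
    # SURPRISING: pause and ask what the result means before going on.
    return "pause", list(remaining)
```

For example, `next_step(Outcome.WEAKENS, ["a", "b", "c"], "c")` returns `("reprioritize", ["c", "a", "b"])`.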

Phase 5 — When all conditions are run

Before looking at the full picture, ask:

"Write the conclusion in one sentence — what does the evidence say about the hypothesis?"

Then show the full ledger. Compare their conclusion to the original prediction. If they differ — that gap is often the real finding.

Load execution skill to formalize the decision: continue, pivot, or write the paper.

Use when user has a hypothesis and a repo and wants to run experiments. Triggered by "start experiments", "run the loop", "test this hypothesis".

4 stars

testing

Updated Apr 20, 2026

Install globally

npx skillsauth add moralespanitz/research-loop loop

Installs this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status; review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 20, 2026, 3:03 PM (14.1 s, 1 file scanned)

Related Skills

moralespanitz/replication
testing · Verified · Trusted · Community
Plan and execute a structured replication workflow for a paper, claim, or benchmark with environment selection and integrity checks.
4 · SKILL.md · Updated May 5, 2026

moralespanitz/paper-pipeline
testing · Verified · Trusted · Community
End-to-end paper generation pipeline ported from AutoResearchClaw (Aiming Lab). 14 phases covering topic initiation through export/publish, with human-in-the-loop gates and quality gating at each handoff. Use this when the user wants a full paper pipeline run, from topic to submission-ready manuscript. Delegates to researcher/reviewer/writer/verifier subagents for stage execution and to autonomous-iteration for experiment optimization loops.
4 · SKILL.md · Updated May 5, 2026

moralespanitz/literature-review
testing · Verified · Trusted · Community
Run a structured literature review on a topic using parallel search, evidence tables with quality scoring, and primary-source synthesis.
4 · SKILL.md · Updated May 5, 2026

moralespanitz/figure-agent
development · Verified · Trusted · Community
Publication-quality figure generation for research papers. A decision agent selects the figure type (code plot vs. architecture diagram) and generates Matplotlib/Seaborn code for quantitative figures with an iterative improvement loop. Style-matches conference templates (NeurIPS, ICML, ICLR). Use when the paper-pipeline reaches the figure generation phase, or when a user requests figures for an existing draft.
4 · SKILL.md · Updated May 5, 2026

Install manually

# Clone the repo
git clone https://github.com/moralespanitz/research-loop.git

# Copy into Claude Code skills folder (global)
cp -r research-loop/internal/embed/claude/skills/loop ~/.claude/skills/

Repository

moralespanitz/research-loop

4 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT