
moralespanitz/execution

Name: execution
Author: moralespanitz

Path: skills/execution/SKILL.md

Install:

npx skillsauth add moralespanitz/research-loop execution


<SUBAGENT-STOP> If you were dispatched as a subagent to analyze specific results, skip this skill. Analyze and return structured findings immediately. </SUBAGENT-STOP>

Execution Skill — Annotate + Decide

Step 0 — Load context and create todos

Read the active session:

ls -t .research-loop/sessions/
cat .research-loop/sessions/<latest>/lab_notebook.md
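The two commands above pick the most recently modified session directory and print its notebook. The sketch below is a minimal Python equivalent. It assumes that each session is a subdirectory of `.research-loop/sessions/` and that "latest" means newest modification time, which is how `ls -t` orders entries. `latest_session` and `read_notebook` are illustrative names and are not part of the skill.

```python
from pathlib import Path


def latest_session(root: str = ".research-loop/sessions") -> Path:
    """Return the most recently modified session directory (the first directory `ls -t` would list)."""
    sessions = [p for p in Path(root).iterdir() if p.is_dir()]
    if not sessions:
        raise FileNotFoundError(f"no session directories under {root}")
    # Newest by modification time, which is the order `ls -t` uses.
    return max(sessions, key=lambda p: p.stat().st_mtime)


def read_notebook(session: Path) -> str:
    """Return the text of the session's lab_notebook.md."""
    return (session / "lab_notebook.md").read_text(encoding="utf-8")
```

When run from the project root, `read_notebook(latest_session())` returns the same text as the `cat` command.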

Summarize the state:

"You're in session [slug]. [N] experiments run. Best so far: [metric]. Last decision: [continue/pivot/kill]."
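The state summary can be filled in from the run log. This sketch makes several assumptions: every line of `autoresearch.jsonl` is one run record, each record carries a numeric `metric` field, and higher values are better. The field name, the direction, and the helper names are all hypothetical, so adjust them to the session's actual schema. The last decision still comes from `lab_notebook.md`.

```python
import json
from pathlib import Path


def load_runs(jsonl_path: Path) -> list:
    """Parse one JSON record per non-empty line of autoresearch.jsonl."""
    text = jsonl_path.read_text(encoding="utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def summarize(slug: str, runs: list, last_decision: str,
              metric_key: str = "metric", higher_is_better: bool = True) -> str:
    """Build the one-line state summary; `metric_key` is a hypothetical field name."""
    values = [r[metric_key] for r in runs if metric_key in r]
    if values:
        best = max(values) if higher_is_better else min(values)
        best_text = f"{best:g}"
    else:
        best_text = "n/a"  # no run has reported the metric yet
    return (f"You're in session {slug}. {len(runs)} experiments run. "
            f"Best so far: {best_text}. Last decision: {last_decision}.")
```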

Create todos with TodoWrite:

Task 1: Annotate run #N — record result, mechanistic explanation, decision
Task 2: Update knowledge graph
Task 3: Write the conclusion paragraph (before checking whether the result matched the prediction)
Task 4: Decide — continue / pivot / kill

After each experiment run

Read the latest result:

tail -1 .research-loop/sessions/<slug>/autoresearch.jsonl | python3 -m json.tool
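`tail -1` returns the last physical line, so a trailing blank line in the log would make `json.tool` fail. The sketch below is a minimal Python equivalent that skips blank lines before parsing. It assumes the log is newline-delimited JSON, as the command implies. `last_record` and `pretty` are illustrative names.

```python
import json
from pathlib import Path


def last_record(jsonl_path: Path) -> dict:
    """Return the last non-empty JSON record in a newline-delimited log."""
    lines = [ln for ln in jsonl_path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    if not lines:
        raise ValueError(f"{jsonl_path} has no records yet")
    return json.loads(lines[-1])


def pretty(record: dict) -> str:
    """Format the record with the 4-space indentation `python3 -m json.tool` uses by default."""
    return json.dumps(record, indent=4)
```

To print the latest result, call `print(pretty(last_record(Path(".research-loop/sessions/<slug>/autoresearch.jsonl"))))` with `<slug>` filled in, as in the shell command.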

Ask the researcher three questions — one at a time:

Q1:

"What happened? Walk me through the result — metric value, direction, was it what you expected?"

Q2:

"Why do you think it happened? I want a mechanistic explanation, not 'the model improved'. What did the change actually do?"

Q3:

"What does this tell you about the next step?"

Append the full exchange to lab_notebook.md:

## Run #N — <node name>
Date: <date>
Mutation: <what changed>
Result: <metric value> (Δ <delta> from baseline)
Researcher explanation: <their answer to Q2>
Causal annotation: <your synthesis of why>
Decision: <continue / pivot / kill>
Next mutation rationale: <why>
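The entry can also be rendered programmatically, which keeps every run in the same layout. This sketch assumes the metric is a single float and that Δ is `metric - baseline`, so for a lower-is-better metric a negative Δ is an improvement. `RunEntry`, `render_entry`, and `append_entry` are hypothetical helpers.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RunEntry:
    """One lab-notebook entry; field names are hypothetical and mirror the template."""
    run: int
    node: str
    date: str
    mutation: str
    metric: float
    baseline: float
    explanation: str  # the researcher's answer to Q2
    annotation: str   # your causal synthesis
    decision: str     # "continue", "pivot" or "kill"
    rationale: str    # why the next mutation was chosen


def render_entry(e: RunEntry) -> str:
    """Render the entry in the same layout as the template."""
    delta = e.metric - e.baseline  # negative is better for lower-is-better metrics
    return "\n".join([
        f"## Run #{e.run} — {e.node}",
        f"Date: {e.date}",
        f"Mutation: {e.mutation}",
        f"Result: {e.metric:g} (Δ {delta:+g} from baseline)",
        f"Researcher explanation: {e.explanation}",
        f"Causal annotation: {e.annotation}",
        f"Decision: {e.decision}",
        f"Next mutation rationale: {e.rationale}",
    ]) + "\n"


def append_entry(notebook: Path, e: RunEntry) -> None:
    """Append the entry, separated from earlier content by a blank line."""
    with notebook.open("a", encoding="utf-8") as fh:
        fh.write("\n" + render_entry(e))
```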

Also append a node to knowledge_graph.md:

## [node name] → [result] → [next]
- Mutation: <what changed>
- Result: <metric> Δ<delta>
- Why it worked/failed: <mechanistic>
- Implication: <what to try next>
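A matching sketch for `knowledge_graph.md` follows, with the same caveats. The helper names are hypothetical, and the `result` label in the heading is assumed to be a short word such as "improved" or "regressed".

```python
from pathlib import Path


def render_node(node: str, result: str, next_step: str, mutation: str,
                metric: float, delta: float, why: str, implication: str) -> str:
    """Render one knowledge-graph node in the knowledge_graph.md layout."""
    return "\n".join([
        f"## {node} → {result} → {next_step}",
        f"- Mutation: {mutation}",
        f"- Result: {metric:g} Δ{delta:+g}",
        f"- Why it worked/failed: {why}",
        f"- Implication: {implication}",
    ]) + "\n"


def append_node(graph: Path, text: str) -> None:
    """Append a rendered node; opening in append mode creates the file if needed."""
    with graph.open("a", encoding="utf-8") as fh:
        fh.write("\n" + text)
```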

Kill/pivot/continue decision

Apply this tree — ask the researcher first, then give your recommendation:

Improved in last 5 runs?
├── YES → continue this direction
└── NO
    ├── > 10 runs total with no improvement?
    │   └── YES → KILL. Update status, move to next hypothesis.
    └── NO → PIVOT. Suggest a different mutation direction.
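The tree can be expressed as a small function over the metric history, which makes the thresholds explicit. This sketch rests on two assumptions. First, a run counts as an improvement only if it beats the best earlier result (baseline included) by more than `min_delta`. Second, "> 10 runs total with no improvement" is read as more than 10 runs since the last new best. `decide` and its parameters are illustrative. The researcher is still asked first, and the function only produces your recommendation.

```python
from typing import Optional, Sequence


def last_improvement(metrics: Sequence[float], baseline: float,
                     higher_is_better: bool = True, min_delta: float = 0.0) -> Optional[int]:
    """Index of the latest run that beat every earlier result by more than min_delta."""
    best, last = baseline, None
    for i, value in enumerate(metrics):
        gain = value - best if higher_is_better else best - value
        if gain > min_delta:
            best, last = value, i
    return last


def decide(metrics: Sequence[float], baseline: float, higher_is_better: bool = True,
           min_delta: float = 0.0, window: int = 5, kill_after: int = 10) -> str:
    """Return "continue", "pivot" or "kill" following the decision tree."""
    if not metrics:
        raise ValueError("no runs recorded yet")
    last = last_improvement(metrics, baseline, higher_is_better, min_delta)
    # Runs since the last new best; every run counts if none improved on the baseline.
    stale = len(metrics) if last is None else len(metrics) - 1 - last
    if last is not None and stale < window:
        return "continue"  # improved within the last `window` runs
    if stale > kill_after:
        return "kill"      # assumption: "> 10 runs with no improvement" = runs since last best
    return "pivot"
```

Equal results never count as improvements, so a plateau eventually leads to a pivot. Raising `min_delta` above run-to-run noise keeps lucky runs from resetting the counter.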

Show your recommendation explicitly:

"My recommendation: [continue/pivot/kill]. Here's why: [one sentence]."

Update lab_notebook.md status:

## Status
<date>: Run #N complete. Decision: <continue/pivot/kill>. Reason: <why>
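If the notebook already has a `## Status` section, the new line belongs at the end of that section rather than at the end of the file. This sketch assumes that each run adds a dated line and that earlier lines are kept as history. It also assumes the section ends at the next Markdown heading. `status_line` and `add_status` are hypothetical helpers that work on the notebook text, so the caller reads and writes the file.

```python
STATUS_HEADING = "## Status"


def status_line(date: str, run: int, decision: str, reason: str) -> str:
    """Format one status line in the layout shown above."""
    return f"{date}: Run #{run} complete. Decision: {decision}. Reason: {reason}"


def add_status(notebook_text: str, line: str) -> str:
    """Append `line` to the '## Status' section, creating the section if it is missing."""
    lines = notebook_text.splitlines()
    if STATUS_HEADING not in lines:
        if lines and lines[-1].strip():
            lines.append("")  # keep a blank line before the new heading
        lines += [STATUS_HEADING, line]
        return "\n".join(lines) + "\n"
    start = lines.index(STATUS_HEADING)
    end = start + 1
    # The section runs until the next Markdown heading or the end of the file.
    while end < len(lines) and not lines[end].startswith("#"):
        end += 1
    # Insert after the section's last non-blank line so spacing is preserved.
    while end > start + 1 and not lines[end - 1].strip():
        end -= 1
    lines.insert(end, line)
    return "\n".join(lines) + "\n"
```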

When to declare success

Declare success when ALL of these are true:

Best metric is meaningfully better than baseline (not noise — run it twice)
You can explain WHY in one sentence
You have at least 2 negative results that tell you what doesn't work
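The three criteria can be checked mechanically once "meaningfully better" is pinned to a number. In this sketch, `min_gain` is a hypothetical threshold that the researcher chooses, for example from run-to-run variance. Each repeat of the best configuration must clear that threshold. The explanation is checked only for presence, because whether it truly fits one sentence is a judgement call. `Evidence` and `ready_to_write` are illustrative names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Evidence:
    baseline: float
    best_repeats: Sequence[float]  # metric from each repeat of the best configuration
    explanation: str               # the one-sentence "why"
    negative_results: int          # runs that showed what does not work


def ready_to_write(ev: Evidence, min_gain: float, higher_is_better: bool = True) -> bool:
    """True only when all three success criteria hold."""
    if len(ev.best_repeats) < 2:
        return False  # a single run cannot rule out noise
    sign = 1 if higher_is_better else -1
    # Every repeat must beat the baseline by at least the chosen margin.
    reproduced = all(sign * (v - ev.baseline) >= min_gain for v in ev.best_repeats)
    # Whether it truly fits one sentence is a judgement call; only presence is checked.
    explained = bool(ev.explanation.strip())
    return reproduced and explained and ev.negative_results >= 2
```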

Then say:

"You have enough to write the paper. Run /write or load the writing-papers skill."

Description: Use when experiments are running or have just completed, or when the user shares results and wants to decide what to do next.

Category: testing · Updated Apr 20, 2026

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status; review each row below.

Clean (95%): Trivy, container and dependency vulnerability scanner
Clean (95%): Semgrep, static code analysis for vulnerabilities
Clean (95%): mcp-scan (Snyk), Model Context Protocol security validation
Skipped (50%): Snyk (dep), open source security scanning
Skipped (50%): Socket.dev, supply chain security analysis
Skipped (50%): VirusTotal, multi-engine malware detection
Skipped (50%): CrowdStrike, advanced threat intelligence
Skipped (50%): OSV-Scanner, Open Source Vulnerability database check
Skipped (50%): OWASP Dep-Check

Last scanned: Apr 20, 2026, 3:02 PM · 20.7 s · 1 file scanned


Related Skills

moralespanitz/replication

testing · Verified · Trusted · Community · 4 stars · Updated May 5, 2026

Plan and execute a structured replication workflow for a paper, claim, or benchmark with environment selection and integrity checks.

moralespanitz/paper-pipeline

testing · Verified · Trusted · Community · 4 stars · Updated May 5, 2026

End-to-end paper generation pipeline ported from AutoResearchClaw (Aiming Lab). 14 phases covering topic initiation through export/publish, with human-in-the-loop gates and quality gating at each handoff. Use this when the user wants a full paper pipeline run, from topic to submission-ready manuscript. Delegates to researcher/reviewer/writer/verifier subagents for stage execution and to autonomous-iteration for experiment optimization loops.

moralespanitz/literature-review

testing · Verified · Trusted · Community · 4 stars · Updated May 5, 2026

Run a structured literature review on a topic using parallel search, evidence tables with quality scoring, and primary-source synthesis.

moralespanitz/figure-agent

development · Verified · Trusted · Community · 4 stars · Updated May 5, 2026

Publication-quality figure generation for research papers. Decision agent selects figure type (code plot vs architecture diagram). Generates Matplotlib/Seaborn code for quantitative figures with iterative improvement loop. Style-matches conference templates (NeurIPS, ICML, ICLR). Use when the paper-pipeline reaches the figure generation phase, or when a user requests figures for an existing draft.


Install manually

# Clone the repo
git clone https://github.com/moralespanitz/research-loop.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy into Claude Code skills folder (global)
cp -r research-loop/skills/execution ~/.claude/skills/

See the Claude Code Skills documentation for the official skills path.

Repository: moralespanitz/research-loop (4 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT