jamie-bitflight/experiment-protocol

Name: experiment-protocol
Author: jamie-bitflight

plugins/scientific-method/skills/experiment-protocol/SKILL.md

npx skillsauth add jamie-bitflight/claude_skills experiment-protocol

Experiment Protocol

Drives the experiment-registry MCP server through a controlled experiment lifecycle. The MCP owns the state machine, validates artefacts, and enforces methodology. This skill is the caller — not the logic.

Core Problem

Uncontrolled testing contaminates results. The most common failure mode is embedding success criteria inside the input the subject under test receives — this measures instruction-following ability, not the quality of the instructions themselves. A second failure mode is changing multiple variables between runs, which makes it impossible to attribute any result to any cause. The third is writing scoring criteria after seeing output, which lets expected results shape the rubric rather than the other way around.

The experiment-registry MCP server enforces the correct protocol mechanically. Claude's role is to produce artefacts and submit them — not to manage the workflow.

Phase 1 — Setup (collaborative)

Work with the user to identify the experiment type before starting the execution loop.

flowchart TD
    Infer[Infer domain from current task context] --> List["Call list_experiment_types()"]
    List --> BestMatch[Identify best-matching type from descriptions]
    BestMatch --> Inspect["Call inspect_experiment_type(name)"]
    Inspect --> Propose[Propose type and first-step requirements to user]
    Propose --> Q{User accepts?}
    Q -->|Yes| Start["Call start_experiment(base, context, extensions)"]
    Q -->|Adjust| Adjust[User specifies different base or inline extensions]
    Adjust --> Start
    Start --> Ready[Receive experiment ID and first step — enter Phase 2]

The extensions parameter is optional. Pass it when the user specifies additions to the base type (e.g., extra checklist items or artefacts not in the registry definition).

Phase 2 — Execution (mechanical, MCP-driven)

No discussion during execution. Step through the MCP workflow autonomously.

flowchart TD
    GetStep["Call get_current_step(experiment_id)"] --> TermCheck{status is complete<br>or inconclusive?}
    TermCheck -->|Yes| Handoff[Experiment already done — see Retrospective Handoff]
    TermCheck -->|No| StepDetail[MCP returns step + checklist + required artefacts]
    StepDetail --> Human{REQUIRES_HUMAN_INPUT flagged?}
    Human -->|Yes| Surface[Surface the question to the user and wait for answer]
    Surface --> Resubmit[Include answer in artefacts and resubmit]
    Human -->|No| Produce[Produce the required artefacts]
    Produce --> Complete["Call complete_step(experiment_id, step_id, artefacts)"]
    Resubmit --> Complete
    Complete --> MCPResult{MCP response?}
    MCPResult -->|Missing artefacts| Fix[Produce the missing artefacts and resubmit]
    MCPResult -->|Validation errors| FixV[Fix validation issues and resubmit]
    Fix --> Complete
    FixV --> Complete
    MCPResult -->|Next step| GetStep
    MCPResult -->|complete| HandoffC[Experiment complete — see Retrospective Handoff]
    MCPResult -->|inconclusive| Report[Report iteration limit reached — summarise what changed]

The MCP advances state, validates artefact presence, and determines when the experiment is done. Do not attempt to track or infer step state from memory.
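As a rough illustration of the loop above, the sketch below drives a hypothetical in-memory `StubServer`. Only the tool names `get_current_step` and `complete_step` come from the flowchart; the status strings, the `question` field (standing in for the REQUIRES_HUMAN_INPUT flag), the `missing` field, and the `produce`/`ask_user` callbacks are assumptions made for the example.

```python
from typing import Any, Callable

TERMINAL = {"complete", "inconclusive"}


class StubServer:
    """Hypothetical stand-in for the MCP server; it owns all step state."""

    def __init__(self, steps: list[dict[str, Any]]) -> None:
        self._steps = steps
        self._index = 0
        self.log: list[tuple[str, list[str]]] = []

    def get_current_step(self, experiment_id: str) -> dict[str, Any]:
        if self._index >= len(self._steps):
            return {"status": "complete"}
        return {"status": "in_progress", **self._steps[self._index]}

    def complete_step(self, experiment_id: str, step_id: str,
                      artefacts: dict[str, str]) -> dict[str, Any]:
        step = self._steps[self._index]
        self.log.append((step_id, sorted(artefacts)))
        missing = [a for a in step["required_artefacts"] if a not in artefacts]
        if missing:
            return {"status": "missing_artefacts", "missing": missing}
        self._index += 1
        done = self._index >= len(self._steps)
        return {"status": "complete" if done else "next_step"}


def run_experiment(server: StubServer, experiment_id: str,
                   produce: Callable[[str], str],
                   ask_user: Callable[[str], str],
                   max_calls: int = 20) -> str:
    extra: list[str] = []  # artefacts the server reported missing last attempt
    for _ in range(max_calls):
        # Always ask the server where we are; never infer state from memory.
        step = server.get_current_step(experiment_id)
        if step["status"] in TERMINAL:
            return step["status"]
        names = list(step["required_artefacts"]) + extra
        artefacts = {name: produce(name) for name in names}
        if step.get("question"):
            artefacts["human_answer"] = ask_user(step["question"])
        response = server.complete_step(experiment_id, step["step_id"], artefacts)
        if response["status"] in TERMINAL:
            return response["status"]
        extra = response.get("missing", [])
    raise RuntimeError("call budget exhausted before a terminal status")
```

The driver keeps no step counter of its own: each pass re-reads the current step from the server, which matches the instruction not to track state locally.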

Read-Only Status

When the user calls /experiment-protocol status {id}, call get_current_step(experiment_id) and display the result without calling complete_step(). This does not interrupt or advance the execution loop.
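A status check along these lines might look like the following sketch. `RecordingClient` and `show_status` are hypothetical names, and the fields in the returned step are assumed; the point is only that the read path touches `get_current_step` and nothing else.

```python
from typing import Any


class RecordingClient:
    """Hypothetical client that records which MCP tools were called."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def get_current_step(self, experiment_id: str) -> dict[str, Any]:
        self.calls.append("get_current_step")
        # Assumed response shape, for illustration only.
        return {"status": "in_progress", "step_id": "baseline",
                "required_artefacts": ["run_log"]}

    def complete_step(self, experiment_id: str, step_id: str,
                      artefacts: dict[str, str]) -> dict[str, Any]:
        self.calls.append("complete_step")
        return {"status": "next_step"}


def show_status(client: RecordingClient, experiment_id: str) -> str:
    """Render the current step without submitting anything."""
    step = client.get_current_step(experiment_id)
    lines = [f"experiment: {experiment_id}", f"status: {step['status']}"]
    if "step_id" in step:
        lines.append(f"step: {step['step_id']}")
        lines.append("required: " + ", ".join(step["required_artefacts"]))
    return "\n".join(lines)
```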

Anti-Patterns

The MCP enforces these mechanically, but understanding why they are prohibited helps produce correct artefacts.

Embedding criteria in the input artefact — writing expected outcomes or scoring hints inside the fixture or input the subject receives. This tests instruction-following, not instruction quality. The rubric and fixture are separate artefacts for this reason.

Changing multiple things between iterations — if two things change simultaneously, the result cannot be attributed to either. The MCP enforces one-change-per-iteration via the iterate step.

Writing rubric criteria after seeing output — post-hoc criteria are shaped by what the subject produced. The MCP requires rubric artefacts before the baseline step runs.

Reporting only passing runs — every iteration is recorded. The MCP log captures all runs, including regressions.

Changing the control input between iterations — the task prompt, fixture, and baseline conditions are frozen after the baseline run. Changing them starts a new experiment.

Scoring by impression — every criterion is binary. Call get_current_step() to retrieve the rubric and score each criterion explicitly for each run.
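Two of the rules above lend themselves to a small illustration: one change per iteration, and binary scoring against a fixed rubric. The sketch below is not how the MCP server enforces them; `changed_keys`, `assert_single_change`, and `score_run` are hypothetical helpers, and representing an iteration as a flat dict of settings is an assumption.

```python
from typing import Any

_MISSING = object()  # distinguishes an absent key from a key set to None


def changed_keys(previous: dict[str, Any], current: dict[str, Any]) -> list[str]:
    """Return the sorted names of settings that differ between two iterations."""
    keys = set(previous) | set(current)
    return sorted(k for k in keys
                  if previous.get(k, _MISSING) != current.get(k, _MISSING))


def assert_single_change(previous: dict[str, Any], current: dict[str, Any]) -> str:
    """Raise unless exactly one setting changed; return its name."""
    changed = changed_keys(previous, current)
    if len(changed) != 1:
        raise ValueError(f"expected exactly one change, found {len(changed)}: {changed}")
    return changed[0]


def score_run(rubric: list[str], verdicts: dict[str, bool]) -> dict[str, Any]:
    """Score one run: every rubric criterion needs an explicit True/False verdict."""
    unscored = [c for c in rubric if c not in verdicts]
    if unscored:
        raise ValueError(f"criteria without a verdict: {unscored}")
    non_binary = [c for c in rubric if not isinstance(verdicts[c], bool)]
    if non_binary:
        raise ValueError(f"verdicts must be True or False: {non_binary}")
    failed = [c for c in rubric if not verdicts[c]]
    return {"passed": len(rubric) - len(failed), "total": len(rubric), "failed": failed}
```

Rejecting missing or non-boolean verdicts is what rules out scoring by impression: a run cannot receive a total until each criterion has been judged on its own.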

Retrospective Handoff

When the MCP returns complete or inconclusive status:

1. Call get_experiment_summary(experiment_id), which returns artefact file paths and final status.
2. Pass the file paths to @retrospective-analyst for post-experiment analysis.

The analyst reads artefacts directly from disk. No reformatting or summarisation required.
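The handoff can be sketched as follows, assuming the summary is a dict with `experiment_id`, `status`, and `artefact_paths` keys. Those field names, the example paths, and the helper functions are hypothetical; the source only states that the summary returns artefact file paths and final status.

```python
from pathlib import PurePosixPath
from typing import Any

TERMINAL = ("complete", "inconclusive")


def collect_artefact_paths(summary: dict[str, Any]) -> list[PurePosixPath]:
    """Return artefact paths from a summary, refusing non-terminal experiments."""
    if summary["status"] not in TERMINAL:
        raise ValueError(f"experiment not finished: status={summary['status']}")
    return [PurePosixPath(p) for p in summary["artefact_paths"]]


def handoff_message(summary: dict[str, Any]) -> str:
    """Build the message for the analyst: paths only, no summarised content."""
    paths = collect_artefact_paths(summary)
    listing = "\n".join(f"- {p}" for p in paths)
    return (f"@retrospective-analyst experiment {summary['experiment_id']} "
            f"finished with status {summary['status']}. Artefacts:\n{listing}")
```

The message carries file paths rather than file contents, in line with the note that the analyst reads the artefacts from disk itself.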

Design and run controlled experiments using the experiment-registry MCP server — domain-agnostic, pluggable, mechanically enforced. Use when you need evidence that a change actually improves behaviour.

39 stars

tools

Updated Apr 30, 2026

npx skillsauth add jamie-bitflight/claude_skills experiment-protocol

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. The other six were skipped.

| Scanner | Description | Status | Reported % |
| --- | --- | --- | --- |
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 30, 2026, 7:41 AM. Scan time: 36.5s. 1 file scanned.

SKILL.md

name: experiment-protocol
description: Design and run controlled experiments using the experiment-registry MCP server — domain-agnostic, pluggable, mechanically enforced. Use when you need evidence that a change actually improves behaviour.
user-invocable: true

Install manually

# Clone the repo
git clone https://github.com/jamie-bitflight/claude_skills.git

# Copy into Claude Code skills folder (global)
cp -r claude_skills/plugins/scientific-method/skills/experiment-protocol ~/.claude/skills/

Repository: jamie-bitflight/claude_skills (39 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT