aviskaar/experiment-design

Name: experiment-design
Author: aviskaar

skills/experiment-design/SKILL.md

npx skillsauth add aviskaar/open-org experiment-design

Experiment Design

Design rigorous machine learning experiments that produce credible, reproducible results.

Design Checklist

Work through each section before writing any code.

1. Hypothesis

State the hypothesis as a falsifiable claim:

"We claim that [method X] achieves [metric Y] on [dataset Z] because [mechanism]."

If the hypothesis is vague, help the user sharpen it before proceeding.
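The template above can be treated as four slots that must all be filled before the claim is testable. The following is a minimal sketch of that idea; the `Hypothesis` class, its field names, and the example values are hypothetical illustrations, not part of the skill.

```python
from dataclasses import dataclass, fields


@dataclass
class Hypothesis:
    """One falsifiable claim. Field names are illustrative, not defined by the skill."""
    method: str
    metric: str
    dataset: str
    mechanism: str

    def missing_parts(self) -> list[str]:
        # A blank slot means the claim is still too vague to test.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        return (f"We claim that {self.method} achieves {self.metric} "
                f"on {self.dataset} because {self.mechanism}.")


# Placeholder values, chosen only to show the shape of a filled-in claim.
h = Hypothesis(
    method="label smoothing",
    metric="lower expected calibration error than the unsmoothed model",
    dataset="CIFAR-10",
    mechanism="it discourages overconfident logits",
)
print(h.render())
print(h.missing_parts())  # []
```

A non-empty result from `missing_parts()` is a cue to sharpen the hypothesis before moving on.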

2. Independent and Dependent Variables

Independent variable: What is being changed (e.g., architecture, loss function, data augmentation)?
Dependent variable: What is being measured (e.g., accuracy, FID score, wall-clock time)?
Controlled variables: List everything held constant.

3. Baselines

Select baselines at three levels:

Naive: A trivially simple method (majority class, mean predictor)
Standard: The most widely used existing approach
Strong: The current state-of-the-art on the chosen benchmark

Justify each choice. Avoid strawman baselines.
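The naive level is cheap to implement, which makes it a useful sanity floor for the other two. Below is a rough sketch of the two naive baselines named above (majority class and mean predictor) using only the standard library; the function names and toy data are invented for illustration.

```python
from collections import Counter
from statistics import mean


def majority_class_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _x: most_common_label


def mean_predictor_baseline(train_targets):
    """Return a regressor that always outputs the mean training target."""
    target_mean = mean(train_targets)
    return lambda _x: target_mean


def accuracy(predict, inputs, labels):
    """Fraction of inputs whose prediction equals the label."""
    return sum(predict(x) == y for x, y in zip(inputs, labels)) / len(labels)


# Toy data, invented purely for illustration.
train_labels = ["cat", "cat", "dog", "cat"]
test_inputs = [0, 1, 2, 3]
test_labels = ["cat", "dog", "cat", "cat"]

clf = majority_class_baseline(train_labels)
print(accuracy(clf, test_inputs, test_labels))  # 0.75
reg = mean_predictor_baseline([1.0, 2.0, 3.0])
print(reg(None))  # 2.0
```

If a proposed method cannot clearly beat these, the comparison against standard and strong baselines is unlikely to be meaningful.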

4. Datasets and Splits

Name the dataset, version, and source.
Specify train/val/test splits. Use standard splits if they exist.
Flag any data leakage risks.
Note dataset limitations (bias, domain coverage, size).
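When no standard split exists, a seeded split plus an overlap check covers the most basic leakage risk. The sketch below assumes each example has a unique ID; `split_ids` and `find_leakage` are hypothetical helpers, and the check only catches exact ID overlap, not near-duplicates or group-level leakage (for example, the same patient or author appearing in two splits).

```python
import random


def split_ids(example_ids, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle IDs with a fixed seed and cut them into train/val/test."""
    ids = sorted(example_ids)  # sort first so input order cannot change the split
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    n_val = int(len(ids) * val_frac)
    return {
        "test": ids[:n_test],
        "val": ids[n_test:n_test + n_val],
        "train": ids[n_test + n_val:],
    }


def find_leakage(splits):
    """Return IDs that appear in more than one split (exact-ID overlap only)."""
    leaks = {}
    names = list(splits)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = set(splits[a]) & set(splits[b])
            if shared:
                leaks[(a, b)] = shared
    return leaks


splits = split_ids(range(100))
print({name: len(ids) for name, ids in splits.items()})
print(find_leakage(splits))  # {}
```

Recording the seed and the resulting ID lists alongside the dataset version lets someone else reconstruct the same split.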

5. Metrics

Choose metrics that align with the task objective.
Prefer metrics with established semantics over novel ones.
Report multiple metrics when they capture different aspects.
Quantify run-to-run variability: report the mean ± standard deviation over N seeds, and state N.
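Reporting a mean ± standard deviation over N seeds is simple arithmetic, sketched here with the standard library. The helper names and the per-seed scores are placeholders, not real results; the sample standard deviation (`statistics.stdev`) is one reasonable choice and is an assumption of this sketch.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation, N) for per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two seeds to estimate a standard deviation")
    return mean(scores), stdev(scores), len(scores)


def format_result(name, scores, digits=2):
    """Format a metric as 'name: mean ± std (N=... seeds)'."""
    m, s, n = summarize(scores)
    return f"{name}: {m:.{digits}f} ± {s:.{digits}f} (N={n} seeds)"


# Placeholder accuracies, one per seed; not real results.
per_seed_accuracy = [0.90, 0.92, 0.91, 0.89, 0.93]
print(format_result("accuracy", per_seed_accuracy))
# accuracy: 0.91 ± 0.02 (N=5 seeds)
```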

6. Compute Budget

State the hardware, estimated runtime, and number of seeds. This enables reproducibility and contextualizes cost.
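A back-of-the-envelope estimate is often enough to state the budget up front. The sketch below multiplies configurations by seeds by runtime; the function name and all numbers are placeholders chosen for illustration.

```python
def gpu_hours(n_configs, n_seeds, hours_per_run, gpus_per_run=1):
    """Total GPU-hours for a sweep: configs x seeds x hours x GPUs per run."""
    return n_configs * n_seeds * hours_per_run * gpus_per_run


# Placeholder numbers: 1 proposed method + 3 baselines + 4 ablations, 5 seeds each.
total = gpu_hours(n_configs=8, n_seeds=5, hours_per_run=1.5, gpus_per_run=2)
print(f"{total} GPU-hours")  # 120.0 GPU-hours
```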

7. Ablations

Design ablations that isolate each component's contribution. Each ablation should remove or replace exactly one thing.
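The "exactly one thing" rule can be enforced mechanically by generating ablation configs from the full configuration. This is a small sketch assuming components are represented as boolean flags; the flag names are hypothetical.

```python
def leave_one_out_ablations(full_config):
    """Yield (name, config) pairs that each disable exactly one enabled component."""
    for component, enabled in full_config.items():
        if not enabled:
            continue  # already off in the full model; nothing to ablate
        ablated = dict(full_config)  # copy so the full config is never mutated
        ablated[component] = False
        yield f"no_{component}", ablated


# Hypothetical component flags for a proposed method.
full = {"augmentation": True, "aux_loss": True, "warmup": True}
for name, cfg in leave_one_out_ablations(full):
    print(name, cfg)
```

Ablations that replace a component rather than remove it would need a richer config than boolean flags, but the same one-change-per-run check applies.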

8. Failure Modes

Identify at least two ways the experiment could give misleading results, and how to detect or mitigate them.
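One common way for results to mislead is a gain that is no larger than seed-to-seed noise. The check below is a rough heuristic for detecting that case, not a formal significance test; the function name, the threshold `k`, and the scores are assumptions made for illustration.

```python
from statistics import mean, stdev


def gain_within_noise(baseline_scores, method_scores, k=2.0):
    """True if the mean gain is smaller than k times the larger per-seed std."""
    gain = mean(method_scores) - mean(baseline_scores)
    noise = max(stdev(baseline_scores), stdev(method_scores))
    return abs(gain) < k * noise


# Placeholder per-seed scores; not real results.
baseline = [0.90, 0.91, 0.92]
small_gain = [0.91, 0.92, 0.93]
large_gain = [0.95, 0.96, 0.97]

print(gain_within_noise(baseline, small_gain))  # True: gain is comparable to noise
print(gain_within_noise(baseline, large_gain))  # False: gain clearly exceeds noise
```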

Output Format

Produce a structured experiment plan as a markdown document with all sections above filled in. Highlight any section where the user needs to make a decision before proceeding.
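As one possible shape for that output, the sketch below renders the eight checklist sections as a markdown document and flags any section left empty. The `render_plan` function and the "DECISION NEEDED" marker are hypothetical conventions, not something the skill prescribes.

```python
SECTIONS = [
    "Hypothesis", "Independent and Dependent Variables", "Baselines",
    "Datasets and Splits", "Metrics", "Compute Budget", "Ablations", "Failure Modes",
]


def render_plan(answers):
    """Render a markdown plan; sections without an answer are flagged for a decision."""
    lines = ["# Experiment Plan", ""]
    for i, title in enumerate(SECTIONS, start=1):
        body = answers.get(title, "").strip()
        lines.append(f"## {i}. {title}")
        lines.append(body if body else "**DECISION NEEDED:** not yet specified.")
        lines.append("")
    return "\n".join(lines)


# Only one section is filled in here, so the other seven are flagged.
plan = render_plan({"Hypothesis": "We claim that [method X] achieves [metric Y] ..."})
print(plan)
```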

Use this skill when designing ML/AI experiments, evaluation protocols, or research benchmarks. Guides hypothesis specification, baseline selection, metric choice, and experimental controls to ensure results are valid and reproducible.

Stars: 4
Category: testing
Updated: May 12, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add aviskaar/open-org experiment-design

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean (95%)
Semgrep: Static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): Open source security scanning. Skipped (50%)
Socket.dev: Supply chain security analysis. Skipped (50%)
VirusTotal: Multi-engine malware detection. Skipped (50%)
CrowdStrike: Advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: May 12, 2026, 3:45 AM (149.8s, 1 file scanned)

SKILL.md

name: experiment-design
description: Use this skill when designing ML/AI experiments, evaluation protocols, or research benchmarks. Guides hypothesis specification, baseline selection, metric choice, and experimental controls to ensure results are valid and reproducible.
license: Apache-2.0
author: aviskaar
version: 1.0
tags: experiments, evaluation, ML, reproducibility, methodology

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

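# Clone the repo
git clone https://github.com/aviskaar/open-org.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy the skill into the global Claude Code skills folder
cp -r open-org/skills/experiment-design ~/.claude/skills/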

Repository: aviskaar/open-org (4 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT