src/autoskillit/skills_extended/exp-lens-sensitivity-robustness/SKILL.md
Create Sensitivity & Robustness experimental design analysis identifying load-bearing analytic choices and untested perturbations. Robustness lens answering "Which assumptions are load-bearing?"
npx skillsauth add talont-org/autoskillit exp-lens-sensitivity-robustness
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Philosophical Mode: Robustness
Primary Question: "Which assumptions are load-bearing?"
Focus: Ablation Structure, Preprocessing Sensitivity, Metric Sensitivity, Hyperparameter Sensitivity, Distribution Shift
/autoskillit:exp-lens-sensitivity-robustness [context_path] [experiment_plan_path]
/autoskillit:exp-lens-sensitivity-robustness or /autoskillit:make-experiment-diag sensitivity
NEVER:
- Create files outside {{AUTOSKILLIT_TEMP}}/exp-lens-sensitivity-robustness/
- Use run_in_background: true (it is prohibited)
ALWAYS:
- Build a full sensitivity matrix (choices x perturbation types)
- Classify every analytic choice as load-bearing, minor, or untested
- Flag cases where the most impactful choices are the least tested
- Distinguish between ablations that were run and choices that were simply fixed
- BEFORE creating any diagram, LOAD the /autoskillit:mermaid skill using the Skill tool - this is MANDATORY
- If the Skill tool cannot be used (disable-model-invocation) or refuses this invocation, do NOT proceed with diagram creation. Abort this step and omit the diagram from output.
- Write output to {{AUTOSKILLIT_TEMP}}/exp-lens-sensitivity-robustness/exp_diag_sensitivity_robustness_{YYYY-MM-DD_HHMMSS}.md
After writing the file, emit the structured output token as literal plain text with no markdown formatting on the token name (the adjudicator performs a regex match):
diagram_path = /absolute/path/to/{{AUTOSKILLIT_TEMP}}/exp-lens-sensitivity-robustness/exp_diag_sensitivity_robustness_{...}.md
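As a rough illustration of the file naming and token format described above, the following Python sketch builds the timestamped output path and formats the token line. The names `build_output_path`, `format_token`, and `TOKEN_RE` are hypothetical, `temp_root` stands in for whatever `{{AUTOSKILLIT_TEMP}}` resolves to, and the regex is only a guess at what a matcher could look like, since the adjudicator's actual pattern is not given here. The absolute-path check assumes a POSIX-style path.

```python
import re
from datetime import datetime
from pathlib import Path

SKILL_DIR = "exp-lens-sensitivity-robustness"


def build_output_path(temp_root: Path, now: datetime) -> Path:
    """Return an absolute report path under the skill's temp directory."""
    stamp = now.strftime("%Y-%m-%d_%H%M%S")  # matches {YYYY-MM-DD_HHMMSS}
    name = f"exp_diag_sensitivity_robustness_{stamp}.md"
    return (temp_root / SKILL_DIR / name).resolve()


def format_token(path: Path) -> str:
    """Format the token as plain text: no backticks, bold, or quotes."""
    return f"diagram_path = {path}"


# Hypothetical pattern (an assumption): a bare token name at the start of a
# line, followed by an absolute path ending in .md.
TOKEN_RE = re.compile(r"^diagram_path = (/\S+\.md)$", re.MULTILINE)
```

Wrapping the token name in markdown (for example, bold markers) would stop a pattern like this from matching, which is the reason the token is emitted as literal plain text.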
If positional arg 1 (context_path) is provided and the file exists, read it to obtain IV/DV tables, H0/H1 hypotheses, controlled variables, and success criteria. If positional arg 2 (experiment_plan_path) is provided and exists, read the experiment plan for full methodology. Use this structured context as the foundation for Steps 1-5; skip the CWD exploration for these fields if the context file supplies them.
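The argument handling above can be sketched in Python as follows. This is a simplified illustration, not the skill's implementation: `read_if_present` and `load_inputs` are hypothetical names, and the sketch falls back to working-directory exploration only when no context file is available, whereas the instruction above decides this per field.

```python
from pathlib import Path
from typing import Optional, Sequence


def read_if_present(arg: Optional[str]) -> Optional[str]:
    """Return the file's text when arg names an existing file, else None."""
    if not arg:
        return None
    path = Path(arg)
    return path.read_text(encoding="utf-8") if path.is_file() else None


def load_inputs(args: Sequence[str]) -> dict:
    """Read context_path (arg 1) and experiment_plan_path (arg 2) if supplied."""
    context = read_if_present(args[0] if len(args) > 0 else None)
    plan = read_if_present(args[1] if len(args) > 1 else None)
    return {
        "context": context,
        "experiment_plan": plan,
        # Simplification (assumption): explore the CWD only when there is
        # no readable context file at all.
        "explore_cwd": context is None,
    }
```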
Spawn Explore subagents to investigate:
- Analytic Choices Made
- Ablation Coverage
- Preprocessing Variations
- Hyperparameter Sensitivity
- Distribution/Environment Variations
Rows = analytic choices. Columns = perturbation types (remove, change, stress).
For each cell: Does the conclusion survive the perturbation?
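One possible in-memory representation of this matrix is sketched below in Python. The column names and cell values come from the report template in this document. The `Cell` enum, `build_matrix`, and `to_markdown` are hypothetical helpers, and treating any pair without evidence as Untested is an assumption of the sketch.

```python
from enum import Enum


class Cell(Enum):
    STABLE = "Stable"
    FRAGILE = "Fragile"
    UNTESTED = "Untested"


# Column order follows the Sensitivity Matrix template.
PERTURBATIONS = ("Remove", "Change Value", "Stress Test")


def build_matrix(choices, evidence):
    """evidence maps (choice, perturbation) -> Cell; missing pairs are Untested."""
    return {
        choice: {p: evidence.get((choice, p), Cell.UNTESTED) for p in PERTURBATIONS}
        for choice in choices
    }


def to_markdown(matrix):
    """Render the matrix as a markdown table, one row per analytic choice."""
    header = "| Analytic Choice | " + " | ".join(PERTURBATIONS) + " |"
    divider = "|" + "---|" * (len(PERTURBATIONS) + 1)
    rows = [
        "| " + choice + " | " + " | ".join(cells[p].value for p in PERTURBATIONS) + " |"
        for choice, cells in matrix.items()
    ]
    return "\n".join([header, divider, *rows])
```

Defaulting to Untested keeps the distinction between ablations that were actually run and choices that were simply fixed visible in every row.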
CRITICAL — Analyze Assumption Load: For every analytic choice:
Classify each choice as load-bearing, minor, or untested.
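A mechanical version of this classification might look like the Python sketch below. The decision rule is an assumption made for illustration, since this document does not define thresholds: any Fragile cell marks the choice Load-bearing, a row with no tests at all is Untested, and everything else is Minor. The impact labels (High / Medium / Low) follow the Load-Bearing Assumptions template, and both function names are hypothetical.

```python
def classify(cells):
    """cells: the "Stable" / "Fragile" / "Untested" statuses for one choice."""
    if any(c == "Fragile" for c in cells):
        return "Load-bearing"  # at least one perturbation changed the conclusion
    if all(c == "Untested" for c in cells):
        return "Untested"  # the choice was fixed and never perturbed
    return "Minor"  # every perturbation that was run left the conclusion intact


def untested_high_impact(rows):
    """rows: (choice, expected_impact, cells) tuples.

    Return the choices expected to matter most that have no test at all.
    """
    return [
        choice
        for choice, impact, cells in rows
        if impact == "High" and classify(cells) == "Untested"
    ]
```

The second helper corresponds to flagging cases where the most impactful choices are the least tested.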
If a diagram adds value, create a simplified flowchart. This is OPTIONAL for this hybrid lens — the tables are the primary output.
Direction: TB (choices flow down through perturbation to conclusion stability)
Subgraphs: "ANALYTIC CHOICES", "PERTURBATIONS TESTED", "CONCLUSION STABILITY"
Node Styling:
- `stateNode` class: analytic choice nodes
- `handler` class: perturbation type nodes
- `output` class: stable conclusion nodes
- `gap` class: load-bearing untested choice nodes
- `detector` class: sensitivity threshold nodes

Write the analysis to: {{AUTOSKILLIT_TEMP}}/exp-lens-sensitivity-robustness/exp_diag_sensitivity_robustness_{YYYY-MM-DD_HHMMSS}.md (relative to the current working directory)
# Sensitivity & Robustness Analysis: {Experiment Name}
**Lens:** Sensitivity & Robustness (Robustness)
**Question:** Which assumptions are load-bearing?
**Date:** {YYYY-MM-DD}
**Scope:** {What was analyzed}
## Sensitivity Matrix
| Analytic Choice | Remove | Change Value | Stress Test | Overall Classification |
|----------------|--------|-------------|-------------|------------------------|
| {choice name} | Stable / Fragile / Untested | Stable / Fragile / Untested | Stable / Fragile / Untested | Load-bearing / Minor / Untested |
## Load-Bearing Assumptions
| Assumption | Evidence Type | Impact if Changed | Tested? |
|------------|--------------|-------------------|---------|
| {assumption} | Ablation / Sweep / Literature / None | High / Medium / Low | Yes / No |
## Ablation Coverage Assessment
| Factor | Ablated? | Result | Interpretation |
|--------|----------|--------|----------------|
| {factor name} | Yes / No | {delta metric if yes} | Conclusion holds / Fragile / Unknown |
## Robustness Profile
| Dimension | Status | Notes |
|-----------|--------|-------|
| Preprocessing choices | Robust / Fragile / Untested | {detail} |
| Hyperparameter choices | Robust / Fragile / Untested | {detail} |
| Metric choices | Robust / Fragile / Untested | {detail} |
| Distribution shift | Robust / Fragile / Untested | {detail} |
## Perturbation Diagram (Optional)
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
%% CLASS DEFINITIONS %%
classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
subgraph Choices ["ANALYTIC CHOICES"]
CHOICE1["Preprocessing Choice<br/>━━━━━━━━━━<br/>{e.g., normalization}"]
CHOICE2["Model Choice<br/>━━━━━━━━━━<br/>{e.g., architecture}"]
LOAD_BEARING["Load-Bearing Choice<br/>━━━━━━━━━━<br/>{untested critical choice}"]
end
subgraph Perturbations ["PERTURBATIONS TESTED"]
PERTURB1["Remove<br/>━━━━━━━━━━<br/>Ablation"]
PERTURB2["Change Value<br/>━━━━━━━━━━<br/>Sensitivity sweep"]
THRESHOLD["Sensitivity Threshold<br/>━━━━━━━━━━<br/>{delta that changes conclusion}"]
end
subgraph Stability ["CONCLUSION STABILITY"]
STABLE["Stable Conclusion<br/>━━━━━━━━━━<br/>{holds under perturbation}"]
end
CHOICE1 --> PERTURB1
CHOICE2 --> PERTURB2
PERTURB1 --> THRESHOLD
PERTURB2 --> THRESHOLD
THRESHOLD --> STABLE
LOAD_BEARING -.->|"untested risk"| THRESHOLD
class CHOICE1,CHOICE2 stateNode;
class LOAD_BEARING gap;
class PERTURB1,PERTURB2 handler;
class THRESHOLD detector;
class STABLE output;
```
Color Legend:

| Color | Category | Description |
|-------|----------|-------------|
| Teal | Analytic Choices | Decision points in the pipeline |
| Yellow | Load-Bearing Untested | Critical choices with no perturbation evidence |
| Orange | Perturbations | Types of tests applied |
| Red | Sensitivity Thresholds | Points where conclusion may change |
| Dark Teal | Stable Conclusions | Results robust to perturbation |
---
## Pre-Diagram Checklist
Before creating the diagram, verify:
- [ ] LOADED `/autoskillit:mermaid` skill using the Skill tool
- [ ] Using ONLY classDef styles from the mermaid skill (no invented colors)
- [ ] Diagram will include a color legend table
---
## Related Skills
- `/autoskillit:make-experiment-diag` - Parent skill for lens selection
- `/autoskillit:mermaid` - MUST BE LOADED before creating diagram
- `/autoskillit:exp-lens-estimand-clarity` - For clarifying which conclusions are being stress-tested
- `/autoskillit:exp-lens-iterative-learning` - For tracking robustness improvements across experiment iterations