
Create Narrative Story Arc visualization planning spec showing visual consistency across the report (same color = same model everywhere), logical figure progression, redundant figure detection, and narrative dependency between figures. Narrative lens answering "Do the figures tell a coherent story across the report?"
Investigate test gaps and architectural weaknesses in depth following an investigation, then devise a plan for architectural immunity rather than direct fixes. Use when user says "rectify", "rectify this", or wants to address root architectural causes after an investigation.
Create a single GitHub issue and immediately triage it — dedup check, classification (recipe:implementation or recipe:remediation), mixed-concern detection, and label application. Use when user says "open an issue", "create an issue", "file an issue", or "file a bug". The user-facing counterpart to report_bug.
Promote integration to main with comprehensive changelog and PR creation. Use when user says "promote to main", "open promotion PR", "integration to main", or "create release PR to main". Runs pre-flight checks, change inventory, architecture diagrams, and creates a rich PR with release notes and traceability.
Fetch PR review comments from review-research-pr, run research-aware intent validation (ACCEPT/REJECT/DISCUSS), apply targeted fixes, escalate unrerunnable findings, and post inline replies. Exit 0 drives recipe re_push_research; exit non-zero halts the cycle.
Execute a designed experiment in a worktree and collect structured results. Supports --adjust retry mode.
Analyze open GitHub issues and produce a sequenced implementation plan — grouping issues into parallel batches, ordering those batches, and tagging each issue with its recipe route. Use when user says "triage issues", "prioritize issues", or "plan issue order".
Explore a target project and generate tailored recipes and config through an interactive workflow. Use when user wants to onboard a new project to AutoSkillit, says "setup project", or wants a starting point config.
Fetch claim findings from audit-claims, run citation-aware intent validation (ACCEPT/REJECT/DISCUSS), apply targeted citation fixes, escalate findings requiring experiment reruns, and post inline replies.
Goal-aware resolution of rebase conflicts when merging a conflict-resolution worktree back into the integration branch. Analyzes the intent of each side of a conflict, resolves it in-place when confidence is HIGH or MEDIUM, and escalates when LOW.
Research modern solutions and approaches for issues or features proposed in a report or plan. Use when user says "review approach", "review approaches", "research solutions", or wants external validation of a proposed direction.
Execute an arbitrary prose task as a headless Claude session. For smoke-test pipeline use only.
Verify an architecture diagram against the actual codebase. Checks component existence, connection accuracy, and read/write directionality. Use when user says "verify diagram", "verify diag", "check diagram", or wants to validate diagram accuracy.
Create Uncertainty Representation visualization planning spec showing error bar definitions, distribution-aware alternatives, and multi-seed variance protocols. Statistical lens answering "How is uncertainty honestly represented?"
Create Annotative Caption visualization planning spec showing declarative titles, axis labels with units, error definition in legend, baseline references, sample sizes, and venue-specific caption format. Annotative lens answering "Are figure captions and axis labels fully self-contained?"
Validate that plan phases and WPs align with the stated task
Triage STOP verdict findings from review-design, classifying each as ADDRESSABLE/STRUCTURAL/DISCUSS using parallel subagents. If any are ADDRESSABLE or DISCUSS, generate revision_guidance and emit resolution=revised. If all are STRUCTURAL, emit resolution=failed for terminal stop.
Worktree retry executor. ALWAYS invoke this skill when instructed to continue or retry an implementation in an existing worktree. Do not resume editing files directly — use this skill first to load the retry workflow.
Create Causal Assumptions experimental design diagram showing confounders, mediators, colliders, and identification strategy. Causal-structural lens answering "What causal assumptions support this design?"
Composition executor for pull requests. ALWAYS invoke this skill when instructed to compose a PR. Do not read prep files or create PRs directly — use this skill first to load the composition workflow.
Backfill structured requirements on existing GitHub issues triaged with recipe:implementation labels. Scans candidates, skips already-enriched issues, performs codebase-grounded analysis, and appends a Requirements section in REQ-{GRP}-NNN format via gh issue edit.
Elaborate a migration plan phase into a complete self-contained implementation plan. Use when user says "elaborate phase", "elaborate phase N", or "phase elaboration". Assesses codebase, writes detailed phase plan, then validates with dry walkthrough.
Reads a PR prep file and validated experiment diagrams, composes the PR body, and creates the GitHub PR. Part 3 of 3 in the decomposed research-PR flow.
Scan Claude Code project logs for friction patterns — repeated failures, approach loops, tool errors, misunderstanding cycles, and stuck workflows. Categorizes and counts friction events to surface what causes the most resistance. Use when user says "audit friction", "find friction", "friction audit", or "what keeps going wrong".
Synthesize experiment results into a structured research report in the research/ folder. Supports --inconclusive flag.
Decompose a task, plan, roadmap, or feature description into a structured set of requirements grouped for independent planning. Use when user says "make req", "make requirements", "decompose requirements", "extract requirements", or wants to break down a task into what needs to be true.
Break a large plan, architecture proposal, or feature document into sequenced implementation groups for the make-plan pipeline. Use when user says "make groups", "group requirements", "sequence groups", or wants to decompose a large document into ordered implementation units.
Worktree implementation executor. ALWAYS invoke this skill when instructed to implement a plan in a worktree with testing and merging. Do not read the plan or edit files directly — use this skill first to load the full implementation workflow.
Generate architecture diagram for a specific component or system. Prompts user to select which area to document, then creates comprehensive mermaid diagrams.
Analyze a GitHub issue for mixed concerns and split it into focused sub-issues with proper cross-references. Integrates into triage-issues as a pre-classification step.
Orchestrates 2–4 vis-lens skills in parallel to produce a figure inventory (visualization-plan.md) and a report-placement outline (report-plan.md). Runs after design review GO, before worktree creation.
Merge a single PR into the integration branch. For simple PRs, uses gh pr merge --squash --auto to enforce GitHub's required status checks. For needs_check PRs, re-assesses complexity and returns needs_plan=true with a conflict report when conflicts are detected. Use inside the merge-prs loop.
Create Benchmark Representativeness experimental design diagram showing coverage matrix, generalization gaps, and untested regions. Generalizability lens answering "Does this generalize beyond the test bed?"
Reload the current AutoSkillit session — signals the parent process to re-launch with the full wrapper environment and resume the conversation.
Investigate a bug pattern audit report and design architectural guards (tests, contracts, structural changes) that provide immunity to each identified pattern. Use when user says "design guards", "design defenses", or wants architectural solutions for bug patterns.
Create Pipeline Integrity experimental design diagram showing data splits, leakage points, preprocessing order, and label contamination. Integrity lens answering "Could data handling create optimistic bias?"
Create a validity threat matrix identifying alternative explanations and design mitigations. Adversarial lens answering "What alternative explanations survive?"
Create Sensitivity & Robustness experimental design analysis identifying load-bearing analytic choices and untested perturbations. Robustness lens answering "Which assumptions are load-bearing?"
Create Randomization & Blocking experimental design diagram showing assignment mechanisms, blocking factors, and comparability sources. Design-Structural lens answering "Where does comparability come from?"
Analyze severity of experimental tests — adversarial cases, negative controls, falsification tests, easy-pass detection, and confirmatory theater. Falsificationist lens answering "Would this design have caught the error?"
Audit codebase for adherence to architectural standards, practices, and rules. Use when user says "audit arch", "audit architecture", "check architecture", or "architectural review". Spawns parallel subagents to examine multiple architectural aspects and generates a structured report.
Audit documentation for drift, staleness, and inconsistency against the actual codebase. Use when user says "audit docs", "check documentation", "docs audit", or "documentation review". Spawns parallel subagents to explore codebase subsystems, then cross-references all documentation sources against findings.
Create Security architecture diagram showing trust boundaries, validation layers, and process isolation. Security lens answering "Where are the trust boundaries?"
Create Data Lineage architecture diagram showing information flow, transformations, and storage destinations. Data-centric lens answering "Where is the data?"
Audit the codebase against defense standards derived from historical bug patterns. Standards accumulate over time as new patterns are discovered via audit-bugs and design-guards. Use when user says "audit defenses", "audit defense standards", "check defenses", or "defense audit".
Identify clusters of related triaged GitHub issues sharing the same recipe route and collapse them into a single combined issue with full content from all originals. Closes originals with cross-reference comments. Inverse of issue-splitter.
Diagnostic executor for CI failures. ALWAYS invoke this skill when instructed to diagnose CI failures. Do not fetch CI logs directly — use this skill first to load the diagnosis workflow.
Create and edit mermaid diagrams in markdown files. Use when creating flowcharts, sequence diagrams, or any mermaid syntax in .md files.
Elaborate all work packages for a target phase via parallel L0 subagents. One invocation per phase; spawns one L0 per WP concurrently. (Pass 3 loop body)
Refine elaborated assignments for a single phase via parallel L0 subagents (L1+L0 pattern), using per-phase context file with peer_summaries for cross-phase visibility
Refine elaborated phases with cross-phase visibility via parallel L0 subagents (L1+L0 pattern)
Preparation executor for pull-request metadata. ALWAYS invoke this skill when instructed to prepare PR metadata. Do not read plans or classify files directly — use this skill first to load the preparation workflow.
Analyze issue dependencies and produce a dispatch execution map for parallel orchestration
Assess whether analytic decisions were pre-specified or post-hoc and whether exploratory/confirmatory norms are aligned. Boundary lens answering "Is this discovery or test, and are norms aligned?"
Elaborate a single phase into a full result, parallel-safe — receives plan snapshot + target phase ID
Generate high-level phases from project analysis (Pass 1 entry point)
Post-Pass-3 global dependency DAG construction — detects implicit dependencies individual WP sessions could not see
Extract domain knowledge and naming conventions for planning context
Create Operational architecture diagram showing CLI workflows, configuration, and observability. Administration lens answering "How is it run and monitored?"
Analyze all open PRs targeting a base branch — determine merge order, identify file overlaps, and tag each PR as simple or needs_check for complexity. Use at the start of a PR consolidation workflow.
Create Error/Resilience architecture diagram showing failure handling, recovery mechanisms, and circuit breakers. Diagnostic lens answering "How are failures handled?"
Create Repository/Data Access architecture diagram showing the repository pattern, entity relationships, and data access patterns. Data-centric lens answering "How is data accessed?"
Parallel subagent-driven claim extraction and citation integrity audit for research PRs. Extracts claims by section, matches against available evidence, classifies unsupported claims as findings, and emits a verdict for recipe routing.
Audit codebase for internal cohesion: how well components fit together and maintain consistent patterns. Distinct from audit-arch (which checks rule violations); this checks integration fitness and convergence. Use when user says "audit cohesion", "check cohesion", "cohesion audit", or "alignment check".
Create Process/Execution Flow architecture diagram showing runtime behavior, state transitions, and decision points. Physiological lens answering "How does it behave?"
Implementation executor. ALWAYS invoke this skill when instructed to implement a plan in a worktree. Do not read the plan or edit files directly — use this skill first to load the full implementation workflow.
Audit feature flag isolation — traces import chains, runtime gates, tool/skill tag coverage, UI surfaces, and test markers to detect leakage and miswiring.
Create State Lifecycle architecture diagram showing field contracts, validation gates, and resume safety. Contract overlay lens answering "How is state corruption prevented?"
Audit a completed implementation against its originating plan(s). Returns GO (merge approved) or NO GO (generates remediation file for retry). Final gate before merge in any implementation pipeline.
Create Estimand Clarity experimental design analysis decomposing the implicit estimand from code vs. explicit claims from prose. Evidential lens answering "What exactly is the claim?"
Plan validation executor. ALWAYS invoke this skill when instructed to validate or dry-walkthrough a plan. Do not read the plan or trace changes directly — use this skill first to load the validation workflow.
Analyze statistical error budget showing Type I/II errors, power, minimum detectable effect, multiplicity corrections, and sequential monitoring. Statistical lens answering "Are error risks sized and controlled?"
Create Comparator Construction experimental design analysis assessing whether baselines and controls are fair and relevant. Counterfactual lens answering "Is the comparator fair and relevant?"
Create a comparison fairness matrix assessing whether alternatives are evaluated under symmetric constraints. Fairness lens answering "Are alternatives compared under symmetric constraints?"
Create a risk register and stakeholder impact assessment for experiments with deployment implications. Governance lens answering "What risks arise from acting on this result?"
Create a variance analysis profile assessing whether signals exceed noise and whether results are stable across random seeds. Stability lens answering "Is the signal larger than the noise?"
Interactive selection of experimental design lens for visualizing experiment methodology. Routes to the appropriate exp-lens-* skill.
Deep investigation of errors, bugs, or codebase questions without making any code changes. Use when user mentions investigate, understand, explore, analyze, or pastes error tracebacks. Spawns parallel subagents for comprehensive exploration.
Interactively author a campaign recipe YAML through a 6-phase guided workflow. Use when user says "make campaign", "create campaign", "author campaign", "new campaign recipe", or wants to decompose a campaign goal into dispatches.
Planning executor. ALWAYS invoke this skill when instructed to create, devise, or write an implementation plan. Do not explore the codebase or draft a plan directly — use this skill first to load the planning workflow.
Apply versioned migration notes to an AutoSkillit recipe. Use when user confirms migration, when called by an agent or the autoskillit migrate CLI, or when invoked directly.
Analyze WP complexity per phase and emit consolidation group manifests for trivial WP merging (L1+L0 pattern)
Convert a scope report into a structured experiment plan with hypothesis, variables, phases, and success criteria.
Analyze project structure for planning decomposition
Assess each work package for review-approach benefit before implementation. Writes review_approach_assessment.json; does NOT invoke review-approach.
Elaborate all assignments for a target phase via parallel L0 subagents. One invocation per phase; spawns one L0 per assignment concurrently. (Pass 2 loop body)
Targeted fix of validate_plan findings — re-elaboration, duplicate resolution, dependency corrections
Reads a research report and experiment plan, synthesizes a recommendation, selects 1-2 exp-lens lenses, writes a context file per lens, and writes a PR prep file. Does NOT open a PR. Part 1 of 3 in the decomposed research-PR flow.
Execute recipe sessions batch-by-batch for triaged GitHub issues. Reads the triage-issues output manifest, processes each batch sequentially, and launches the appropriate recipe for each issue. Use when user says "process issues", "run issues", or "execute pipeline for issues".
Create Temporal Dynamics visualization planning spec showing axis scaling (linear vs log), smoothing disclosure, epoch/step alignment, run aggregation (mean + variance bands), early-stopping markers, and wall-clock vs step-count x-axis. Temporal lens answering "Are training dynamics shown clearly and honestly?"
Fetch PR review comments, run intent validation (ACCEPT/REJECT/DISCUSS) before applying fixes, and post inline replies. MCP-only — used exclusively by recipe orchestration via run_skill after review_pr reports changes_requested or needs_human verdict.
Validate an experiment plan before execution using triage-first, fail-fast dimensional analysis with an adversarial red-team. Emits verdict (GO/REVISE/STOP), experiment_type, evaluation_dashboard, and revision_guidance.
Automated diff-scoped PR code review using parallel audit subagents. Posts inline GitHub review comments and submits a summary verdict. Use after a PR is opened to gate CI on review approval.
Automated diff-scoped research PR review using parallel audit subagents aligned to research quality dimensions. Posts inline GitHub review comments and submits a summary verdict. Use after a research PR is opened to gate on review approval.
Survey codebase and web sources to build a known/unknown matrix for a research question. Phase 1 of the research recipe.
Pre-flight environment gate for the research recipe. Reads the experiment plan, detects the required environment type, builds a Docker image or creates a host micromamba environment, and emits an env_mode verdict consumed by downstream steps.
Pre-flight resource gate for the research recipe. Reads the experiment plan's data_manifest, checks disk space and network connectivity for external/gitignored entries, creates data directory structure, and emits a PASS/WARN/FAIL feasibility verdict.
--- name: troubleshoot-experiment categories: [research] hooks: PreToolUse: - matcher: "*" hooks: - type: command command: "echo '[SKILL: troubleshoot-experiment] Diagnosing experiment failure...'" once: true --- # troubleshoot-experiment Skill Read session logs and process traces for a failed research pipeline step, classify why it failed, write a structured diagnosis artifact, and emit `is_fixable` signal for orchestrator routing. Called by the `resea
Validate review-decisions audit findings with mandatory intent analysis and seven evidence-gathering rules. Adds docstring-as-contract recognition, deliberate-change detection, test-as-intent-signal, consumer-impact verification, architectural feasibility checks, behavioral simulation, and symmetry-as-design recognition to the standard validation workflow. Use when validating reports from audit-review-decisions specifically.
Create Anti-Pattern Detection visualization audit showing severity-tiered catalog of visualization anti-patterns present in or planned for the experiment. Diagnostic lens answering "Which visualization anti-patterns are present?"
Create Always-On visualization triage report running three sequential analysis passes (anti-pattern, accessibility, annotation completeness) and emitting a combined PASS|WARN_N|FAIL_N verdict. Composite lens answering "What are the blocking visualization issues?"
Create Chart Type Selection visualization planning spec showing encoding channel assignments, Cleveland-McGill perceptual hierarchy, and data-type→chart-type matrix. Typological lens answering "Which chart type is perceptually optimal for this data?"
Create Methodology Norms visualization planning spec showing ML sub-area mandatory figures, community conventions, and coverage gaps. Methodology-Normative lens answering "Which domain-specific figures are expected by reviewers?"
Failure resolution executor. ALWAYS invoke this skill when instructed to fix test failures in a worktree. Do not read test output or edit code directly — use this skill first to load the failure resolution workflow.
Open the AutoSkillit kitchen — reveals all kitchen MCP tools for this session. Human-only entry point.
Create C4 Container architecture diagram showing static structure, building blocks, and technology choices. Anatomical lens answering "How is it built?"
Close the AutoSkillit kitchen — hides kitchen MCP tools for this session.
Create Concurrency architecture diagram showing parallel execution patterns, thread pools, synchronization, and barriers. Physiological lens answering "How does parallelism work?"
Create Deployment/Physical architecture diagram showing infrastructure topology, process boundaries, and network communication. Physical lens answering "Where does it run?"
Create Development architecture diagram showing project structure, build tools, and quality gates. Development lens answering "How is it built and tested?"
Create Module Dependency architecture diagram showing package coupling, layering, and fan-in/fan-out. Structural lens answering "How are modules coupled?"
<!-- Internal bootstrap document — not a user-invocable skill. Injected by open_kitchen() into every orchestrator session. --> # Sous Chef: Global Orchestration Rules These rules apply to ALL orchestration sessions, whether following a recipe or operating ad-hoc. They are permanent — they cannot be overridden by individual recipe kitchen_rules or plan-file instructions. --- ## MULTI-PART PLAN SEQUENCING — MANDATORY When `plan_parts` contains more than one file (Part A, Part B, …): 1.
Create Scenarios architecture diagram showing end-to-end user journeys and component cooperation validation. Validation lens answering "Do the components work together?"
Analyze historical bug patterns by mining Claude Code project logs for /autoskillit:investigate skill invocations since a specified date. Identifies recurring root causes, architectural gaps, and proactive detection strategies. Use when user says "audit bugs", "bug patterns", "analyze investigations", or "bug audit".
Audit merged PR review threads for agreed-but-deferred suggestions (design decisions, future work, out-of-scope items) that were never implemented. Mines REVIEW-FLAG markers from resolve-review and legacy keyword signals. Produces a structured markdown report with VALID/RESOLVED/STALE classifications and annotates processed threads with [AUDIT] markers to prevent re-identification on future runs.
Audit the test suite for useless tests, consolidation opportunities, over-mocking, weak assertions, placement/organization issues, xdist safety violations, test path filter integrity, and other test quality issues. Use when user says "audit tests", "audit test suite", "review tests", or "test quality check". Generates an improvement plan in {{AUTOSKILLIT_TEMP}}/ with explanations for each proposed change.
--- name: bundle-local-report categories: [rendering] --- # bundle-local-report Convert a research markdown report into a self-contained `report.html` with inlined mermaid diagrams and inserted plot images from `yaml:figure-spec` blocks. ## Critical Constraints **NEVER:** - Raise a fatal error on missing diagrams or missing visualization-plan — log and continue. - Use the ESM mermaid build — ESM triggers CORS under `file://`; always use the UMD bundle (`mermaid.min.js`). - Exit without emitt
Create Unit Interference experimental design diagram showing unit hierarchy, cluster structure, shared resources, and SUTVA violation pathways. Causal-Structural lens answering "What is the unit, and can treatments spill over?"
Create Iterative Learning experimental design diagram showing factor space exploration, adaptive allocation, and next-experiment recommendations. Decision-Theoretic lens answering "How does this maximize learning per cost?"
Analyze measurement validity for experimental design — auditing metric-construct alignment, proxy validity, reliability, sensitivity, and consequential validity. Argumentative lens answering "Do measurements justify the interpretation?"
Create Reproducibility Artifacts experimental design diagram showing run instructions, environment capture, data availability, determinism controls, and audit trail. Transparency lens answering "Could an independent party reproduce this?"
Deploy experiment artifacts in an isolated git worktree following an approved experiment plan, with per-phase commits.
Create an integration PR for the merge-prs loop. Reads pr_order_file JSON, generates a rich PR body with per-PR details, arch-lens diagrams, and carried-forward Closes #N references. Closes all collapsed PRs with a comment after creation. Use inside the merge-prs loop after all PRs have been merged into the integration branch.
Create a GitHub issue and PR summarizing pipeline bugs and fixes. Use when a pipeline run completes with accumulated bug fixes on a feature branch.
Investigate a bug from provided error context, write a structured markdown report to the given path, and output a deduplication fingerprint. Lightweight — no parallel subagents.
Refine elaborated work packages with cross-phase visibility via per-phase L0 subagents (L1+L0 pattern)
Generate YAML recipes for .autoskillit/recipes/. Use when user says "make script skill", "generate script", "script a workflow", "write a script", "create a script", "new recipe", "write a pipeline", or when loaded by other skills for script formatting.
Validate test audit findings with test-domain semantic rules and intent analysis. Adds import-path-as-contract recognition, precondition-as-assertion detection, provenance verification, split-era lifecycle awareness, and deletion-vs-improvement distinction to the standard validation workflow. Use when validating reports from audit-tests specifically.
Create Decisional Layout visualization planning spec showing figure-vs-table selection heuristics: tables win for exact values, ≤5 items, leaderboards, and ablation matrices; figures win for trends, distributions, and spatial patterns; borderline cases recommend both. Decisional lens answering "Should this result be a figure or a table?"
Validate audit findings from audit-arch, audit-tests, audit-cohesion, audit-feature-gates, audit-docs, or audit-review-decisions against actual code, git history, and design intent using 9–10 parallel subagents. Removes contested findings, documents exceptions, adjusts severities. Use when user says "validate audit", "validate findings", "validate report", or "check audit results".
Create Chromatic Accessibility visualization planning spec showing colorblind safety (Okabe-Ito, Paul Tol palettes), perceptual uniformity checks (viridis/cividis pass; jet/rainbow fail), non-color redundant encoding (shape + line-style), and text size minimums. Chromatic lens answering "Is the color encoding accessible and perceptually uniform?"
Create Compositional Layout visualization planning spec showing small-multiples vs overlay decisions, faceting strategy (row/col), shared-axis alignment, grouped vs stacked bars, factorial interaction plots, and panel reading order. Compositional lens answering "Which layout best reveals the comparison structure?"
Create Replicative Reproducibility visualization planning spec showing data availability, preprocessing parameter disclosure (bin widths, smoothing windows), plotting library/version, random seeds, and code reference per figure. Replicative lens answering "Can the figures be reproduced from the data and code?"
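The resolution rule in the review-design STOP triage entry above (any ADDRESSABLE or DISCUSS finding yields resolution=revised; all STRUCTURAL yields resolution=failed) can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the `Finding` enum and the `resolve` function are hypothetical names, and treating an empty list of findings as "failed" is an assumption the description does not address.

```python
from enum import Enum


class Finding(Enum):
    """Classification labels named in the triage description."""
    ADDRESSABLE = "ADDRESSABLE"
    STRUCTURAL = "STRUCTURAL"
    DISCUSS = "DISCUSS"


def resolve(classifications):
    """Map per-finding classifications to a resolution string.

    Hypothetical helper: returns "revised" if any finding is
    ADDRESSABLE or DISCUSS, otherwise "failed" (all STRUCTURAL).
    An empty input falls through to "failed" (assumption).
    """
    revisable = (Finding.ADDRESSABLE, Finding.DISCUSS)
    if any(c in revisable for c in classifications):
        return "revised"
    return "failed"
```

For example, `resolve([Finding.STRUCTURAL, Finding.DISCUSS])` takes the revision path, while a list containing only `Finding.STRUCTURAL` entries takes the terminal-stop path.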