skills/skill-distiller/SKILL.md
Converts Opus-quality skills into deterministic Haiku-executable workflows via trace-driven distillation and cross-model validation. Triggers on: "distill this skill", "make this skill work on Haiku", "cross-model optimization", "optimize skill for cost". NOT for code simplification, use code-refiner.
`npx skillsauth add mathews-tom/armory skill-distiller`

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Transform skills authored for high-capability models (Opus) into deterministic workflows that execute reliably on lower-cost models (Sonnet, Haiku). The core insight from EvoSkills is that skills encode reusable task structure, not model-specific artifacts. A skill evolved on Opus transfers to other models with gains of 35 to 45 percentage points, but only when the instructions are deterministic enough that lower-capability models can follow them without improvising.
| File | Contents | Load When |
| -------------------------------------- | -------------------------------------------------- | ------------------------------------ |
| references/distillation-patterns.md | Pattern catalog for converting reasoning to rules | Always |
- package-evaluator at >= 70%
- surrogate-verifier skill for cross-model assertion checking

Score each section of the source SKILL.md for reasoning difficulty:
| Complexity Signal                      | Score | Distillation Action                        |
| -------------------------------------- | ----- | ------------------------------------------ |
| Decision tree with 3+ branches         | HIGH  | Convert to explicit if/then lookup table   |
| "Use judgment" or "consider context"   | HIGH  | Replace with concrete heuristic rules      |
| Multi-step inference chain             | HIGH  | Break into numbered atomic steps           |
| Reference to domain expertise          | MED   | Add explicit reference file with knowledge |
| Clear enumerated steps                 | LOW   | Keep as-is                                 |
| Concrete examples with expected output | LOW   | Keep as-is                                 |
Produce a complexity map: section name -> complexity score -> planned action.
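
The scoring pass above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the regexes, the `SIGNALS` list, and the `complexity_map` function are hypothetical names, and only two of the table's signals are detected by simple phrase matching. Everything else falls through to LOW.

```python
import re

# Hypothetical signal patterns. The phrases come from the table above;
# the regexes themselves are illustrative assumptions.
SIGNALS = [
    (re.compile(r"use judgment|consider (the )?context", re.I), "HIGH",
     "Replace with concrete heuristic rules"),
    (re.compile(r"domain expertise|expert knowledge", re.I), "MED",
     "Add explicit reference file with knowledge"),
]
DEFAULT = ("LOW", "Keep as-is")
RANK = {"LOW": 0, "MED": 1, "HIGH": 2}


def complexity_map(sections: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map section name -> (complexity score, planned action)."""
    result = {}
    for name, body in sections.items():
        best = DEFAULT
        for pattern, score, action in SIGNALS:
            # Keep the highest-ranked signal found in the section.
            if pattern.search(body) and RANK[score] > RANK[best[0]]:
                best = (score, action)
        result[name] = best
    return result


if __name__ == "__main__":
    demo = {
        "Formatting": "Use judgment to pick a layout.",
        "Steps": "1. Read the file. 2. Print the summary.",
    }
    for name, (score, action) in complexity_map(demo).items():
        print(f"{name} -> {score} -> {action}")
```

Structural signals such as decision trees with 3+ branches or multi-step inference chains would need more than phrase matching, so a real pass would likely combine a heuristic like this with manual review.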
Execute the source skill with Opus on 5 representative tasks:
- evals/cases.yaml (positive cases) or generate new ones

From the collected traces, extract deterministic patterns:
Rewrite the SKILL.md applying all distillation actions from Phase 1:
| Source Pattern                        | Distilled Replacement                                        |
| ------------------------------------- | ------------------------------------------------------------ |
| "Analyze the code and determine..."   | "Check for these 5 specific patterns: [list]"                |
| "Use appropriate formatting"          | "Output as a markdown table with columns: [A, B, C]"         |
| "Consider the context to decide..."   | "If [condition A]: do X. If [condition B]: do Y. Default: Z" |
| "Apply best practices for..."         | Reference file with explicit best practices enumerated       |
| Multi-paragraph reasoning instruction | Numbered step list with single-sentence steps                |
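
The "If [condition A]: do X. If [condition B]: do Y. Default: Z" replacement can be pictured as an ordered rule table with an explicit default. The sketch below is only an analogy in Python: the conditions, the actions, and the `decide` helper are hypothetical and not taken from any particular skill.

```python
from typing import Callable

# Each rule is (condition, action). Rules are checked top to bottom and the
# first match wins. The conditions and actions here are hypothetical.
Rule = tuple[Callable[[str], bool], str]

RULES: list[Rule] = [
    (lambda path: path.endswith(".py"), "run the Python checklist"),
    (lambda path: path.endswith(".md"), "run the Markdown checklist"),
]
DEFAULT_ACTION = "skip the file and note it in the report"


def decide(path: str) -> str:
    """Return the action of the first matching rule, or the default."""
    for condition, action in RULES:
        if condition(path):
            return action
    return DEFAULT_ACTION
```

The point of the shape is that every input reaches exactly one action, so the target model never has to infer what to do when no condition matches.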
Rules for the rewrite:
Run the distilled skill on the target model (Haiku or Sonnet):
- surrogate-verifier to generate assertions for each task output

| Metric              | Source (Opus + original) | Target (Haiku + distilled) | Delta |
| ------------------- | ------------------------ | -------------------------- | ----- |
| Assertions passed   | N/M                      | N/M                        | ±     |
| Weighted score      | X.XX                     | X.XX                       | ±     |
| Output completeness | %                        | %                          | ±     |
| Format compliance   | %                        | %                          | ±     |
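
One way to fill the Delta column is to subtract the source run's metrics from the target run's, row by row. The sketch below assumes a hypothetical `RunMetrics` record and `deltas` helper. Neither is part of surrogate-verifier, and the numbers in the demo are placeholders.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Hypothetical container for one row set of the comparison table."""
    assertions_passed: int
    assertions_total: int
    weighted_score: float
    completeness_pct: float
    format_compliance_pct: float


def deltas(source: RunMetrics, target: RunMetrics) -> dict[str, float]:
    """Target minus source for each metric. Negative means a regression."""
    if source.assertions_total != target.assertions_total:
        raise ValueError("runs must be scored against the same assertions")
    return {
        "assertions_passed": target.assertions_passed - source.assertions_passed,
        "weighted_score": round(target.weighted_score - source.weighted_score, 2),
        "completeness_pct": target.completeness_pct - source.completeness_pct,
        "format_compliance_pct": (
            target.format_compliance_pct - source.format_compliance_pct
        ),
    }


if __name__ == "__main__":
    opus = RunMetrics(7, 7, 1.00, 100.0, 100.0)   # placeholder values
    haiku = RunMetrics(5, 7, 0.85, 90.0, 85.0)    # placeholder values
    for metric, delta in deltas(opus, haiku).items():
        print(f"{metric}: {delta:+}")
```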
Produce the final comparison:
# Skill Distillation Report: <skill-name>
## Complexity Reduction
- Sections distilled: N/M (HIGH → LOW)
- Instruction word count: original X → distilled Y (Z% reduction)
- Decision points replaced with lookup tables: N
## Cross-Model Performance
| Model | Assertions Passed | Weighted Score | Format Compliance |
|---------|-------------------|----------------|-------------------|
| Opus | 7/7 | 1.00 | 100% |
| Sonnet | 6/7 | 0.92 | 100% |
| Haiku | 5/7 | 0.85 | 85% |
## Changes Made
1. [Section] "Analyze complexity" → explicit 5-item checklist
2. [Section] "Apply formatting" → fixed markdown table template
...
## Recommendation
[SHIP | ITERATE | MANUAL_REVIEW_NEEDED]
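
The "Instruction word count" line in the report can be computed directly from the two SKILL.md texts. A minimal sketch, assuming whitespace-delimited words are an acceptable count. The `word_count_reduction` name is hypothetical.

```python
def word_count_reduction(original: str, distilled: str) -> tuple[int, int, float]:
    """Return (original words X, distilled words Y, percent reduction Z)."""
    x = len(original.split())
    y = len(distilled.split())
    if x == 0:
        raise ValueError("original text is empty")
    # A negative Z means the distilled skill grew longer than the source.
    z = round(100 * (x - y) / x, 1)
    return x, y, z
```

A negative reduction is worth surfacing on its own, since a distilled skill that is longer than its source is one of the conditions flagged for review.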
| Error                              | Resolution                                                    |
| ---------------------------------- | ------------------------------------------------------------- |
| Source skill scores below 70%      | Refuse distillation; recommend evolution via test-engineer    |
| No execution traces available      | Generate synthetic tasks and collect traces before proceeding |
| Target model fails all assertions  | Skill may be too complex for target model; report with detail |
| Distilled skill longer than source | Review distillation; patterns may need consolidation          |
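
The error table reads naturally as a set of independent checks. The sketch below shows one possible encoding. The `resolutions` function and its parameters are hypothetical, and the 70% threshold is the only number taken from the table.

```python
def resolutions(
    source_score_pct: float,
    trace_count: int,
    target_passed: int,
    source_words: int,
    distilled_words: int,
) -> list[str]:
    """Return the resolutions from the error table that apply, in table order."""
    found = []
    if source_score_pct < 70:
        found.append("Refuse distillation; recommend evolution via test-engineer")
    if trace_count == 0:
        found.append("Generate synthetic tasks and collect traces before proceeding")
    if target_passed == 0:
        found.append("Skill may be too complex for target model; report with detail")
    if distilled_words > source_words:
        found.append("Review distillation; patterns may need consolidation")
    return found
```

An empty list means none of the listed error conditions were detected.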