skills/knowledge-distillation-survey/SKILL.md
Comprehensive survey of knowledge distillation methods, architectures, and applications in neural network compression
npx skillsauth add curiositech/windags-skills knowledge-distillation-surveyInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Knowledge distillation transfers learning from teacher systems to students across capability gaps. The key insight: effective transfer requires matching knowledge type, teacher capacity, and transfer mechanism to the student's representational constraints.
1. ASSESS CAPACITY GAP
├── Small gap (teacher 1-2x student params) → Direct offline distillation
├── Large gap (teacher 10x+ student params) → Progressive with teacher assistants
└── Similar capacity/peer learning → Online collaborative distillation
2. IDENTIFY KNOWLEDGE TYPE NEEDED
├── Task needs final decisions → Response-based (soft labels, logits)
├── Task needs pattern recognition → Feature-based (intermediate representations)
├── Task needs structural reasoning → Relation-based (similarity matrices, attention maps)
└── Task needs multiple types → Multi-level distillation
3. SELECT TRANSFER MECHANISM
├── Proven expert + stable domain → Offline (sequential training)
├── Exploration needed + multiple learners → Online (mutual learning)
├── No external teacher available → Self-distillation (temporal/spatial)
└── Cross-modal/domain required → Alignment-based transfer
4. HANDLE REPRESENTATION GAPS
├── Same modality → Direct feature matching
├── Different modalities → Paired sample alignment
├── Different architectures → Attention transfer or feature adaptation
└── Different tasks → Structural knowledge extraction
| Teacher-Student Performance Ratio | Approach | Rationale | |-----------------------------------|----------|-----------| | 1.1-1.5x | Direct distillation | Student can decode teacher knowledge | | 1.5-3x | Add 1 teacher assistant | Bridge representational gap | | 3x+ | Multi-step progressive | Prevent knowledge loss in translation | | Similar performance | Collaborative learning | Mutual improvement through diversity |
Scenario: Compress GPT-3.5 (175B params) to run on mobile (500M params max)
Decision Process:
Execution:
Expert catches: Temperature needs adjustment per stage; mobile model needs different attention patterns optimized for inference speed
Scenario: Robot trained on RGB cameras must work with depth sensors only
Decision Process:
Execution:
Expert catches: Depth lacks color/texture info; student needs different attention for material properties
Task completion checklist for knowledge distillation:
Do NOT use knowledge distillation for:
transfer-learning-frameworks for task adaptationmodel-optimization-techniques for architecture efficiencycurriculum-learning-design for progressive skill buildingrobust-ai-validation for safety assurance firstexplainable-ai-methods - distillation often reduces interpretabilityDelegate to other skills:
neural-architecture-searchtransfer-learning-frameworksfew-shot-learning-strategiesrobust-ai-validationmodel-optimization-techniquestools
Building resilient distributed systems with circuit breakers, retries with full-jitter exponential backoff, retry budgets (per-request 3-attempt + per-client 10% ratio per Google SRE), deadline propagation, and the cascading-failure math (4 layers × 3 retries = 64x amplification). Grounded in Resilience4j, Microsoft Cloud Patterns, AWS Architecture Blog (Marc Brooker), and Google SRE Book.
testing
Designing HTTP cache headers that work correctly across browsers, CDNs, and shared proxies — `Cache-Control` directives per RFC 9111, `stale-while-revalidate` and `stale-if-error` per RFC 5861, the Vary header for varying responses, and surrogate keys for tag-based purging. Grounded in IETF RFCs and Cloudflare/Fastly docs.
development
Use when designing or fixing a Content Security Policy on a real site, choosing between nonce-based and hash-based CSP, adding strict-dynamic, debugging "Refused to execute inline script" errors, deploying CSP in report-only mode first, configuring report-to / report-uri, or auditing an existing policy for unsafe-inline / unsafe-eval / wildcards. Triggers: "CSP blocks legitimate inline script", strict-dynamic, nonce-{RANDOM}, sha256-{HASH}, object-src none, base-uri none, frame-ancestors, Trusted Types, X-Content-Security-Policy obsolete, report-only vs enforced. NOT for general HTTP security headers (HSTS, COOP/COEP), Trusted Types deep dive, CORS configuration, or building a WAF.
tools
Choosing and operating an HTTP API versioning strategy that doesn't break clients — Stripe's date-based pinned versions, the Deprecation/Sunset header pair (RFC 9745 + RFC 8594), URI vs header vs media-type approaches, and the version-transformer pattern. Grounded in Stripe's published architecture and IETF RFCs.