skills/skillxiv-v0.0.2-claude-opus-4.6/deer-diffusion-speculative-decoding/SKILL.md
Enable efficient speculative decoding by training discrete diffusion language models for parallel draft generation. Use AR-style distillation and scribe refinement to train dLLMs. Eliminate left-to-right error accumulation through independent parallel proposals. Achieve 5.54× speedup on HumanEval vs. 2.41× for AR-based methods.
npx skillsauth add ADu2021/skillXiv deer-diffusion-speculative-decoding
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
DEER (Draft with diffusion, vErify with autoRegressive) performs speculative decoding with a discrete diffusion language model (dLLM) as the drafter, so that a whole block of draft tokens is generated in parallel. The dLLM is trained through a two-stage alignment: AR-style distillation, which enables prefix-conditioned continuation, and scribe refinement, which sharpens predictions near the verification boundary. Unlike AR drafters, which accumulate uncertainty left to right, DEER's parallel generation makes "the proposal at position i independent of previously drafted tokens," enabling acceptance lengths of up to 32 tokens vs. ~10 for competing methods. Reported results show a 5.54× speedup on HumanEval.
A two-stage alignment pipeline trains the diffusion drafter, and parallel generation then provides the decoding advantage:
1. AR-Style Distillation (Stage I): Train a discrete diffusion language model (dLLM) for prefix-conditioned continuation. The model learns from truncated teacher answers marked with a SEP token, so that it can generate a coherent suffix given a fixed prefix. This bridges the AR teacher and diffusion student paradigms.
2. Scribe Refinement (Stage II): Improve accuracy near the verification boundary through "weighted suffix masking with exponentially decaying loss," which concentrates training on the tokens most critical for acceptance, i.e. those most likely to affect the speculative decoding outcome.
3. Parallel Generation Advantage: Unlike AR drafters, which accumulate uncertainty left to right, the dLLM generates an entire token block in a single denoising step. "The proposal at position i is independent of previously drafted tokens," which prevents error propagation and enables acceptance lengths of up to 32 tokens versus ~10 for competing methods.
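The exponentially decaying loss in Stage II can be illustrated with a small sketch. The exact weighting used by DEER is not given here, so everything below is an assumption: the weight at suffix position i (counted from the verification boundary) is taken to be `decay ** i`, normalized to sum to 1, and the function names `decaying_suffix_weights` and `weighted_suffix_loss` are hypothetical.

```python
def decaying_suffix_weights(suffix_len, decay=0.8):
    """Hypothetical weighting: w_i proportional to decay**i, normalized to sum to 1.

    Position 0 is the first drafted token after the prefix, i.e. the token
    closest to the verification boundary, so it receives the largest weight.
    """
    raw = [decay ** i for i in range(suffix_len)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_suffix_loss(token_nll, masked, decay=0.8):
    """Weighted average of per-token losses over the masked suffix positions.

    token_nll: per-position negative log-likelihoods of the suffix tokens.
    masked:    booleans, True where the position was masked during training.
    """
    weights = decaying_suffix_weights(len(token_nll), decay)
    num = sum(w * nll for w, nll, m in zip(weights, token_nll, masked) if m)
    den = sum(w for w, m in zip(weights, masked) if m)
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    # The same error costs more when it sits next to the boundary.
    print(weighted_suffix_loss([2.0, 0.0, 0.0], [True, True, True]))
    print(weighted_suffix_loss([0.0, 0.0, 2.0], [True, True, True]))
```

Under this assumed weighting, an error on the first drafted token is penalized more than the same error a few positions later, which matches the stated goal of focusing training on the tokens that decide how much of a draft is accepted.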
1. Start with a pretrained teacher AR model.
2. Train a discrete diffusion LM for parallel generation conditioned on prefixes.
3. Implement AR-style distillation with SEP token marking.
4. Add scribe refinement focusing on boundary tokens.
5. Integrate the drafter into a verifier framework for speculative decoding.
6. Measure acceptance rates and overall speedup.
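The integration and measurement steps can be sketched as a draft-then-verify loop. This is a minimal illustration, not DEER's implementation: it assumes greedy exact-match verification (the acceptance rule is not specified here), and `draft_block` and `target_next` are hypothetical callables standing in for the dLLM drafter and the AR verifier. A real verifier would score the whole block in one forward pass; the sequential loop is only for readability.

```python
from typing import Callable, List, Tuple

Token = int


def verify_block(prefix: List[Token], draft: List[Token],
                 target_next: Callable[[List[Token]], Token]) -> List[Token]:
    """Accept the longest draft prefix that matches the verifier's greedy choice.

    Always returns at least one token: the verifier's correction at the first
    mismatch, or a bonus token if the whole draft is accepted.
    """
    context = list(prefix)
    accepted: List[Token] = []
    for tok in draft:
        expected = target_next(context)
        if tok != expected:
            accepted.append(expected)  # replace the rejected token
            return accepted
        accepted.append(tok)
        context.append(tok)
    accepted.append(target_next(context))  # bonus token after a full accept
    return accepted


def speculative_decode(prefix: List[Token],
                       draft_block: Callable[[List[Token], int], List[Token]],
                       target_next: Callable[[List[Token]], Token],
                       block_size: int,
                       max_new_tokens: int) -> Tuple[List[Token], List[int]]:
    """Generate tokens and record how many drafted tokens each round accepted."""
    out = list(prefix)
    accepted_lengths: List[int] = []
    while len(out) - len(prefix) < max_new_tokens:
        draft = draft_block(out, block_size)
        new_tokens = verify_block(out, draft, target_next)
        accepted_lengths.append(len(new_tokens) - 1)  # exclude verifier token
        out.extend(new_tokens)
    return out[:len(prefix) + max_new_tokens], accepted_lengths
```

With greedy verification the output is identical to what the verifier would produce on its own, so the mean of `accepted_lengths` is the quantity to track when measuring how well the drafter is aligned.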