skills/skillxiv-v0.0.2-claude-opus-4.6/diffusionvl-ar-to-diffusion/SKILL.md
Convert pre-trained autoregressive vision-language models into diffusion VLMs without architectural modifications. Uses a block diffusion strategy that enables arbitrary-length generation and KV-cache reuse. Hybrid attention enforces bidirectional attention within blocks and causal attention between blocks. Requires less than 5% of the training data used by prior diffusion VLM methods.
npx skillsauth add ADu2021/skillXiv diffusionvl-ar-to-diffusion
DiffusionVL directly converts autoregressive models that are already vision-language aligned into diffusion VLMs through full-parameter diffusion finetuning, turning the "next-token prediction paradigm into a diffusion paradigm." A block diffusion strategy enables arbitrary-length generation, with parallel denoising inside each block and autoregressive decoding from one block to the next. A hybrid attention mechanism enforces bidirectional attention within blocks and causal attention between blocks. The conversion requires less than 5% of the data used by prior diffusion VLM methods, which the authors present as evidence that "the gap between dVLMs and AR-VLMs is minimal."
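The decoding scheme described above (autoregressive across blocks, parallel denoising within a block) can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the DiffusionVL implementation: the `MASK` id, the `Denoiser` interface, the confidence-based unmasking rule, and `toy_denoiser` are all hypothetical stand-ins for a real model.

```python
from typing import Callable, List, Sequence, Tuple

MASK = -1  # hypothetical id for the [MASK] token

# Hypothetical denoiser interface: given the committed prefix and the current
# (partially masked) block, return a (token, confidence) guess per position.
Denoiser = Callable[[Sequence[int], Sequence[int]], List[Tuple[int, float]]]


def block_diffusion_decode(
    denoise: Denoiser,
    prompt: Sequence[int],
    block_size: int = 4,
    steps_per_block: int = 2,
    eos_id: int = 0,
    max_blocks: int = 8,
) -> List[int]:
    """Decode block by block; tokens inside a block are unmasked in parallel."""
    committed = list(prompt)  # finished tokens: never revised, so KV-cacheable
    per_step = -(-block_size // steps_per_block)  # ceil(block_size / steps)
    for _ in range(max_blocks):  # inter-block: autoregressive
        block = [MASK] * block_size
        while MASK in block:  # intra-block: parallel denoising steps
            guesses = denoise(committed, block)
            masked = [i for i, tok in enumerate(block) if tok == MASK]
            # Commit the most confident still-masked positions this step.
            masked.sort(key=lambda i: guesses[i][1], reverse=True)
            for i in masked[:per_step]:
                block[i] = guesses[i][0]
        if eos_id in block:  # stop at the first EOS: arbitrary output length
            committed.extend(block[: block.index(eos_id) + 1])
            break
        committed.extend(block)
    return committed[len(prompt):]


def toy_denoiser(prefix: Sequence[int], block: Sequence[int]) -> List[Tuple[int, float]]:
    """Hypothetical stand-in model: emits position + 10 before position 6, then EOS (0)."""
    out = []
    for i in range(len(block)):
        pos = len(prefix) + i
        out.append((0 if pos >= 6 else pos + 10, 1.0 / (i + 1)))
    return out


if __name__ == "__main__":
    print(block_diffusion_decode(toy_denoiser, [1, 2]))
```

With `block_size=4` and `steps_per_block=2`, each block needs two denoiser calls instead of four sequential token steps. Generation ends at the first EOS rather than at a fixed length. The committed prefix is never rewritten, which is what makes caching its keys and values possible.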
The method offers two conversion pathways and relies on two mechanisms that make the converted model practical:
1. AR-VLM to dVLM (Paradigm Shift): Directly convert an autoregressive model that is already vision-language aligned through full-parameter diffusion finetuning, turning the "next-token prediction paradigm into a diffusion paradigm" with minimal data.
2. AR-LM to dVLM (Modality + Paradigm Shift): A two-stage approach. A connector first aligns the vision and text spaces using autoregressive training; diffusion finetuning then completes the conversion. This makes it possible to build a dVLM from a separate vision encoder and a text-only language model.
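3. Block Diffusion Strategy: Enables arbitrary-length generation and KV-cache reuse through parallel denoising within each block and autoregressive decoding across blocks. Because each finished block is attended to causally and never revised, its keys and values can be cached and reused while later blocks are generated.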
4. Hybrid Attention Mechanism: Enforces bidirectional attention within blocks (full in-block context for parallel denoising) and causal attention between blocks (respecting generation order). This balances parallel-decoding efficiency with left-to-right coherence.
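The hybrid attention pattern in item 4 amounts to a block-causal mask: a query may attend to any key whose block index is not greater than its own. The sketch below builds that boolean mask in plain Python. It is illustrative only. The function name is hypothetical, and it covers the single-sequence layout described above, not any training-time sequence packing the paper may use.

```python
from typing import List


def hybrid_attention_mask(seq_len: int, block_size: int) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    Positions in the same block see each other (bidirectional); a position
    also sees every earlier block (causal between blocks), never a later one.
    """
    return [
        [k // block_size <= q // block_size for k in range(seq_len)]
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    for row in hybrid_attention_mask(6, 2):
        print("".join("x" if allowed else "." for allowed in row))
```

Two limiting cases help check the design. With `block_size=1` the mask reduces to an ordinary causal (AR) mask. With `block_size=seq_len` it becomes fully bidirectional. This is one way to see why an AR model can be finetuned into this regime without architectural changes: only the mask changes.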
Start with a pre-trained AR-VLM. Implement the block diffusion strategy with the hybrid attention pattern. Fine-tune the entire model with a diffusion objective on your target data; the reported conversion uses less than 5% of the data required by prior diffusion VLM methods. Monitor whether accuracy is preserved and measure the speedup from parallel generation. Validate on your target VLM tasks.
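The source does not spell out the diffusion objective. The sketch below assumes the absorbing-state (masked) discrete diffusion loss common to block diffusion language models. Each response block draws its own noise level `t`, each token in it is masked with probability `t`, and cross-entropy on masked positions is weighted by `1/t`. The `MASK` id, the function names, the choice to leave prompt and image tokens (`num_prefix`) clean, and the normalization by sequence length are all assumptions. Per-token NLLs would come from the model and are passed in directly here.

```python
import random
from typing import List, Sequence, Tuple

MASK = -1  # hypothetical id for the [MASK] token


def noise_blocks(
    tokens: Sequence[int],
    block_size: int,
    num_prefix: int,
    rng: random.Random,
) -> Tuple[List[int], List[bool], List[float]]:
    """Corrupt each response block at its own noise level t in (0, 1].

    The first num_prefix tokens (assumed prompt/image context) stay clean.
    Returns the noised sequence, which positions enter the loss, and 1/t weights.
    """
    noised = list(tokens)
    in_loss = [False] * len(tokens)
    weight = [0.0] * len(tokens)
    for start in range(num_prefix, len(tokens), block_size):
        t = 1.0 - rng.random()  # rng.random() is in [0, 1), so t is in (0, 1]
        for i in range(start, min(start + block_size, len(tokens))):
            if rng.random() < t:
                noised[i] = MASK
                in_loss[i] = True
                weight[i] = 1.0 / t
    return noised, in_loss, weight


def diffusion_loss(
    nll: Sequence[float], in_loss: Sequence[bool], weight: Sequence[float]
) -> float:
    """Sum the 1/t-weighted NLL over masked positions, divided by sequence length."""
    total = sum(w * x for x, m, w in zip(nll, in_loss, weight) if m)
    return total / max(1, len(nll))


if __name__ == "__main__":
    noised, in_loss, weight = noise_blocks(list(range(10, 22)), 4, 4, random.Random(0))
    print(noised)
```

Sampling a separate `t` per block matches the block-wise decoding above: at inference each block is denoised from fully masked toward clean while earlier blocks are already clean. The `1/t` weight keeps lightly masked blocks, which contribute few loss terms, from being under-counted.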