Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

ADu2021/discreteness-diffusion-llm

Name: discreteness-diffusion-llm
Author: ADu2021

skills/skillxiv-v0.0.2-claude-opus-4.6/discreteness-diffusion-llm/SKILL.md

npx skillsauth add ADu2021/skillXiv discreteness-diffusion-llm

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

When to Use This Skill

Building text-based diffusion models with iterative refinement
Parallel decoding approaches requiring simultaneous multi-token generation
Language generation tasks where intermediate denoising should respect text structure
Research into why continuous diffusion works for images but needs modification for text

When NOT to Use This Skill

Autoregressive sequence-to-sequence tasks (use traditional LLMs instead)
Tasks requiring strict left-to-right dependency (diffusion is inherently non-causal)
Real-time language generation (diffusion is iterative, slow)
Applications where token-level guarantees are critical (legal, medical)

The Fundamental Problem

Diffusion models excel at images because:

Spatial structure is preserved during noise corruption
Neighboring pixels have strong local dependencies
Denoising naturally respects geometric constraints

But text is fundamentally different:

Discrete tokens: Each position is a categorical choice, not continuous value
Long-range dependencies: Word relationships span arbitrary distances
Position information: Tokens have identity (not just position), creating special structure

Standard diffusion directly applied to text creates two critical failures:

Failure 1: Position-Agnostic Corruption

Standard diffusion applies uniform corruption—each position receives the same noise level regardless of linguistic importance.

Original:  "The cat sat on the mat"
Noise level: same for all positions

Problem: Function words (the, on) are noisy same as content (cat, sat)
Better:   Function words are more robust to noise → less important to denoise

Failure 2: Token Dependency Collapse

During parallel decoding, all tokens are generated simultaneously. But tokens aren't independent:

Tokens at position i and i+1 are generated in parallel:
- Position i generates word "read" (ambiguous: past tense? infinitive?)
- Position i+1 needs this information to correctly generate following word
- But they're denoised in parallel, losing this dependency

Five Essential Properties for Text Diffusion

The paper identifies requirements for effective text diffusion systems:

Position-Aware Corruption: Noise should respect linguistic position importance
- Corrupt function words more heavily (they're redundant)
- Corrupt content words lightly (they're essential)
Dependency-Aware Training: Training loss should penalize breaking token relationships
- Include multi-token context in training
- Explicitly learn to maintain semantic/syntactic consistency across denoised tokens
Parallel Consistency: Parallel-denoised tokens must remain mutually consistent
- Require that simultaneously-generated tokens form valid sequences
- Use joint scoring (not independent token scores)
Linguistic Structure Respecting: Denoising should preserve linguistic boundaries
- Respect phrase boundaries, clause structure
- Don't denoise across strong grammatical divisions
Robust Token Boundary Handling: Edge cases at sequence start/end/padding
- Special treatment for START/END tokens
- Masking/padding must not corrupt the denoising process

Comparison of Approaches

The paper categorizes existing methods:

Continuous Embedding-Space Diffusion

Diffuse over token embeddings (not discrete tokens)
Enables smooth denoising but loses discrete structure
Risk: Denoised embeddings may not map to valid tokens

Discrete Token-Based Diffusion

Directly corrupt and denoise token sequences
Preserves discrete structure but harder to train
Risk: Training instability, token dependency violation

Implementation Framework

# Text diffusion system respecting the five properties
class TextDiffusionLM:
    def __init__(self, vocab_size, max_length):
        self.vocab_size = vocab_size
        self.max_length = max_length
        self.position_importance = self.compute_position_importance()  # Property 1

    def compute_position_importance(self):
        """Learn which positions are important to denoise (Property 1)"""
        # Function words less important, content words more important
        importance = torch.ones(self.max_length)
        # Reduce importance of common positions (articles, prepositions)
        importance[position_type=='function_word'] *= 0.3
        return importance

    def corrupt_with_schedule(self, tokens, timestep):
        """Position-aware corruption (Property 1)"""
        noise_level = self.noise_schedule(timestep)
        # Adjust noise by position importance
        effective_noise = noise_level * self.position_importance
        # Apply per-position masking probability
        mask = torch.bernoulli(effective_noise)
        corrupted = tokens.clone()
        corrupted[mask.bool()] = MASK_TOKEN
        return corrupted

    def denoise_with_context(self, corrupted_tokens, context_window=2):
        """Dependency-aware denoising (Property 2)"""
        # For each masked position, gather surrounding context
        denoised = []
        for i in range(len(corrupted_tokens)):
            if corrupted_tokens[i] == MASK_TOKEN:
                # Include multi-token context (Property 2)
                context = corrupted_tokens[max(0, i-context_window):
                                         min(len(corrupted_tokens), i+context_window)]
                pred_token = self.denoise_fn(context, position=i)
                denoised.append(pred_token)
            else:
                denoised.append(corrupted_tokens[i])
        return torch.tensor(denoised)

    def parallel_decode_consistent(self, corrupted_tokens, batch_size=8):
        """Parallel consistency with joint scoring (Property 3)"""
        candidates_per_position = []
        for i in range(len(corrupted_tokens)):
            if corrupted_tokens[i] == MASK_TOKEN:
                # Generate candidate tokens (Property 3: joint scoring)
                candidates = self.get_candidates(corrupted_tokens, i, top_k=batch_size)
                candidates_per_position.append(candidates)

        # Select consistent combination (not independent)
        best_sequence = self.select_joint_best(candidates_per_position)
        return best_sequence

Current Limitations

Today's diffusion LLMs only partially satisfy these five properties:

| Property | Current Status | Gap | |----------|---|---| | Position-Aware Corruption | Rarely implemented | Most use uniform noise | | Dependency-Aware Training | Partially addressed | Single-token training common | | Parallel Consistency | Experimental | Joint decoding uncommon | | Structure Respecting | Not standard | Would add training complexity | | Token Boundary Robustness | Not well-studied | Underexplored area |

When Diffusion for Text Makes Sense

Despite limitations, text diffusion has advantages in:

Iterative refinement workflows where multiple passes improve quality
Parallel generation for latency-critical scenarios (N generation steps >> N parallel tokens)
Non-causal reasoning where all context matters equally (not left-to-right dependency)
Infilling tasks (masked prediction) where diffusion's bidirectional nature excels

Design Guidance

When building a text diffusion system, explicitly address:

How do you handle position-aware corruption? (don't use uniform noise)
How do you encode multi-token dependencies in training? (don't treat tokens independently)
How do you maintain consistency in parallel decoding? (joint scoring, not per-token)
How do you respect linguistic structure? (phrase boundaries, clauses)
How do you handle edge cases? (padding, START/END tokens)

References

Original paper: https://arxiv.org/abs/2512.22630
Related: Diffusion-LM, Editorial, Latent Diffusion (continuous variants)
Implementation challenges: Discrete diffusion frameworks (Absorbing Diffusion, D3PM)

ADu2021/discreteness-diffusion-llm

skills/skillxiv-v0.0.2-claude-opus-4.6/discreteness-diffusion-llm/SKILL.md

Understand fundamental limitations of applying diffusion to discrete text: position-agnostic corruption ignores linguistic structure, and token-wise training misses multi-token dependencies. Design text diffusion systems satisfying five essential properties: position-aware corruption, dependency-aware training, parallel consistency, linguistic structure respecting, and robust handling of token boundaries.

2 stars

testing

Updated Apr 17, 2026

$ install --global

skillsauth

npx skillsauth add ADu2021/skillXiv discreteness-diffusion-llm

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 17, 2026, 5:33 AM9.2s1 file scanned

SKILL.md

name:: discreteness-diffusion-llm
title:: On the Role of Discreteness in Diffusion LLMs
version:: 0.0.2
engine:: skillxiv-v0.0.2-claude-opus-4.6
license:: MIT
url:: https://arxiv.org/abs/2512.22630
keywords:: [diffusion models, language generation, discrete text, parallel decoding, token dependencies, LLM architecture]
description:: Understand fundamental limitations of applying diffusion to discrete text: position-agnostic corruption ignores linguistic structure, and token-wise training misses multi-token dependencies. Design text diffusion systems satisfying five essential properties: position-aware corruption, dependency-aware training, parallel consistency, linguistic structure respecting, and robust handling of token boundaries.

When to Use This Skill

Building text-based diffusion models with iterative refinement
Parallel decoding approaches requiring simultaneous multi-token generation
Language generation tasks where intermediate denoising should respect text structure
Research into why continuous diffusion works for images but needs modification for text

When NOT to Use This Skill

Autoregressive sequence-to-sequence tasks (use traditional LLMs instead)
Tasks requiring strict left-to-right dependency (diffusion is inherently non-causal)
Real-time language generation (diffusion is iterative, slow)
Applications where token-level guarantees are critical (legal, medical)

The Fundamental Problem

Diffusion models excel at images because:

Spatial structure is preserved during noise corruption
Neighboring pixels have strong local dependencies
Denoising naturally respects geometric constraints

But text is fundamentally different:

Discrete tokens: Each position is a categorical choice, not continuous value
Long-range dependencies: Word relationships span arbitrary distances
Position information: Tokens have identity (not just position), creating special structure

Standard diffusion directly applied to text creates two critical failures:

Failure 1: Position-Agnostic Corruption

Standard diffusion applies uniform corruption—each position receives the same noise level regardless of linguistic importance.

Original:  "The cat sat on the mat"
Noise level: same for all positions

Problem: Function words (the, on) are noisy same as content (cat, sat)
Better:   Function words are more robust to noise → less important to denoise

Failure 2: Token Dependency Collapse

During parallel decoding, all tokens are generated simultaneously. But tokens aren't independent:

Tokens at position i and i+1 are generated in parallel:
- Position i generates word "read" (ambiguous: past tense? infinitive?)
- Position i+1 needs this information to correctly generate following word
- But they're denoised in parallel, losing this dependency

Five Essential Properties for Text Diffusion

The paper identifies requirements for effective text diffusion systems:

Position-Aware Corruption: Noise should respect linguistic position importance
- Corrupt function words more heavily (they're redundant)
- Corrupt content words lightly (they're essential)
Dependency-Aware Training: Training loss should penalize breaking token relationships
- Include multi-token context in training
- Explicitly learn to maintain semantic/syntactic consistency across denoised tokens
Parallel Consistency: Parallel-denoised tokens must remain mutually consistent
- Require that simultaneously-generated tokens form valid sequences
- Use joint scoring (not independent token scores)
Linguistic Structure Respecting: Denoising should preserve linguistic boundaries
- Respect phrase boundaries, clause structure
- Don't denoise across strong grammatical divisions
Robust Token Boundary Handling: Edge cases at sequence start/end/padding
- Special treatment for START/END tokens
- Masking/padding must not corrupt the denoising process

Comparison of Approaches

The paper categorizes existing methods:

Continuous Embedding-Space Diffusion

Diffuse over token embeddings (not discrete tokens)
Enables smooth denoising but loses discrete structure
Risk: Denoised embeddings may not map to valid tokens

Discrete Token-Based Diffusion

Directly corrupt and denoise token sequences
Preserves discrete structure but harder to train
Risk: Training instability, token dependency violation

Implementation Framework

# Text diffusion system respecting the five properties
class TextDiffusionLM:
    def __init__(self, vocab_size, max_length):
        self.vocab_size = vocab_size
        self.max_length = max_length
        self.position_importance = self.compute_position_importance()  # Property 1

    def compute_position_importance(self):
        """Learn which positions are important to denoise (Property 1)"""
        # Function words less important, content words more important
        importance = torch.ones(self.max_length)
        # Reduce importance of common positions (articles, prepositions)
        importance[position_type=='function_word'] *= 0.3
        return importance

    def corrupt_with_schedule(self, tokens, timestep):
        """Position-aware corruption (Property 1)"""
        noise_level = self.noise_schedule(timestep)
        # Adjust noise by position importance
        effective_noise = noise_level * self.position_importance
        # Apply per-position masking probability
        mask = torch.bernoulli(effective_noise)
        corrupted = tokens.clone()
        corrupted[mask.bool()] = MASK_TOKEN
        return corrupted

    def denoise_with_context(self, corrupted_tokens, context_window=2):
        """Dependency-aware denoising (Property 2)"""
        # For each masked position, gather surrounding context
        denoised = []
        for i in range(len(corrupted_tokens)):
            if corrupted_tokens[i] == MASK_TOKEN:
                # Include multi-token context (Property 2)
                context = corrupted_tokens[max(0, i-context_window):
                                         min(len(corrupted_tokens), i+context_window)]
                pred_token = self.denoise_fn(context, position=i)
                denoised.append(pred_token)
            else:
                denoised.append(corrupted_tokens[i])
        return torch.tensor(denoised)

    def parallel_decode_consistent(self, corrupted_tokens, batch_size=8):
        """Parallel consistency with joint scoring (Property 3)"""
        candidates_per_position = []
        for i in range(len(corrupted_tokens)):
            if corrupted_tokens[i] == MASK_TOKEN:
                # Generate candidate tokens (Property 3: joint scoring)
                candidates = self.get_candidates(corrupted_tokens, i, top_k=batch_size)
                candidates_per_position.append(candidates)

        # Select consistent combination (not independent)
        best_sequence = self.select_joint_best(candidates_per_position)
        return best_sequence

Current Limitations

Today's diffusion LLMs only partially satisfy these five properties:

When Diffusion for Text Makes Sense

Despite limitations, text diffusion has advantages in:

Iterative refinement workflows where multiple passes improve quality
Parallel generation for latency-critical scenarios (N generation steps >> N parallel tokens)
Non-causal reasoning where all context matters equally (not left-to-right dependency)
Infilling tasks (masked prediction) where diffusion's bidirectional nature excels

Design Guidance

When building a text diffusion system, explicitly address:

How do you handle position-aware corruption? (don't use uniform noise)
How do you encode multi-token dependencies in training? (don't treat tokens independently)
How do you maintain consistency in parallel decoding? (joint scoring, not per-token)
How do you respect linguistic structure? (phrase boundaries, clauses)
How do you handle edge cases? (padding, START/END tokens)

References

Original paper: https://arxiv.org/abs/2512.22630
Related: Diffusion-LM, Editorial, Latent Diffusion (continuous variants)
Implementation challenges: Discrete diffusion frameworks (Absorbing Diffusion, D3PM)

Related Skills

ADu2021/flow-map-trajectory-tilting

testing

VerifiedTrustedCommunity

Uses flow maps as look-ahead operators to enable principled reward-guided diffusion by predicting trajectory endpoints at any denoising step. Deploy when applying rewards or preferences to diffusion trajectories with meaningful gradients throughout generation.

2SKILL.mdUpdated Apr 17, 2026

ADu2021/flow-map-trajectory-tilting

ADu2021/flexible-data-mixture-of-experts

testing

VerifiedTrustedCommunity

Train language models where each expert learns independently on closed datasets, enabling flexible inference with selective data inclusion or exclusion. 41% performance improvement while allowing users to opt out of specific data sources without retraining.

2SKILL.mdUpdated Apr 17, 2026

ADu2021/flexible-data-mixture-of-experts

ADu2021/flexibility-trap-diffusion-reasoning

data-ai

VerifiedTrustedCommunity

Understand how token generation flexibility in diffusion LMs paradoxically constrains reasoning, as models exploit ordering flexibility to avoid uncertain tokens, and apply simplified approaches that preserve parallel decoding benefits. Use when optimizing diffusion-based language models for reasoning tasks.

2SKILL.mdUpdated Apr 17, 2026

ADu2021/flexibility-trap-diffusion-reasoning

ADu2021/flex-continuous-agent-evolution

devops

VerifiedTrustedCommunity

Enable LLM agents to improve continuously during deployment by constructing structured experience libraries through self-reflection on successes and failures—achieving 23% improvement on reasoning without gradient-based parameter updates or external training.

2SKILL.mdUpdated Apr 17, 2026

ADu2021/flex-continuous-agent-evolution

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/ADu2021/skillXiv.git

# Copy into Claude Code skills folder (global)
cp -r skillXiv/skills/skillxiv-v0.0.2-claude-opus-4.6/discreteness-diffusion-llm ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

ADu2021/skillXiv

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT