skills/skillxiv-v0.0.2-claude-opus-4.6/distribution-matching-vae/SKILL.md
Align latent distributions with arbitrary reference distributions via explicit matching constraints rather than fixed priors. DMVAE achieves gFID 3.2 on ImageNet after 64 training epochs. Use it when you need flexibility in latent representation design for image generation.
DMVAE generalizes beyond conventional Gaussian priors by explicitly aligning the encoder's latent distribution with an arbitrary reference distribution. The reference can be derived from self-supervised learning (SSL), from diffusion noise, or from another prior, so the shape of the latent space is no longer dictated by a rigid architectural constraint.
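Reading the objective off the `train_step` method in the code gives the loss below for a batch $x$. Here $E$ and $D$ are the encoder and decoder, $\beta$ is the matching weight, and $\mathcal{D}$ stands for whichever sample-based discrepancy is chosen (Wasserstein distance, MMD, ...). This summarizes the sketch; the paper's exact formulation may differ.

$$
\mathcal{L}(x) = \underbrace{\mathrm{MSE}\big(D(E(x)),\, x\big)}_{\text{reconstruction}} + \beta \, \underbrace{\mathcal{D}\big(q_E(z),\, p_{\mathrm{ref}}(z)\big)}_{\text{distribution matching}}
$$

Here $q_E(z)$ is the distribution of encoder latents over the batch and $p_{\mathrm{ref}}(z)$ is the reference. A standard VAE instead penalizes $\mathrm{KL}\big(q(z \mid x) \,\|\, \mathcal{N}(0, I)\big)$ per sample; because a sample-based $\mathcal{D}$ only needs draws from $p_{\mathrm{ref}}$, the reference can be any distribution you can sample from.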
Explicit distribution matching for flexible latent space design:
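```python
# Distribution Matching VAE (PyTorch sketch; encoder, decoder and the
# ssl_model / diffusion_model interfaces are placeholders you supply)
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import MultivariateNormal


class DistributionMatchingVAE(nn.Module):
    def __init__(self, encoder, decoder, latent_dim,
                 reference_distribution=None, beta=1.0, metric="mmd"):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.latent_dim = latent_dim
        self.beta = beta        # weight of the distribution matching term
        self.metric = metric    # "wasserstein", "mmd" or "ks"
        # A reference distribution is any sampler: n -> Tensor[n, latent_dim]
        self.reference_dist = reference_distribution or self.default_gaussian()

    def default_gaussian(self):
        # Conventional VAE prior N(0, I), used when no reference is given
        return lambda n: torch.randn(n, self.latent_dim)

    def forward(self, x):
        """
        Encode to latent space and reconstruct.
        Distribution matching is applied to the returned latents in train_step.
        """
        # Encoder is assumed to return (latent, encoder_params)
        latent, encoder_params = self.encoder(x)
        recon = self.decoder(latent)
        return recon, latent, encoder_params

    def compute_distribution_matching_loss(self, latent_samples):
        """
        Explicitly align the encoder's latent distribution with the reference.
        This replaces the KL divergence to a fixed Gaussian prior.
        """
        z = latent_samples.flatten(start_dim=1)        # (batch, latent_dim)
        # Draw as many reference samples as latents, on the same device/dtype
        ref = self.reference_dist(z.shape[0]).to(z)
        return self.compute_distribution_distance(z, ref)

    def train_step(self, batch):
        """
        DMVAE training combines reconstruction with distribution matching.
        """
        x = batch
        recon, latent, _ = self(x)
        # Reconstruction loss (pixel-space quality)
        recon_loss = F.mse_loss(recon, x)
        # Distribution matching loss (latent structure)
        dist_matching_loss = self.compute_distribution_matching_loss(latent)
        # Combined objective; call .backward() on it and step an optimizer
        return recon_loss + self.beta * dist_matching_loss

    @torch.no_grad()
    def use_ssl_derived_distribution(self, ssl_model, dataset):
        """
        Match the encoder to a distribution fitted to self-supervised features.
        SSL distributions balance reconstruction fidelity and efficiency.
        Assumes ssl_model.encode(batch) yields latent_dim features per sample.
        """
        ssl_features = torch.cat(
            [ssl_model.encode(batch).flatten(start_dim=1) for batch in dataset],
            dim=0,
        )
        self.reference_dist = self.fit_distribution_to_features(ssl_features)
        return self.reference_dist

    @torch.no_grad()
    def use_diffusion_noise_distribution(self, diffusion_model):
        """
        Match the encoder to a diffusion model's noise distribution, which
        diffusion models implicitly define at different timesteps.
        sample_noise_distribution() is a hypothetical method on your model.
        """
        noise_samples = diffusion_model.sample_noise_distribution()
        self.reference_dist = self.fit_distribution_to_features(
            noise_samples.flatten(start_dim=1)
        )
        return self.reference_dist

    def fit_distribution_to_features(self, features):
        # Simplest choice: one full-covariance Gaussian. A mixture of
        # Gaussians or a normalizing flow can model multimodal features.
        d = features.shape[1]
        mean = features.mean(dim=0)
        cov = torch.cov(features.T)
        cov = cov + 1e-5 * torch.eye(d, dtype=features.dtype, device=features.device)
        dist = MultivariateNormal(mean, covariance_matrix=cov)
        return lambda n: dist.sample((n,))

    def compute_distribution_distance(self, samples, reference):
        """
        Measure discrepancy between two (n, d) sample sets.
        Supports multiple distance metrics for flexibility.
        """
        if self.metric == "wasserstein":
            # Sliced Wasserstein: a cheap approximation of optimal transport
            return self.sliced_wasserstein_distance(samples, reference)
        elif self.metric == "mmd":
            # Maximum Mean Discrepancy
            return self.maximum_mean_discrepancy(samples, reference)
        elif self.metric == "ks":
            # Kolmogorov-Smirnov statistic
            return self.ks_distance(samples, reference)
        raise ValueError(f"Unknown metric: {self.metric}")

    @staticmethod
    def maximum_mean_discrepancy(x, y, bandwidth=1.0):
        # Biased estimate of squared MMD with a Gaussian RBF kernel
        def kernel(a, b):
            sq_dists = (a.unsqueeze(1) - b.unsqueeze(0)).pow(2).sum(-1)
            return torch.exp(-sq_dists / (2 * bandwidth ** 2))
        return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

    @staticmethod
    def sliced_wasserstein_distance(x, y, num_projections=64):
        # Project onto random unit directions; in 1-D, optimal transport
        # between equal-size sample sets pairs the sorted values
        dirs = F.normalize(torch.randn(x.shape[1], num_projections).to(x), dim=0)
        px = torch.sort(x @ dirs, dim=0).values
        py = torch.sort(y @ dirs, dim=0).values
        return (px - py).pow(2).mean()

    @staticmethod
    def ks_distance(x, y):
        # Largest per-dimension two-sample KS statistic. It is piecewise
        # constant in the samples, so use it for monitoring, not as a loss.
        pooled = torch.cat([x, y], dim=0).T.contiguous()   # (d, n + m)
        sx = torch.sort(x, dim=0).values.T.contiguous()    # (d, n)
        sy = torch.sort(y, dim=0).values.T.contiguous()    # (d, m)
        cdf_x = torch.searchsorted(sx, pooled, right=True) / x.shape[0]
        cdf_y = torch.searchsorted(sy, pooled, right=True) / y.shape[0]
        return (cdf_x - cdf_y).abs().max()
```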
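The `compute_distribution_distance` method leaves the choice of metric open. As a standalone illustration, here is a NumPy sketch of two sample-based options: a biased RBF-kernel estimate of squared MMD, and a sliced Wasserstein distance, which averages 1-D transport costs over random projections and so approximates, rather than solves, the full optimal transport problem. The function names, the bandwidth of 1.0 and the 64 projections are illustrative assumptions, not values taken from DMVAE.

```python
import numpy as np


def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased squared-MMD estimate between (n, d) and (m, d) sample sets."""
    def kernel(a, b):
        # Gaussian RBF kernel on all pairwise squared distances
        sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())


def sliced_wasserstein2(x, y, num_projections=64, seed=0):
    """Squared sliced 2-Wasserstein distance between equal-size sample sets."""
    rng = np.random.default_rng(seed)
    # Random unit directions, one per column
    dirs = rng.normal(size=(x.shape[1], num_projections))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)
    # In 1-D, optimal transport simply pairs the sorted values
    px = np.sort(x @ dirs, axis=0)
    py = np.sort(y @ dirs, axis=0)
    return float(np.mean((px - py) ** 2))


rng = np.random.default_rng(42)
latents = rng.normal(size=(256, 4))     # stand-in encoder latents
reference = rng.normal(size=(256, 4))   # samples from an N(0, I) reference
shifted = reference + 2.0               # a deliberately mismatched reference

print(rbf_mmd2(latents, reference) < rbf_mmd2(latents, shifted))  # True
print(sliced_wasserstein2(latents, reference) < sliced_wasserstein2(latents, shifted))  # True
```

Both metrics only need samples from the reference, which is why the reference can be an arbitrary sampler. The sliced variant replaces a transport plan with sorting, at roughly $O(n \log n)$ cost per projection.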
The framework systematically investigates which latent distributions work best for image synthesis. Among those examined, SSL-derived distributions give the best balance between high-fidelity synthesis and computational efficiency, reaching gFID 3.2.
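Using an SSL-derived reference means turning a bank of SSL features into a distribution you can sample from. A minimal NumPy sketch, assuming the features already have the latent dimensionality and fitting a single full-covariance Gaussian (the simplest option; a mixture of Gaussians or a normalizing flow would capture multimodal structure). `fit_gaussian_reference` is a hypothetical helper, and the random "features" are a synthetic stand-in, not real SSL embeddings.

```python
import numpy as np


def fit_gaussian_reference(features, eps=1e-5):
    """Fit a full-covariance Gaussian to a (num_samples, dim) feature bank.

    Returns a sampler: (n, seed) -> array of shape (n, dim).
    """
    mean = features.mean(axis=0)
    # Small diagonal jitter keeps the covariance positive definite
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])

    def sample(n, seed=None):
        rng = np.random.default_rng(seed)
        return rng.multivariate_normal(mean, cov, size=n)

    return sample


# Synthetic stand-in for real SSL embeddings (hypothetical data)
rng = np.random.default_rng(0)
features = rng.normal(loc=3.0, scale=0.5, size=(5000, 4))

sample_reference = fit_gaussian_reference(features)
ref = sample_reference(1000, seed=1)
print(ref.shape)  # (1000, 4)
```

The returned sampler can stand in wherever a reference distribution is expected, since MMD- or Wasserstein-style matching only consumes samples.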