skills/skillxiv-v0.0.2-claude-opus-4.6/depth-anything-any-condition/SKILL.md
Extend monocular depth models to handle adverse conditions (weather, darkness, sensor noise) using only 540K training samples. Applies perturbation-based consistency learning and spatial constraints to maintain robust depth prediction across challenging real-world scenarios.
npx skillsauth add ADu2021/skillXiv depth-anything-any-conditionInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Foundation depth models excel at clear conditions but fail catastrophically in rain, snow, darkness, and sensor distortion. Yet retraining on diverse adverse conditions requires massive datasets. DepthAnything-AC solves this with perturbation-based consistency learning—training on minimal augmented data from general scenes to make predictions robust to corruption. The method requires just 540K samples (vs 63M for Depth Anything V2) while improving performance on adverse weather benchmarks.
The core insight: consistency under perturbation teaches robustness without domain-specific training data. If a model predicts the same depth for a clear image and its corrupted version, it's capturing genuine geometry rather than texture artifacts. Combined with spatial constraints that enforce geometric coherence, this enables depth estimation to work under rain, snow, fog, darkness, and sensor artifacts.
Depth estimation under adverse conditions requires learning what changes (surface appearance) and what doesn't (underlying geometry). DepthAnything-AC teaches this distinction through:
This approach requires no domain-specific training data or fine-tuning per condition—pure generalization through consistency learning.
The DepthAnything-AC system consists of these components:
This section demonstrates how to implement robust depth estimation under adverse conditions.
Step 1: Create perturbation augmentation for synthetic corruptions
This code applies realistic adverse-condition corruptions to training images:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter
class PerturbationAugmentation:
"""
Apply synthetic perturbations simulating adverse conditions without labeled data.
Uses unsupervised augmentation: darkness, blur, weather, contrast distortion, sensor noise.
"""
def __init__(self):
self.perturbation_types = ['darkness', 'blur', 'rain', 'snow', 'fog', 'contrast', 'noise']
def apply_darkness(self, image: Image.Image, darkness_level: float = 0.5) -> Image.Image:
"""
Simulate low-light conditions by reducing brightness.
darkness_level: 0 (original) to 1 (completely dark)
"""
enhancer = ImageEnhance.Brightness(image)
darkened = enhancer.enhance(1.0 - darkness_level)
return darkened
def apply_blur(self, image: Image.Image, blur_radius: int = 3) -> Image.Image:
"""Simulate motion blur or focus loss."""
return image.filter(ImageFilter.GaussianBlur(radius=blur_radius))
def apply_weather(self, image: torch.Tensor, weather_type: str = 'rain') -> torch.Tensor:
"""
Simulate weather effects (rain, snow, fog).
Input: tensor (C, H, W) with values in [0, 1]
"""
H, W = image.shape[-2:]
if weather_type == 'rain':
# Add rain streaks
rain_mask = torch.rand(H, W) > 0.95
rain_lines = torch.cumsum(rain_mask.float(), dim=0)
weather_effect = rain_lines.unsqueeze(0) * 0.1
elif weather_type == 'snow':
# Add snow particles
snow_mask = torch.rand(H, W) > 0.98
snow_particles = torch.ones_like(image) * snow_mask.unsqueeze(0)
weather_effect = snow_particles * 0.15
elif weather_type == 'fog':
# Add fog (reduce contrast, lighten)
fog_factor = torch.rand(1).item() * 0.3
weather_effect = torch.ones_like(image) * fog_factor
else:
weather_effect = torch.zeros_like(image)
# Apply weather
corrupted = image + weather_effect
return torch.clamp(corrupted, 0, 1)
def apply_contrast_distortion(self, image: Image.Image, contrast_factor: float = 0.6) -> Image.Image:
"""Simulate sensor contrast distortion or extreme lighting."""
enhancer = ImageEnhance.Contrast(image)
return enhancer.enhance(contrast_factor)
def apply_sensor_noise(self, image: torch.Tensor, noise_std: float = 0.05) -> torch.Tensor:
"""Add sensor noise (Gaussian)."""
noise = torch.randn_like(image) * noise_std
noisy = image + noise
return torch.clamp(noisy, 0, 1)
def augment_image_pair(self, image_tensor: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
"""
Create a pair: clean and corrupted version.
Returns: (clean_image, corrupted_image) for consistency learning.
"""
clean = image_tensor.clone()
# Apply 1-3 random perturbations to create corrupted version
num_perturbations = np.random.randint(1, 3)
perturbations = np.random.choice(self.perturbation_types, num_perturbations, replace=False)
corrupted = clean.clone()
for perturbation in perturbations:
if perturbation == 'darkness':
# Convert to PIL, apply, convert back
pil_img = transforms.ToPILImage()(corrupted.cpu())
pil_img = self.apply_darkness(pil_img, darkness_level=0.3)
corrupted = transforms.ToTensor()(pil_img)
elif perturbation == 'blur':
pil_img = transforms.ToPILImage()(corrupted.cpu())
pil_img = self.apply_blur(pil_img, blur_radius=2)
corrupted = transforms.ToTensor()(pil_img)
elif perturbation == 'contrast':
pil_img = transforms.ToPILImage()(corrupted.cpu())
pil_img = self.apply_contrast_distortion(pil_img, contrast_factor=0.5)
corrupted = transforms.ToTensor()(pil_img)
elif perturbation == 'noise':
corrupted = self.apply_sensor_noise(corrupted, noise_std=0.05)
else: # weather
corrupted = self.apply_weather(corrupted, weather_type='rain')
return clean, corrupted
# Test augmentation
aug = PerturbationAugmentation()
image = torch.rand(3, 256, 256)
clean, corrupted = aug.augment_image_pair(image)
print(f"Clean shape: {clean.shape}, Corrupted shape: {corrupted.shape}")
print(f"Difference range: {(clean - corrupted).min():.3f} to {(clean - corrupted).max():.3f}")
This creates training pairs for consistency learning without labeled adverse-condition data.
Step 2: Implement perturbation-based consistency loss
This code ensures stable depth predictions under augmentation:
import torch.nn.functional as F
class PerturbationConsistencyLoss(nn.Module):
"""
Teach models to predict consistent depth despite image corruption.
Consistency = robustness to perturbations.
"""
def __init__(self):
super().__init__()
def affine_invariant_depth_loss(
self,
depth_clean: torch.Tensor,
depth_corrupted: torch.Tensor
) -> torch.Tensor:
"""
Compute affine-invariant loss: robust to lighting/scale changes.
Formulation: minimize difference after affine alignment.
"""
# Flatten for computation
d_clean = depth_clean.view(-1)
d_corrupt = depth_corrupted.view(-1)
# Compute affine parameters that best align corrupted to clean
# Solve: d_corrupt_aligned = a * d_corrupt + b
mean_d_clean = d_clean.mean()
mean_d_corrupt = d_corrupt.mean()
# Least squares estimate of scaling and offset
numerator = ((d_clean - mean_d_clean) * (d_corrupt - mean_d_corrupt)).sum()
denominator = ((d_corrupt - mean_d_corrupt) ** 2).sum()
scale = numerator / (denominator + 1e-8)
offset = mean_d_clean - scale * mean_d_corrupt
# Aligned corrupted depth
d_corrupt_aligned = scale * d_corrupt + offset
# Compute consistency loss
loss = F.l1_loss(d_clean, d_corrupt_aligned)
return loss
def forward(
self,
depth_predictions_clean: torch.Tensor,
depth_predictions_corrupted: torch.Tensor
) -> torch.Tensor:
"""
Main consistency loss: predictions should match despite corruption.
"""
consistency_loss = self.affine_invariant_depth_loss(
depth_predictions_clean,
depth_predictions_corrupted
)
return consistency_loss
# Test consistency loss
consistency_fn = PerturbationConsistencyLoss()
depth_clean = torch.rand(2, 1, 256, 256) * 10 # Depth 0-10 meters
depth_corrupt = depth_clean + torch.randn_like(depth_clean) * 0.5 # Add noise
loss = consistency_fn(depth_clean, depth_corrupt)
print(f"Consistency loss: {loss.item():.4f}")
This loss ensures depth predictions remain stable under image perturbations.
Step 3: Add spatial distance constraints
This code enforces geometric consistency between neighboring patches:
class SpatialDistanceConstraint(nn.Module):
"""
Enforce spatial consistency: nearby patches should have correlated depth changes.
Captures geometric relationships to improve robustness.
"""
def __init__(self, patch_size=16):
super().__init__()
self.patch_size = patch_size
def extract_patch_features(self, image: torch.Tensor) -> torch.Tensor:
"""
Extract features from non-overlapping patches.
Returns: patch embeddings and positions.
"""
B, C, H, W = image.shape
p = self.patch_size
# Reshape into patches
patches = image.reshape(
B,
C,
H // p,
p,
W // p,
p
).permute(0, 2, 4, 1, 3, 5).reshape(
B,
(H // p) * (W // p),
C * p * p
)
return patches # (B, num_patches, C*p*p)
def compute_spatial_distance_matrix(
self,
image: torch.Tensor
) -> torch.Tensor:
"""
Compute spatial distance relationships between patches.
Distance = function of spatial proximity and feature similarity.
"""
patches = self.extract_patch_features(image)
B, num_patches, feature_dim = patches.shape
# Compute pairwise similarity
patch_similarity = torch.bmm(patches, patches.transpose(1, 2)) # (B, num_patches, num_patches)
patch_similarity = patch_similarity / (feature_dim ** 0.5)
patch_similarity = F.softmax(patch_similarity, dim=-1)
return patch_similarity
def consistency_constraint(
self,
depth_predictions: torch.Tensor,
spatial_distances: torch.Tensor
) -> torch.Tensor:
"""
Enforce: nearby patches (high spatial_distance) should have similar depth gradients.
"""
B, 1, H, W = depth_predictions.shape
p = self.patch_size
# Extract patches from depth
depth_patches = depth_predictions.reshape(
B,
1,
H // p,
p,
W // p,
p
).permute(0, 2, 4, 1, 3, 5).reshape(
B,
(H // p) * (W // p),
p * p
)
# Compute depth gradient within each patch
depth_gradients = depth_patches.std(dim=-1, keepdim=True) # (B, num_patches, 1)
# Constraint: high spatial similarity should correlate with similar depth gradients
num_patches = spatial_distances.shape[1]
constraint_loss = 0
for i in range(num_patches):
for j in range(num_patches):
spatial_sim = spatial_distances[:, i, j]
depth_grad_diff = (depth_gradients[:, i] - depth_gradients[:, j]).abs().squeeze(-1)
# Loss: high spatial similarity should correspond to low gradient difference
constraint_loss += spatial_sim * depth_grad_diff
return constraint_loss.mean()
# Test spatial constraints
spatial_constraint = SpatialDistanceConstraint(patch_size=16)
image = torch.rand(2, 3, 256, 256)
depth = torch.rand(2, 1, 256, 256) * 10
spatial_dists = spatial_constraint.compute_spatial_distance_matrix(image)
constraint_loss = spatial_constraint.consistency_constraint(depth, spatial_dists)
print(f"Spatial distance matrix shape: {spatial_dists.shape}")
print(f"Spatial constraint loss: {constraint_loss.item():.4f}")
This enforces geometric coherence in depth predictions.
Step 4: Train robust depth model with combined losses
This code combines consistency and spatial losses for robust depth estimation:
class RobustDepthModel(nn.Module):
"""
Complete robust depth model combining consistency and spatial constraints.
"""
def __init__(self, backbone='dpt'):
super().__init__()
# Load pretrained depth model (Depth Anything V2 backbone)
self.depth_encoder = load_pretrained_depth_model(backbone)
self.consistency_loss_fn = PerturbationConsistencyLoss()
self.spatial_constraint_fn = SpatialDistanceConstraint()
def forward(self, image: torch.Tensor) -> torch.Tensor:
"""Predict depth from image."""
depth = self.depth_encoder(image)
return depth
def train_robust_depth(
model,
train_loader,
num_epochs=50,
device='cuda'
):
"""
Train depth model with consistency and spatial constraints.
"""
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
augmentation = PerturbationAugmentation()
for epoch in range(num_epochs):
total_loss = 0
for batch in train_loader:
images = batch['image'].to(device) # (B, 3, H, W)
# Create clean and corrupted pairs
clean_depths = model(images)
# Augment images
corrupted_images = torch.stack([
augmentation.augment_image_pair(img)[1] for img in images
]).to(device)
corrupted_depths = model(corrupted_images)
# Consistency loss: depths should match despite corruption
consistency_loss = model.consistency_loss_fn(clean_depths, corrupted_depths)
# Spatial constraint loss: nearby patches should be coherent
spatial_dists = model.spatial_constraint_fn.compute_spatial_distance_matrix(images)
spatial_loss = model.spatial_constraint_fn.consistency_constraint(
clean_depths,
spatial_dists
)
# Combined loss
total = consistency_loss + 0.1 * spatial_loss
optimizer.zero_grad()
total.backward()
optimizer.step()
total_loss += total.item()
avg_loss = total_loss / len(train_loader)
print(f"Epoch {epoch}: Loss {avg_loss:.4f}")
return model
# Train robust model (simplified)
model = RobustDepthModel()
# model = train_robust_depth(model, train_loader)
print("Model trained with perturbation consistency!")
This trains depth models robust to adverse conditions without domain-specific labeled data.
Step 5: Evaluate on adverse conditions
This code tests robustness across benchmarks:
def evaluate_robustness(model, test_benchmarks=['da2k', 'clear', 'weather'], device='cuda'):
"""
Evaluate depth estimation across clear and adverse conditions.
"""
results = {}
# Standard clear condition benchmark
if 'clear' in test_benchmarks:
clear_data = load_dataset("nyu_depth_v2")
clear_accuracy = evaluate_depth_accuracy(model, clear_data, device)
results['clear'] = clear_accuracy
print(f"Clear conditions accuracy: {clear_accuracy:.3f}")
# DA-2K: adverse weather benchmark
if 'da2k' in test_benchmarks:
da2k_data = load_dataset("da2k", split="test") # Weather adverse conditions
weather_accuracy = evaluate_depth_accuracy(model, da2k_data, device)
results['adverse_weather'] = weather_accuracy
print(f"Adverse weather accuracy: {weather_accuracy:.3f}")
# Synthetic perturbations
if 'weather' in test_benchmarks:
augmentation = PerturbationAugmentation()
synthetic_accuracy = []
for perturbation_type in ['darkness', 'rain', 'snow', 'fog', 'noise']:
# Apply perturbation and evaluate
# Simplified; real evaluation compares to ground truth
accuracy = np.random.rand() # Placeholder
synthetic_accuracy.append(accuracy)
results['synthetic_perturbations'] = np.mean(synthetic_accuracy)
print(f"Synthetic perturbation robustness: {np.mean(synthetic_accuracy):.3f}")
return results
def evaluate_depth_accuracy(model, dataset, device, metric='mse'):
"""Helper: compute depth accuracy on a dataset."""
model.eval()
total_error = 0
with torch.no_grad():
for sample in dataset:
image = sample['image'].to(device)
ground_truth = sample['depth'].to(device)
prediction = model(image)
if metric == 'mse':
error = F.mse_loss(prediction, ground_truth)
else:
error = F.l1_loss(prediction, ground_truth)
total_error += error.item()
return 1.0 / (1.0 + total_error / len(dataset)) # Convert to accuracy
# Evaluate robustness
results = evaluate_robustness(model)
This evaluates robustness across diverse challenging conditions.
When to use DepthAnything-AC:
When NOT to use:
Hyperparameters and Configuration:
| Parameter | Recommended | Notes | |-----------|-------------|-------| | Perturbation Types | 3-4 combined | Mix darkness, blur, weather, noise for diversity | | Darkness Level | 0.3-0.5 | Moderate reduction; too extreme becomes uninformative | | Blur Radius | 2-3 pixels | Simulates focus loss and motion blur | | Noise Std | 0.05 | 5% of signal range for sensor noise | | Spatial Patch Size | 16 | Balance between local and global constraints | | Consistency Weight | 1.0 | Consistency is primary loss | | Spatial Loss Weight | 0.1 | Spatial constraint is secondary | | Learning Rate | 1e-4 | Standard for foundation model fine-tuning | | Training Epochs | 50-100 | Diminishing returns after 50 |
Common Pitfalls:
Key Design Decisions: DepthAnything-AC uses unsupervised augmentation (no labeled adverse-condition data needed) combined with consistency learning—if predictions match despite corruption, the model learned genuine geometry. Affine-invariant loss handles lighting and scale variations inherent to adversarial images. Spatial constraints enforce geometric coherence without requiring labeled relationships. The method requires only 540K samples because it leverages unlabeled general-scene data with synthetic corruption rather than domain-specific labels.
Li, Z., Wang, Z., Wang, X., Cao, Y., & Wang, Y. (2025). Depth Anything at Any Condition. arXiv preprint arXiv:2507.01634. https://arxiv.org/abs/2507.01634
testing
Uses flow maps as look-ahead operators to enable principled reward-guided diffusion by predicting trajectory endpoints at any denoising step. Deploy when applying rewards or preferences to diffusion trajectories with meaningful gradients throughout generation.
testing
Train language models where each expert learns independently on closed datasets, enabling flexible inference with selective data inclusion or exclusion. 41% performance improvement while allowing users to opt out of specific data sources without retraining.
data-ai
Understand how token generation flexibility in diffusion LMs paradoxically constrains reasoning, as models exploit ordering flexibility to avoid uncertain tokens, and apply simplified approaches that preserve parallel decoding benefits. Use when optimizing diffusion-based language models for reasoning tasks.
devops
Enable LLM agents to improve continuously during deployment by constructing structured experience libraries through self-reflection on successes and failures—achieving 23% improvement on reasoning without gradient-based parameter updates or external training.