Depth Anything at Any Condition: Robust Depth Under Adversity

Foundation depth models excel at clear conditions but fail catastrophically in rain, snow, darkness, and sensor distortion. Yet retraining on diverse adverse conditions requires massive datasets. DepthAnything-AC solves this with perturbation-based consistency learning—training on minimal augmented data from general scenes to make predictions robust to corruption. The method requires just 540K samples (vs 63M for Depth Anything V2) while improving performance on adverse weather benchmarks.

The core insight: consistency under perturbation teaches robustness without domain-specific training data. If a model predicts the same depth for a clear image and its corrupted version, it's capturing genuine geometry rather than texture artifacts. Combined with spatial constraints that enforce geometric coherence, this enables depth estimation to work under rain, snow, fog, darkness, and sensor artifacts.

Core Concept

Depth estimation under adverse conditions requires learning what changes (surface appearance) and what doesn't (underlying geometry). DepthAnything-AC teaches this distinction through:

Perturbation-Based Consistency: Train using pairs of clean and corrupted images, learning to produce consistent depth predictions despite visual changes
Unsupervised Augmentation: Apply synthetic corruptions (darkness, blur, weather effects, contrast changes) without requiring labeled diverse data
Spatial Distance Constraints: Enforce geometric relationships between image patches—if patches are spatially close, their depth should be consistent
Affine-Invariant Loss: Formulation that's robust to global illumination changes, scaling depth predictions to absolute values

This approach requires no domain-specific training data or fine-tuning per condition—pure generalization through consistency learning.

Architecture Overview

The DepthAnything-AC system consists of these components:

Perturbation Generator: Synthetic corruption module applying weather, darkness, blur, contrast, and noise
Base Depth Model: Foundation model (Depth Anything V2 backbone) adapted for robustness
Consistency Regularization Module: Loss function ensuring stable predictions across perturbations
Spatial Distance Constraint Engine: Extracts geometric relationships between patches
Affine-Invariant Depth Loss: Formulation robust to lighting and scale variations
Evaluation Benchmarks: DA-2K benchmark for adverse weather; standard datasets for general performance
Inference Pipeline: Single forward pass; no per-image test-time adaptation

Implementation

This section demonstrates how to implement robust depth estimation under adverse conditions.

Step 1: Create perturbation augmentation for synthetic corruptions

This code applies realistic adverse-condition corruptions to training images:

import torch
import torch.nn as nn
import torchvision.transforms as transforms
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter

class PerturbationAugmentation:
    """
    Apply synthetic perturbations simulating adverse conditions without labeled data.
    Uses unsupervised augmentation: darkness, blur, weather, contrast distortion, sensor noise.
    """

    def __init__(self):
        self.perturbation_types = ['darkness', 'blur', 'rain', 'snow', 'fog', 'contrast', 'noise']

    def apply_darkness(self, image: Image.Image, darkness_level: float = 0.5) -> Image.Image:
        """
        Simulate low-light conditions by reducing brightness.
        darkness_level: 0 (original) to 1 (completely dark)
        """
        enhancer = ImageEnhance.Brightness(image)
        darkened = enhancer.enhance(1.0 - darkness_level)
        return darkened

    def apply_blur(self, image: Image.Image, blur_radius: int = 3) -> Image.Image:
        """Simulate motion blur or focus loss."""
        return image.filter(ImageFilter.GaussianBlur(radius=blur_radius))

    def apply_weather(self, image: torch.Tensor, weather_type: str = 'rain') -> torch.Tensor:
        """
        Simulate weather effects (rain, snow, fog).
        Input: tensor (C, H, W) with values in [0, 1]
        """
        H, W = image.shape[-2:]

        if weather_type == 'rain':
            # Add rain streaks
            rain_mask = torch.rand(H, W) > 0.95
            rain_lines = torch.cumsum(rain_mask.float(), dim=0)
            weather_effect = rain_lines.unsqueeze(0) * 0.1
        elif weather_type == 'snow':
            # Add snow particles
            snow_mask = torch.rand(H, W) > 0.98
            snow_particles = torch.ones_like(image) * snow_mask.unsqueeze(0)
            weather_effect = snow_particles * 0.15
        elif weather_type == 'fog':
            # Add fog (reduce contrast, lighten)
            fog_factor = torch.rand(1).item() * 0.3
            weather_effect = torch.ones_like(image) * fog_factor
        else:
            weather_effect = torch.zeros_like(image)

        # Apply weather
        corrupted = image + weather_effect
        return torch.clamp(corrupted, 0, 1)

    def apply_contrast_distortion(self, image: Image.Image, contrast_factor: float = 0.6) -> Image.Image:
        """Simulate sensor contrast distortion or extreme lighting."""
        enhancer = ImageEnhance.Contrast(image)
        return enhancer.enhance(contrast_factor)

    def apply_sensor_noise(self, image: torch.Tensor, noise_std: float = 0.05) -> torch.Tensor:
        """Add sensor noise (Gaussian)."""
        noise = torch.randn_like(image) * noise_std
        noisy = image + noise
        return torch.clamp(noisy, 0, 1)

    def augment_image_pair(self, image_tensor: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Create a pair: clean and corrupted version.
        Returns: (clean_image, corrupted_image) for consistency learning.
        """
        clean = image_tensor.clone()

        # Apply 1-3 random perturbations to create corrupted version
        num_perturbations = np.random.randint(1, 3)
        perturbations = np.random.choice(self.perturbation_types, num_perturbations, replace=False)

        corrupted = clean.clone()

        for perturbation in perturbations:
            if perturbation == 'darkness':
                # Convert to PIL, apply, convert back
                pil_img = transforms.ToPILImage()(corrupted.cpu())
                pil_img = self.apply_darkness(pil_img, darkness_level=0.3)
                corrupted = transforms.ToTensor()(pil_img)
            elif perturbation == 'blur':
                pil_img = transforms.ToPILImage()(corrupted.cpu())
                pil_img = self.apply_blur(pil_img, blur_radius=2)
                corrupted = transforms.ToTensor()(pil_img)
            elif perturbation == 'contrast':
                pil_img = transforms.ToPILImage()(corrupted.cpu())
                pil_img = self.apply_contrast_distortion(pil_img, contrast_factor=0.5)
                corrupted = transforms.ToTensor()(pil_img)
            elif perturbation == 'noise':
                corrupted = self.apply_sensor_noise(corrupted, noise_std=0.05)
            else:  # weather
                corrupted = self.apply_weather(corrupted, weather_type='rain')

        return clean, corrupted

# Test augmentation
aug = PerturbationAugmentation()
image = torch.rand(3, 256, 256)

clean, corrupted = aug.augment_image_pair(image)
print(f"Clean shape: {clean.shape}, Corrupted shape: {corrupted.shape}")
print(f"Difference range: {(clean - corrupted).min():.3f} to {(clean - corrupted).max():.3f}")

This creates training pairs for consistency learning without labeled adverse-condition data.

Step 2: Implement perturbation-based consistency loss

This code ensures stable depth predictions under augmentation:

import torch.nn.functional as F

class PerturbationConsistencyLoss(nn.Module):
    """
    Teach models to predict consistent depth despite image corruption.
    Consistency = robustness to perturbations.
    """

    def __init__(self):
        super().__init__()

    def affine_invariant_depth_loss(
        self,
        depth_clean: torch.Tensor,
        depth_corrupted: torch.Tensor
    ) -> torch.Tensor:
        """
        Compute affine-invariant loss: robust to lighting/scale changes.
        Formulation: minimize difference after affine alignment.
        """

        # Flatten for computation
        d_clean = depth_clean.view(-1)
        d_corrupt = depth_corrupted.view(-1)

        # Compute affine parameters that best align corrupted to clean
        # Solve: d_corrupt_aligned = a * d_corrupt + b
        mean_d_clean = d_clean.mean()
        mean_d_corrupt = d_corrupt.mean()

        # Least squares estimate of scaling and offset
        numerator = ((d_clean - mean_d_clean) * (d_corrupt - mean_d_corrupt)).sum()
        denominator = ((d_corrupt - mean_d_corrupt) ** 2).sum()
        scale = numerator / (denominator + 1e-8)
        offset = mean_d_clean - scale * mean_d_corrupt

        # Aligned corrupted depth
        d_corrupt_aligned = scale * d_corrupt + offset

        # Compute consistency loss
        loss = F.l1_loss(d_clean, d_corrupt_aligned)

        return loss

    def forward(
        self,
        depth_predictions_clean: torch.Tensor,
        depth_predictions_corrupted: torch.Tensor
    ) -> torch.Tensor:
        """
        Main consistency loss: predictions should match despite corruption.
        """

        consistency_loss = self.affine_invariant_depth_loss(
            depth_predictions_clean,
            depth_predictions_corrupted
        )

        return consistency_loss

# Test consistency loss
consistency_fn = PerturbationConsistencyLoss()
depth_clean = torch.rand(2, 1, 256, 256) * 10  # Depth 0-10 meters
depth_corrupt = depth_clean + torch.randn_like(depth_clean) * 0.5  # Add noise

loss = consistency_fn(depth_clean, depth_corrupt)
print(f"Consistency loss: {loss.item():.4f}")

This loss ensures depth predictions remain stable under image perturbations.

Step 3: Add spatial distance constraints

This code enforces geometric consistency between neighboring patches:

class SpatialDistanceConstraint(nn.Module):
    """
    Enforce spatial consistency: nearby patches should have correlated depth changes.
    Captures geometric relationships to improve robustness.
    """

    def __init__(self, patch_size=16):
        super().__init__()
        self.patch_size = patch_size

    def extract_patch_features(self, image: torch.Tensor) -> torch.Tensor:
        """
        Extract features from non-overlapping patches.
        Returns: patch embeddings and positions.
        """

        B, C, H, W = image.shape
        p = self.patch_size

        # Reshape into patches
        patches = image.reshape(
            B,
            C,
            H // p,
            p,
            W // p,
            p
        ).permute(0, 2, 4, 1, 3, 5).reshape(
            B,
            (H // p) * (W // p),
            C * p * p
        )

        return patches  # (B, num_patches, C*p*p)

    def compute_spatial_distance_matrix(
        self,
        image: torch.Tensor
    ) -> torch.Tensor:
        """
        Compute spatial distance relationships between patches.
        Distance = function of spatial proximity and feature similarity.
        """

        patches = self.extract_patch_features(image)
        B, num_patches, feature_dim = patches.shape

        # Compute pairwise similarity
        patch_similarity = torch.bmm(patches, patches.transpose(1, 2))  # (B, num_patches, num_patches)
        patch_similarity = patch_similarity / (feature_dim ** 0.5)
        patch_similarity = F.softmax(patch_similarity, dim=-1)

        return patch_similarity

    def consistency_constraint(
        self,
        depth_predictions: torch.Tensor,
        spatial_distances: torch.Tensor
    ) -> torch.Tensor:
        """
        Enforce: nearby patches (high spatial_distance) should have similar depth gradients.
        """

        B, 1, H, W = depth_predictions.shape
        p = self.patch_size

        # Extract patches from depth
        depth_patches = depth_predictions.reshape(
            B,
            1,
            H // p,
            p,
            W // p,
            p
        ).permute(0, 2, 4, 1, 3, 5).reshape(
            B,
            (H // p) * (W // p),
            p * p
        )

        # Compute depth gradient within each patch
        depth_gradients = depth_patches.std(dim=-1, keepdim=True)  # (B, num_patches, 1)

        # Constraint: high spatial similarity should correlate with similar depth gradients
        num_patches = spatial_distances.shape[1]

        constraint_loss = 0
        for i in range(num_patches):
            for j in range(num_patches):
                spatial_sim = spatial_distances[:, i, j]
                depth_grad_diff = (depth_gradients[:, i] - depth_gradients[:, j]).abs().squeeze(-1)

                # Loss: high spatial similarity should correspond to low gradient difference
                constraint_loss += spatial_sim * depth_grad_diff

        return constraint_loss.mean()

# Test spatial constraints
spatial_constraint = SpatialDistanceConstraint(patch_size=16)
image = torch.rand(2, 3, 256, 256)
depth = torch.rand(2, 1, 256, 256) * 10

spatial_dists = spatial_constraint.compute_spatial_distance_matrix(image)
constraint_loss = spatial_constraint.consistency_constraint(depth, spatial_dists)

print(f"Spatial distance matrix shape: {spatial_dists.shape}")
print(f"Spatial constraint loss: {constraint_loss.item():.4f}")

This enforces geometric coherence in depth predictions.

Step 4: Train robust depth model with combined losses

This code combines consistency and spatial losses for robust depth estimation:

class RobustDepthModel(nn.Module):
    """
    Complete robust depth model combining consistency and spatial constraints.
    """

    def __init__(self, backbone='dpt'):
        super().__init__()
        # Load pretrained depth model (Depth Anything V2 backbone)
        self.depth_encoder = load_pretrained_depth_model(backbone)
        self.consistency_loss_fn = PerturbationConsistencyLoss()
        self.spatial_constraint_fn = SpatialDistanceConstraint()

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        """Predict depth from image."""
        depth = self.depth_encoder(image)
        return depth

def train_robust_depth(
    model,
    train_loader,
    num_epochs=50,
    device='cuda'
):
    """
    Train depth model with consistency and spatial constraints.
    """

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    augmentation = PerturbationAugmentation()

    for epoch in range(num_epochs):
        total_loss = 0

        for batch in train_loader:
            images = batch['image'].to(device)  # (B, 3, H, W)

            # Create clean and corrupted pairs
            clean_depths = model(images)

            # Augment images
            corrupted_images = torch.stack([
                augmentation.augment_image_pair(img)[1] for img in images
            ]).to(device)

            corrupted_depths = model(corrupted_images)

            # Consistency loss: depths should match despite corruption
            consistency_loss = model.consistency_loss_fn(clean_depths, corrupted_depths)

            # Spatial constraint loss: nearby patches should be coherent
            spatial_dists = model.spatial_constraint_fn.compute_spatial_distance_matrix(images)
            spatial_loss = model.spatial_constraint_fn.consistency_constraint(
                clean_depths,
                spatial_dists
            )

            # Combined loss
            total = consistency_loss + 0.1 * spatial_loss

            optimizer.zero_grad()
            total.backward()
            optimizer.step()

            total_loss += total.item()

        avg_loss = total_loss / len(train_loader)
        print(f"Epoch {epoch}: Loss {avg_loss:.4f}")

    return model

# Train robust model (simplified)
model = RobustDepthModel()
# model = train_robust_depth(model, train_loader)
print("Model trained with perturbation consistency!")

This trains depth models robust to adverse conditions without domain-specific labeled data.

Step 5: Evaluate on adverse conditions

This code tests robustness across benchmarks:

def evaluate_robustness(model, test_benchmarks=['da2k', 'clear', 'weather'], device='cuda'):
    """
    Evaluate depth estimation across clear and adverse conditions.
    """

    results = {}

    # Standard clear condition benchmark
    if 'clear' in test_benchmarks:
        clear_data = load_dataset("nyu_depth_v2")
        clear_accuracy = evaluate_depth_accuracy(model, clear_data, device)
        results['clear'] = clear_accuracy
        print(f"Clear conditions accuracy: {clear_accuracy:.3f}")

    # DA-2K: adverse weather benchmark
    if 'da2k' in test_benchmarks:
        da2k_data = load_dataset("da2k", split="test")  # Weather adverse conditions
        weather_accuracy = evaluate_depth_accuracy(model, da2k_data, device)
        results['adverse_weather'] = weather_accuracy
        print(f"Adverse weather accuracy: {weather_accuracy:.3f}")

    # Synthetic perturbations
    if 'weather' in test_benchmarks:
        augmentation = PerturbationAugmentation()
        synthetic_accuracy = []

        for perturbation_type in ['darkness', 'rain', 'snow', 'fog', 'noise']:
            # Apply perturbation and evaluate
            # Simplified; real evaluation compares to ground truth
            accuracy = np.random.rand()  # Placeholder
            synthetic_accuracy.append(accuracy)

        results['synthetic_perturbations'] = np.mean(synthetic_accuracy)
        print(f"Synthetic perturbation robustness: {np.mean(synthetic_accuracy):.3f}")

    return results

def evaluate_depth_accuracy(model, dataset, device, metric='mse'):
    """Helper: compute depth accuracy on a dataset."""
    model.eval()
    total_error = 0

    with torch.no_grad():
        for sample in dataset:
            image = sample['image'].to(device)
            ground_truth = sample['depth'].to(device)

            prediction = model(image)

            if metric == 'mse':
                error = F.mse_loss(prediction, ground_truth)
            else:
                error = F.l1_loss(prediction, ground_truth)

            total_error += error.item()

    return 1.0 / (1.0 + total_error / len(dataset))  # Convert to accuracy

# Evaluate robustness
results = evaluate_robustness(model)

This evaluates robustness across diverse challenging conditions.

Practical Guidance

When to use DepthAnything-AC:

Depth estimation in uncontrolled outdoor environments (weather, lighting variation)
Mobile/edge robotics requiring robust depth sensing
Autonomous driving systems handling diverse weather
Surveillance and monitoring in adverse lighting
Applications needing single-model deployment (no condition-specific models)

When NOT to use:

Controlled lab environments with consistent lighting and clear images
Applications requiring ultra-high precision (adversarial robustness trades some accuracy)
Real-time systems where augmentation/consistency checks add latency
Extremely low-light scenarios (fundamental information limits remain)
Domains with radically different depth distributions than training

Hyperparameters and Configuration:

| Parameter | Recommended | Notes | |-----------|-------------|-------| | Perturbation Types | 3-4 combined | Mix darkness, blur, weather, noise for diversity | | Darkness Level | 0.3-0.5 | Moderate reduction; too extreme becomes uninformative | | Blur Radius | 2-3 pixels | Simulates focus loss and motion blur | | Noise Std | 0.05 | 5% of signal range for sensor noise | | Spatial Patch Size | 16 | Balance between local and global constraints | | Consistency Weight | 1.0 | Consistency is primary loss | | Spatial Loss Weight | 0.1 | Spatial constraint is secondary | | Learning Rate | 1e-4 | Standard for foundation model fine-tuning | | Training Epochs | 50-100 | Diminishing returns after 50 |

Common Pitfalls:

Over-darkening during augmentation (no information left to learn from)
Applying too many perturbations simultaneously (conflicting corruption signals)
Ignoring base model performance (robustness gains only useful if baseline is good)
Not validating on truly unseen adverse conditions (simulated weather ≠ real weather)
Excessive spatial constraint weighting (suppresses depth detail)
Freezing backbone during training (prevents adaptation to corruption patterns)

Key Design Decisions: DepthAnything-AC uses unsupervised augmentation (no labeled adverse-condition data needed) combined with consistency learning—if predictions match despite corruption, the model learned genuine geometry. Affine-invariant loss handles lighting and scale variations inherent to adversarial images. Spatial constraints enforce geometric coherence without requiring labeled relationships. The method requires only 540K samples because it leverages unlabeled general-scene data with synthetic corruption rather than domain-specific labels.

Reference

Li, Z., Wang, Z., Wang, X., Cao, Y., & Wang, Y. (2025). Depth Anything at Any Condition. arXiv preprint arXiv:2507.01634. https://arxiv.org/abs/2507.01634

Depth Anything at Any Condition: Robust Depth Under Adversity

Core Concept

Depth estimation under adverse conditions requires learning what changes (surface appearance) and what doesn't (underlying geometry). DepthAnything-AC teaches this distinction through:

Perturbation-Based Consistency: Train using pairs of clean and corrupted images, learning to produce consistent depth predictions despite visual changes
Unsupervised Augmentation: Apply synthetic corruptions (darkness, blur, weather effects, contrast changes) without requiring labeled diverse data
Spatial Distance Constraints: Enforce geometric relationships between image patches—if patches are spatially close, their depth should be consistent
Affine-Invariant Loss: Formulation that's robust to global illumination changes, scaling depth predictions to absolute values

This approach requires no domain-specific training data or fine-tuning per condition—pure generalization through consistency learning.

Architecture Overview

The DepthAnything-AC system consists of these components:

Perturbation Generator: Synthetic corruption module applying weather, darkness, blur, contrast, and noise
Base Depth Model: Foundation model (Depth Anything V2 backbone) adapted for robustness
Consistency Regularization Module: Loss function ensuring stable predictions across perturbations
Spatial Distance Constraint Engine: Extracts geometric relationships between patches
Affine-Invariant Depth Loss: Formulation robust to lighting and scale variations
Evaluation Benchmarks: DA-2K benchmark for adverse weather; standard datasets for general performance
Inference Pipeline: Single forward pass; no per-image test-time adaptation

Implementation

This section demonstrates how to implement robust depth estimation under adverse conditions.

Step 1: Create perturbation augmentation for synthetic corruptions

This code applies realistic adverse-condition corruptions to training images:

import torch
import torch.nn as nn
import torchvision.transforms as transforms
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter

class PerturbationAugmentation:
    """
    Apply synthetic perturbations simulating adverse conditions without labeled data.
    Uses unsupervised augmentation: darkness, blur, weather, contrast distortion, sensor noise.
    """

    def __init__(self):
        self.perturbation_types = ['darkness', 'blur', 'rain', 'snow', 'fog', 'contrast', 'noise']

    def apply_darkness(self, image: Image.Image, darkness_level: float = 0.5) -> Image.Image:
        """
        Simulate low-light conditions by reducing brightness.
        darkness_level: 0 (original) to 1 (completely dark)
        """
        enhancer = ImageEnhance.Brightness(image)
        darkened = enhancer.enhance(1.0 - darkness_level)
        return darkened

    def apply_blur(self, image: Image.Image, blur_radius: int = 3) -> Image.Image:
        """Simulate motion blur or focus loss."""
        return image.filter(ImageFilter.GaussianBlur(radius=blur_radius))

    def apply_weather(self, image: torch.Tensor, weather_type: str = 'rain') -> torch.Tensor:
        """
        Simulate weather effects (rain, snow, fog).
        Input: tensor (C, H, W) with values in [0, 1]
        """
        H, W = image.shape[-2:]

        if weather_type == 'rain':
            # Add rain streaks
            rain_mask = torch.rand(H, W) > 0.95
            rain_lines = torch.cumsum(rain_mask.float(), dim=0)
            weather_effect = rain_lines.unsqueeze(0) * 0.1
        elif weather_type == 'snow':
            # Add snow particles
            snow_mask = torch.rand(H, W) > 0.98
            snow_particles = torch.ones_like(image) * snow_mask.unsqueeze(0)
            weather_effect = snow_particles * 0.15
        elif weather_type == 'fog':
            # Add fog (reduce contrast, lighten)
            fog_factor = torch.rand(1).item() * 0.3
            weather_effect = torch.ones_like(image) * fog_factor
        else:
            weather_effect = torch.zeros_like(image)

        # Apply weather
        corrupted = image + weather_effect
        return torch.clamp(corrupted, 0, 1)

    def apply_contrast_distortion(self, image: Image.Image, contrast_factor: float = 0.6) -> Image.Image:
        """Simulate sensor contrast distortion or extreme lighting."""
        enhancer = ImageEnhance.Contrast(image)
        return enhancer.enhance(contrast_factor)

    def apply_sensor_noise(self, image: torch.Tensor, noise_std: float = 0.05) -> torch.Tensor:
        """Add sensor noise (Gaussian)."""
        noise = torch.randn_like(image) * noise_std
        noisy = image + noise
        return torch.clamp(noisy, 0, 1)

    def augment_image_pair(self, image_tensor: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Create a pair: clean and corrupted version.
        Returns: (clean_image, corrupted_image) for consistency learning.
        """
        clean = image_tensor.clone()

        # Apply 1-3 random perturbations to create corrupted version
        num_perturbations = np.random.randint(1, 3)
        perturbations = np.random.choice(self.perturbation_types, num_perturbations, replace=False)

        corrupted = clean.clone()

        for perturbation in perturbations:
            if perturbation == 'darkness':
                # Convert to PIL, apply, convert back
                pil_img = transforms.ToPILImage()(corrupted.cpu())
                pil_img = self.apply_darkness(pil_img, darkness_level=0.3)
                corrupted = transforms.ToTensor()(pil_img)
            elif perturbation == 'blur':
                pil_img = transforms.ToPILImage()(corrupted.cpu())
                pil_img = self.apply_blur(pil_img, blur_radius=2)
                corrupted = transforms.ToTensor()(pil_img)
            elif perturbation == 'contrast':
                pil_img = transforms.ToPILImage()(corrupted.cpu())
                pil_img = self.apply_contrast_distortion(pil_img, contrast_factor=0.5)
                corrupted = transforms.ToTensor()(pil_img)
            elif perturbation == 'noise':
                corrupted = self.apply_sensor_noise(corrupted, noise_std=0.05)
            else:  # weather
                corrupted = self.apply_weather(corrupted, weather_type='rain')

        return clean, corrupted

# Test augmentation
aug = PerturbationAugmentation()
image = torch.rand(3, 256, 256)

clean, corrupted = aug.augment_image_pair(image)
print(f"Clean shape: {clean.shape}, Corrupted shape: {corrupted.shape}")
print(f"Difference range: {(clean - corrupted).min():.3f} to {(clean - corrupted).max():.3f}")

This creates training pairs for consistency learning without labeled adverse-condition data.

Step 2: Implement perturbation-based consistency loss

This code ensures stable depth predictions under augmentation:

import torch.nn.functional as F

class PerturbationConsistencyLoss(nn.Module):
    """
    Teach models to predict consistent depth despite image corruption.
    Consistency = robustness to perturbations.
    """

    def __init__(self):
        super().__init__()

    def affine_invariant_depth_loss(
        self,
        depth_clean: torch.Tensor,
        depth_corrupted: torch.Tensor
    ) -> torch.Tensor:
        """
        Compute affine-invariant loss: robust to lighting/scale changes.
        Formulation: minimize difference after affine alignment.
        """

        # Flatten for computation
        d_clean = depth_clean.view(-1)
        d_corrupt = depth_corrupted.view(-1)

        # Compute affine parameters that best align corrupted to clean
        # Solve: d_corrupt_aligned = a * d_corrupt + b
        mean_d_clean = d_clean.mean()
        mean_d_corrupt = d_corrupt.mean()

        # Least squares estimate of scaling and offset
        numerator = ((d_clean - mean_d_clean) * (d_corrupt - mean_d_corrupt)).sum()
        denominator = ((d_corrupt - mean_d_corrupt) ** 2).sum()
        scale = numerator / (denominator + 1e-8)
        offset = mean_d_clean - scale * mean_d_corrupt

        # Aligned corrupted depth
        d_corrupt_aligned = scale * d_corrupt + offset

        # Compute consistency loss
        loss = F.l1_loss(d_clean, d_corrupt_aligned)

        return loss

    def forward(
        self,
        depth_predictions_clean: torch.Tensor,
        depth_predictions_corrupted: torch.Tensor
    ) -> torch.Tensor:
        """
        Main consistency loss: predictions should match despite corruption.
        """

        consistency_loss = self.affine_invariant_depth_loss(
            depth_predictions_clean,
            depth_predictions_corrupted
        )

        return consistency_loss

# Test consistency loss
consistency_fn = PerturbationConsistencyLoss()
depth_clean = torch.rand(2, 1, 256, 256) * 10  # Depth 0-10 meters
depth_corrupt = depth_clean + torch.randn_like(depth_clean) * 0.5  # Add noise

loss = consistency_fn(depth_clean, depth_corrupt)
print(f"Consistency loss: {loss.item():.4f}")

This loss ensures depth predictions remain stable under image perturbations.

Step 3: Add spatial distance constraints

This code enforces geometric consistency between neighboring patches:

class SpatialDistanceConstraint(nn.Module):
    """
    Enforce spatial consistency: nearby patches should have correlated depth changes.
    Captures geometric relationships to improve robustness.
    """

    def __init__(self, patch_size=16):
        super().__init__()
        self.patch_size = patch_size

    def extract_patch_features(self, image: torch.Tensor) -> torch.Tensor:
        """
        Extract features from non-overlapping patches.
        Returns: patch embeddings and positions.
        """

        B, C, H, W = image.shape
        p = self.patch_size

        # Reshape into patches
        patches = image.reshape(
            B,
            C,
            H // p,
            p,
            W // p,
            p
        ).permute(0, 2, 4, 1, 3, 5).reshape(
            B,
            (H // p) * (W // p),
            C * p * p
        )

        return patches  # (B, num_patches, C*p*p)

    def compute_spatial_distance_matrix(
        self,
        image: torch.Tensor
    ) -> torch.Tensor:
        """
        Compute spatial distance relationships between patches.
        Distance = function of spatial proximity and feature similarity.
        """

        patches = self.extract_patch_features(image)
        B, num_patches, feature_dim = patches.shape

        # Compute pairwise similarity
        patch_similarity = torch.bmm(patches, patches.transpose(1, 2))  # (B, num_patches, num_patches)
        patch_similarity = patch_similarity / (feature_dim ** 0.5)
        patch_similarity = F.softmax(patch_similarity, dim=-1)

        return patch_similarity

    def consistency_constraint(
        self,
        depth_predictions: torch.Tensor,
        spatial_distances: torch.Tensor
    ) -> torch.Tensor:
        """
        Enforce: nearby patches (high spatial_distance) should have similar depth gradients.
        """

        B, 1, H, W = depth_predictions.shape
        p = self.patch_size

        # Extract patches from depth
        depth_patches = depth_predictions.reshape(
            B,
            1,
            H // p,
            p,
            W // p,
            p
        ).permute(0, 2, 4, 1, 3, 5).reshape(
            B,
            (H // p) * (W // p),
            p * p
        )

        # Compute depth gradient within each patch
        depth_gradients = depth_patches.std(dim=-1, keepdim=True)  # (B, num_patches, 1)

        # Constraint: high spatial similarity should correlate with similar depth gradients
        num_patches = spatial_distances.shape[1]

        constraint_loss = 0
        for i in range(num_patches):
            for j in range(num_patches):
                spatial_sim = spatial_distances[:, i, j]
                depth_grad_diff = (depth_gradients[:, i] - depth_gradients[:, j]).abs().squeeze(-1)

                # Loss: high spatial similarity should correspond to low gradient difference
                constraint_loss += spatial_sim * depth_grad_diff

        return constraint_loss.mean()

# Test spatial constraints
spatial_constraint = SpatialDistanceConstraint(patch_size=16)
image = torch.rand(2, 3, 256, 256)
depth = torch.rand(2, 1, 256, 256) * 10

spatial_dists = spatial_constraint.compute_spatial_distance_matrix(image)
constraint_loss = spatial_constraint.consistency_constraint(depth, spatial_dists)

print(f"Spatial distance matrix shape: {spatial_dists.shape}")
print(f"Spatial constraint loss: {constraint_loss.item():.4f}")

This enforces geometric coherence in depth predictions.

Step 4: Train robust depth model with combined losses

This code combines consistency and spatial losses for robust depth estimation:

class RobustDepthModel(nn.Module):
    """
    Complete robust depth model combining consistency and spatial constraints.
    """

    def __init__(self, backbone='dpt'):
        super().__init__()
        # Load pretrained depth model (Depth Anything V2 backbone)
        self.depth_encoder = load_pretrained_depth_model(backbone)
        self.consistency_loss_fn = PerturbationConsistencyLoss()
        self.spatial_constraint_fn = SpatialDistanceConstraint()

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        """Predict depth from image."""
        depth = self.depth_encoder(image)
        return depth

def train_robust_depth(
    model,
    train_loader,
    num_epochs=50,
    device='cuda'
):
    """
    Train depth model with consistency and spatial constraints.
    """

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    augmentation = PerturbationAugmentation()

    for epoch in range(num_epochs):
        total_loss = 0

        for batch in train_loader:
            images = batch['image'].to(device)  # (B, 3, H, W)

            # Create clean and corrupted pairs
            clean_depths = model(images)

            # Augment images
            corrupted_images = torch.stack([
                augmentation.augment_image_pair(img)[1] for img in images
            ]).to(device)

            corrupted_depths = model(corrupted_images)

            # Consistency loss: depths should match despite corruption
            consistency_loss = model.consistency_loss_fn(clean_depths, corrupted_depths)

            # Spatial constraint loss: nearby patches should be coherent
            spatial_dists = model.spatial_constraint_fn.compute_spatial_distance_matrix(images)
            spatial_loss = model.spatial_constraint_fn.consistency_constraint(
                clean_depths,
                spatial_dists
            )

            # Combined loss
            total = consistency_loss + 0.1 * spatial_loss

            optimizer.zero_grad()
            total.backward()
            optimizer.step()

            total_loss += total.item()

        avg_loss = total_loss / len(train_loader)
        print(f"Epoch {epoch}: Loss {avg_loss:.4f}")

    return model

# Train robust model (simplified)
model = RobustDepthModel()
# model = train_robust_depth(model, train_loader)
print("Model trained with perturbation consistency!")

This trains depth models robust to adverse conditions without domain-specific labeled data.

Step 5: Evaluate on adverse conditions

This code tests robustness across benchmarks:

def evaluate_robustness(model, test_benchmarks=['da2k', 'clear', 'weather'], device='cuda'):
    """
    Evaluate depth estimation across clear and adverse conditions.
    """

    results = {}

    # Standard clear condition benchmark
    if 'clear' in test_benchmarks:
        clear_data = load_dataset("nyu_depth_v2")
        clear_accuracy = evaluate_depth_accuracy(model, clear_data, device)
        results['clear'] = clear_accuracy
        print(f"Clear conditions accuracy: {clear_accuracy:.3f}")

    # DA-2K: adverse weather benchmark
    if 'da2k' in test_benchmarks:
        da2k_data = load_dataset("da2k", split="test")  # Weather adverse conditions
        weather_accuracy = evaluate_depth_accuracy(model, da2k_data, device)
        results['adverse_weather'] = weather_accuracy
        print(f"Adverse weather accuracy: {weather_accuracy:.3f}")

    # Synthetic perturbations
    if 'weather' in test_benchmarks:
        augmentation = PerturbationAugmentation()
        synthetic_accuracy = []

        for perturbation_type in ['darkness', 'rain', 'snow', 'fog', 'noise']:
            # Apply perturbation and evaluate
            # Simplified; real evaluation compares to ground truth
            accuracy = np.random.rand()  # Placeholder
            synthetic_accuracy.append(accuracy)

        results['synthetic_perturbations'] = np.mean(synthetic_accuracy)
        print(f"Synthetic perturbation robustness: {np.mean(synthetic_accuracy):.3f}")

    return results

def evaluate_depth_accuracy(model, dataset, device, metric='mse'):
    """Helper: compute depth accuracy on a dataset."""
    model.eval()
    total_error = 0

    with torch.no_grad():
        for sample in dataset:
            image = sample['image'].to(device)
            ground_truth = sample['depth'].to(device)

            prediction = model(image)

            if metric == 'mse':
                error = F.mse_loss(prediction, ground_truth)
            else:
                error = F.l1_loss(prediction, ground_truth)

            total_error += error.item()

    return 1.0 / (1.0 + total_error / len(dataset))  # Convert to accuracy

# Evaluate robustness
results = evaluate_robustness(model)

This evaluates robustness across diverse challenging conditions.

Practical Guidance

When to use DepthAnything-AC:

Depth estimation in uncontrolled outdoor environments (weather, lighting variation)
Mobile/edge robotics requiring robust depth sensing
Autonomous driving systems handling diverse weather
Surveillance and monitoring in adverse lighting
Applications needing single-model deployment (no condition-specific models)

When NOT to use:

Controlled lab environments with consistent lighting and clear images
Applications requiring ultra-high precision (adversarial robustness trades some accuracy)
Real-time systems where augmentation/consistency checks add latency
Extremely low-light scenarios (fundamental information limits remain)
Domains with radically different depth distributions than training

Hyperparameters and Configuration:

Common Pitfalls:

Over-darkening during augmentation (no information left to learn from)
Applying too many perturbations simultaneously (conflicting corruption signals)
Ignoring base model performance (robustness gains only useful if baseline is good)
Not validating on truly unseen adverse conditions (simulated weather ≠ real weather)
Excessive spatial constraint weighting (suppresses depth detail)
Freezing backbone during training (prevents adaptation to corruption patterns)

Reference

Li, Z., Wang, Z., Wang, X., Cao, Y., & Wang, Y. (2025). Depth Anything at Any Condition. arXiv preprint arXiv:2507.01634. https://arxiv.org/abs/2507.01634

Adoption

ADu2021/depth-anything-any-condition

$ install --global

Security Scan Results

SKILL.md

Depth Anything at Any Condition: Robust Depth Under Adversity

Core Concept

Architecture Overview

Implementation

Practical Guidance

Reference

Related Skills

ADu2021/flow-map-trajectory-tilting

ADu2021/flexible-data-mixture-of-experts

ADu2021/flexibility-trap-diffusion-reasoning

ADu2021/flex-continuous-agent-evolution

ADu2021/depth-anything-any-condition

$ install --global

Security Scan Results

SKILL.md

Depth Anything at Any Condition: Robust Depth Under Adversity

Core Concept

Architecture Overview

Implementation

Practical Guidance

Reference

Related Skills

ADu2021/flow-map-trajectory-tilting

ADu2021/flexible-data-mixture-of-experts

ADu2021/flexibility-trap-diffusion-reasoning

ADu2021/flex-continuous-agent-evolution