skills/nlp/spatial-dropout-embedding/SKILL.md
Drop entire embedding channels consistently across all timesteps — preserves temporal structure better than element-wise dropout
npx skillsauth add wenmin-wu/ds-skills nlp-spatial-dropout-embedding
Standard dropout on word embeddings zeros individual elements independently, so a given embedding dimension can be dropped at one timestep and kept at the next, which fragments the signal carried by that dimension. Spatial dropout instead drops entire channels (feature dimensions) consistently across all timesteps of a sequence, which forces the model not to rely on any single feature. This makes it more effective than element-wise dropout in sequential models (LSTMs, 1D CNNs), where temporal coherence matters.
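A minimal, framework-agnostic sketch of the two masking rules, written in NumPy rather than PyTorch. It assumes inverted-dropout scaling by 1/(1 - p), which is what PyTorch's dropout layers apply during training; the function names are hypothetical. The only difference between the two is the mask shape: one Bernoulli draw per element versus one draw per (sample, dimension) that is broadcast over the time axis.

```python
import numpy as np


def elementwise_dropout(x, p, rng):
    """Standard (inverted) dropout: an independent keep/drop draw for every element."""
    mask = rng.random(x.shape) >= p  # shape (B, T, D), keep with probability 1 - p
    return x * mask / (1.0 - p)


def spatial_dropout(x, p, rng):
    """Spatial dropout: one keep/drop draw per (sample, dim), shared by all timesteps."""
    batch, _, dim = x.shape
    mask = rng.random((batch, 1, dim)) >= p  # broadcasts over the T axis
    return x * mask / (1.0 - p)


rng = np.random.default_rng(0)
x = np.ones((2, 5, 8))  # (batch, seq_len, embed_dim)

z = elementwise_dropout(x, p=0.5, rng=rng)  # zeros scattered independently over timesteps
y = spatial_dropout(x, p=0.5, rng=rng)

# With spatial dropout, every embedding dimension is zero at all timesteps or kept at all
consistent = (y == 0).all(axis=1) | (y != 0).all(axis=1)
print(consistent.all())  # True
```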
```python
import torch
import torch.nn as nn


class SpatialDropout(nn.Dropout2d):
    """Drop entire embedding dimensions across all timesteps."""

    def forward(self, x):
        # x: (batch, seq_len, embed_dim)
        x = x.unsqueeze(2)         # (B, T, 1, D)
        x = x.permute(0, 3, 2, 1)  # (B, D, 1, T): embedding dims become Dropout2d channels
        x = super().forward(x)     # zeroes whole D channels, same mask at every timestep
        x = x.permute(0, 3, 2, 1)  # (B, T, 1, D)
        x = x.squeeze(2)           # (B, T, D)
        return x


# Usage in model
class TextModel(nn.Module):
    def __init__(self, embedding_matrix, p_drop=0.2):
        super().__init__()
        # from_pretrained freezes the embedding weights by default (freeze=True)
        self.embedding = nn.Embedding.from_pretrained(
            torch.tensor(embedding_matrix, dtype=torch.float32))
        self.spatial_dropout = SpatialDropout(p_drop)
        # Take the LSTM input size from the embedding instead of hard-coding 300
        self.lstm = nn.LSTM(self.embedding.embedding_dim, 128,
                            bidirectional=True, batch_first=True)

    def forward(self, x):
        # x: (batch, seq_len) token ids
        x = self.embedding(x)        # (B, T, D)
        x = self.spatial_dropout(x)  # (B, T, D), same dims dropped at every step
        x, _ = self.lstm(x)          # (B, T, 2 * 128)
        return x
```
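The same behaviour can also be obtained with `nn.Dropout1d`, which zeroes whole channels of an (N, C, L) input. This is a swapped-in alternative to the `Dropout2d` reshaping trick, not the skill's own implementation, and it assumes PyTorch 1.12 or later (the release that added `nn.Dropout1d`); `SpatialDropout1d` is a hypothetical name. The check confirms that each embedding dimension is either dropped or kept for the whole sequence, and that the layer passes inputs through unchanged in eval mode.

```python
import torch
import torch.nn as nn


class SpatialDropout1d(nn.Module):
    """Hypothetical alternative: channel dropout via nn.Dropout1d (PyTorch >= 1.12)."""

    def __init__(self, p=0.2):
        super().__init__()
        self.drop = nn.Dropout1d(p)

    def forward(self, x):
        # (batch, seq_len, embed_dim) -> (batch, embed_dim, seq_len): channels = embed dims
        x = self.drop(x.permute(0, 2, 1))
        return x.permute(0, 2, 1)  # back to (batch, seq_len, embed_dim)


torch.manual_seed(0)
layer = SpatialDropout1d(p=0.5)
x = torch.ones(4, 10, 16)

layer.train()
y = layer(x)
# Each (sample, dim) pair is either zeroed at every timestep or kept at every timestep
zero_all = (y == 0).all(dim=1)
kept_all = (y != 0).all(dim=1)
print(bool((zero_all | kept_all).all()))  # True

layer.eval()
print(torch.equal(layer(x), x))  # True: dropout is a no-op in eval mode
```

Swapping the axes once and using a 1D channel-dropout layer avoids the extra `unsqueeze`/`squeeze` needed to fit the 4D input that `Dropout2d` expects.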