skills/nlp/mean-pooling-with-mask/SKILL.md
Computes the attention-mask-weighted mean of token embeddings, excluding padding tokens from the average.
npx skillsauth add wenmin-wu/ds-skills nlp-mean-pooling-with-mask
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Instead of using the CLS token or a naive mean over all positions, mask-aware mean pooling averages only the real token embeddings by weighting each position with the attention mask. This prevents padding tokens from diluting the representation, which matters most when variable-length inputs are padded to a common length within a batch.
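Written out for a single sequence, the pooled vector can be expressed as below. The symbols are assumptions introduced here for illustration: $h_s$ is the embedding at position $s$, $m_s \in \{0, 1\}$ is its attention-mask value, $S$ is the padded sequence length, and $\epsilon$ is a small constant that guards against division by zero when every position is masked.

```latex
\bar{h} \;=\; \frac{\sum_{s=1}^{S} m_s \, h_s}{\max\!\left(\sum_{s=1}^{S} m_s,\; \epsilon\right)}
```

With $m_s = 1$ everywhere this reduces to the ordinary mean over all $S$ positions, so the two only differ when padding is present.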
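import torch
import torch.nn as nn

class MeanPooling(nn.Module):
    def forward(self, last_hidden_state, attention_mask):
        # Broadcast the (B, S) mask to (B, S, H) so it lines up with the embeddings.
        mask = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
        # Sum only the embeddings of real (unmasked) tokens along the sequence axis.
        sum_embeddings = torch.sum(last_hidden_state * mask, dim=1)
        # Count real tokens per sequence; the clamp avoids division by zero.
        sum_mask = torch.clamp(mask.sum(dim=1), min=1e-9)
        return sum_embeddings / sum_mask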
last_hidden_state from transformer (shape: B x S x H, i.e. batch size x sequence length x hidden size)
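As a quick sanity check, the sketch below runs the `MeanPooling` module on a hand-built toy batch (B=2, S=3, H=2) and compares it against a naive mean over all positions. The tensor values and the variable names `pooled` and `naive` are made up for illustration; in practice `last_hidden_state` and `attention_mask` would come from a transformer encoder and its tokenizer.

```python
import torch
import torch.nn as nn


class MeanPooling(nn.Module):
    def forward(self, last_hidden_state, attention_mask):
        mask = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
        sum_embeddings = torch.sum(last_hidden_state * mask, dim=1)
        sum_mask = torch.clamp(mask.sum(dim=1), min=1e-9)
        return sum_embeddings / sum_mask


# Toy batch with illustrative values: B=2 sequences, S=3 positions, H=2 features.
last_hidden_state = torch.tensor([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # third position is padding
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],      # no padding
])
attention_mask = torch.tensor([
    [1, 1, 0],
    [1, 1, 1],
])

pooled = MeanPooling()(last_hidden_state, attention_mask)  # shape (B, H)
naive = last_hidden_state.mean(dim=1)                      # includes padding

print(pooled)  # first row averages only the two real tokens
print(naive)   # first row is skewed by the padding embedding
```

For the first sequence the masked mean is [2, 3], while the naive mean is pulled toward the padding vector; for the unpadded second sequence the two agree. Because of the clamp, a sequence whose mask is all zeros yields a zero vector rather than NaN.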