skills/nlp/weighted-layer-pooling/SKILL.md
Learns a weighted combination of CLS embeddings across all transformer layers instead of using only the last layer.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add wenmin-wu/ds-skills nlp-weighted-layer-pooling
Different transformer layers capture different linguistic features: lower layers tend to encode syntax, higher layers semantics. Instead of using only the final layer, weighted layer pooling learns one scalar weight per layer, computes a weighted mean of the layers' hidden states, and takes the CLS vector from the result. This often helps on regression tasks where multiple levels of language understanding matter.
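The weighted mean described above can be sketched as a plain function on toy tensors. This is a minimal illustration, not the skill's own module: the name `weighted_layer_pool` and the random `fake_hidden_states` are assumptions made for the example, standing in for a real model's per-layer outputs.

```python
import torch


def weighted_layer_pool(all_hidden_states, layer_weights, layer_start=4):
    """Weighted mean over layers [layer_start:], returning the position-0 (CLS) vector."""
    layers = torch.stack(tuple(all_hidden_states[layer_start:]))  # (L, B, S, H)
    w = layer_weights.view(-1, 1, 1, 1)                           # broadcast over B, S, H
    pooled = (w * layers).sum(dim=0) / layer_weights.sum()        # (B, S, H)
    return pooled[:, 0]                                           # (B, H)


torch.manual_seed(0)
num_hidden_layers, batch, seq_len, hidden = 6, 2, 5, 8

# Hypothetical stand-in for model output: embeddings + one tensor per layer.
fake_hidden_states = [
    torch.randn(batch, seq_len, hidden) for _ in range(num_hidden_layers + 1)
]

layer_start = 4
num_pooled = num_hidden_layers - layer_start + 1  # layers 4, 5, 6
uniform = torch.ones(num_pooled)

pooled = weighted_layer_pool(fake_hidden_states, uniform, layer_start)

# With uniform weights, the result should match a plain mean over the same layers.
plain_mean = torch.stack(fake_hidden_states[layer_start:]).mean(dim=0)[:, 0]
matches_plain_mean = torch.allclose(pooled, plain_mean, atol=1e-6)
print(pooled.shape, matches_plain_mean)
```

With all weights equal, the pooling reduces to an ordinary average of the selected layers. Training moves the weights away from uniform so that more useful layers contribute more.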
import torch
import torch.nn as nn


class WeightedLayerPooling(nn.Module):
    def __init__(self, num_hidden_layers, layer_start=4):
        super().__init__()
        self.layer_start = layer_start
        # Assumes all_hidden_states has num_hidden_layers + 1 entries (embeddings + each layer)
        self.num_layers = num_hidden_layers - layer_start + 1
        # One learnable scalar per pooled layer, initialized to a uniform average
        self.layer_weights = nn.Parameter(torch.ones(self.num_layers))

    def forward(self, all_hidden_states):
        layers = torch.stack(all_hidden_states[self.layer_start:])  # (L, B, S, H)
        # Reshape weights to (L, 1, 1, 1), then expand to match the stacked layers
        weights = self.layer_weights.unsqueeze(-1).unsqueeze(-1).unsqueeze(-1)
        weights = weights.expand(layers.size())
        # Weighted mean over the layer dimension
        weighted = (weights * layers).sum(dim=0) / self.layer_weights.sum()
        return weighted[:, 0]  # CLS token
Requires output_hidden_states=True in the model config, so that the model returns the hidden states of every layer.
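As a rough sketch of where the per-layer hidden states come from, the example below builds a tiny, randomly initialized BERT from a config using the Hugging Face `transformers` library, so it runs without downloading weights. The config sizes, the random `input_ids`, and the inline pooling are illustrative assumptions. A real setup would load a pretrained checkpoint and train the layer weights.

```python
import torch
from transformers import BertConfig, BertModel

# Hypothetical tiny config so the example runs offline with random weights.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=6,
    num_attention_heads=4,
    intermediate_size=64,
    output_hidden_states=True,  # needed to receive every layer's output
)
model = BertModel(config).eval()

input_ids = torch.randint(0, config.vocab_size, (2, 10))  # (B, S)
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)

# Tuple of tensors: the embedding output plus one entry per encoder layer.
hidden_states = outputs.hidden_states
print(len(hidden_states))

# Inline version of the weighted pooling, with uniform (untrained) weights.
layer_start = 4
layers = torch.stack(hidden_states[layer_start:])  # (L, B, S, H)
layer_weights = torch.ones(layers.size(0))
weighted = (layer_weights.view(-1, 1, 1, 1) * layers).sum(dim=0) / layer_weights.sum()
cls_vec = weighted[:, 0]  # (B, H)
print(cls_vec.shape)
```

The tuple holds one more entry than the number of encoder layers because it includes the embedding output, which is why the pooled layer count is num_hidden_layers - layer_start + 1.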