skills/llm/last-token-pooling-embedding/SKILL.md
Extracts dense sentence embeddings from decoder-only LLMs by pooling the last non-padding token's hidden state.
npx skillsauth add wenmin-wu/ds-skills llm-last-token-pooling-embedding
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Decoder-only LLMs (Qwen, LLaMA, Mistral) can produce high-quality dense embeddings for retrieval. Encoder models typically pool the [CLS] token or average all token states. In a causal LM each position attends only to earlier tokens, so the final token is the only one whose hidden state has seen the whole input. Take the hidden state of the last non-padding token as the sentence embedding, then L2-normalize it so that a dot product between embeddings equals their cosine similarity.
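To illustrate why the final token is the natural pooling point, here is a minimal sketch of causal masking on random data. The `causal_self_attention` helper is a hypothetical, weight-free toy (no learned projections, single head) and not part of any real model. It only shows that changing the last input token leaves every earlier position's output untouched, which means earlier positions never see the full sequence.

```python
import torch

torch.manual_seed(0)
seq_len, dim = 5, 8
x = torch.randn(1, seq_len, dim)  # toy "token states", random by assumption


def causal_self_attention(x):
    # Hypothetical toy attention: queries, keys, and values are all x itself.
    scores = x @ x.transpose(-1, -2) / dim ** 0.5
    # Mask out future positions (strict upper triangle) as a causal LM does.
    future = torch.ones(x.size(1), x.size(1)).triu(diagonal=1).bool()
    scores = scores.masked_fill(future, float("-inf"))
    return torch.softmax(scores, dim=-1) @ x


out = causal_self_attention(x)

# Perturb only the last token and recompute.
x_perturbed = x.clone()
x_perturbed[0, -1] += 1.0
out_perturbed = causal_self_attention(x_perturbed)

earlier_unchanged = torch.allclose(out[:, :-1], out_perturbed[:, :-1])
last_changed = not torch.allclose(out[:, -1], out_perturbed[:, -1])
print(earlier_unchanged, last_changed)
```

Only the last position reacts to the change, so it is the one state that can summarize the entire input.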
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def last_token_pool(hidden_states, attention_mask):
    # With left padding, every sequence ends at the final position.
    left_padding = attention_mask[:, -1].sum() == attention_mask.shape[0]
    if left_padding:
        return hidden_states[:, -1]
    # With right padding, index each row at its last non-padding token.
    seq_lengths = attention_mask.sum(dim=1) - 1
    batch_idx = torch.arange(hidden_states.size(0), device=hidden_states.device)
    return hidden_states[batch_idx, seq_lengths]

model = AutoModel.from_pretrained("Qwen/Qwen2.5-14B", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B")

# Placeholder inputs; replace with the sentences to embed.
texts = ["first sentence to embed", "second sentence to embed"]
inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
inputs = inputs.to(model.device)  # keep inputs on the same device as the model

with torch.no_grad():
    outputs = model(**inputs)

embeddings = last_token_pool(outputs.last_hidden_state, inputs["attention_mask"])
embeddings = F.normalize(embeddings, p=2, dim=1)  # unit length, so dot product = cosine
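The pooling logic can be checked without downloading a model. The following sketch runs `last_token_pool` on synthetic tensors. The toy `hidden` values and both masks are made up for illustration, chosen so that the selected rows are easy to read off. It then confirms that after L2 normalization a plain matrix product yields cosine similarities.

```python
import torch
import torch.nn.functional as F


def last_token_pool(hidden_states, attention_mask):
    # Same logic as in the skill: detect left padding, else index per row.
    left_padding = attention_mask[:, -1].sum() == attention_mask.shape[0]
    if left_padding:
        return hidden_states[:, -1]
    seq_lengths = attention_mask.sum(dim=1) - 1
    batch_idx = torch.arange(hidden_states.size(0), device=hidden_states.device)
    return hidden_states[batch_idx, seq_lengths]


# Synthetic hidden states: batch of 2, 4 positions, 3 features (values 0..23).
hidden = torch.arange(2 * 4 * 3, dtype=torch.float32).reshape(2, 4, 3)

# Second sequence has 2 real tokens; the padding side differs between masks.
right_mask = torch.tensor([[1, 1, 1, 1], [1, 1, 0, 0]])
left_mask = torch.tensor([[1, 1, 1, 1], [0, 0, 1, 1]])

pooled_right = last_token_pool(hidden, right_mask)  # picks hidden[0, 3] and hidden[1, 1]
pooled_left = last_token_pool(hidden, left_mask)    # picks hidden[0, 3] and hidden[1, 3]

# After normalization, the dot-product matrix holds cosine similarities.
emb = F.normalize(pooled_right, p=2, dim=1)
cosine = emb @ emb.T
print(pooled_right, pooled_left, cosine, sep="\n")
```

Handling both padding sides in one function matters because tokenizers differ in their default `padding_side`, and indexing position `-1` under right padding would return a padding token's state.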