skills/nlp/transformer-embedding-svr/SKILL.md
Extracts frozen embeddings from multiple pretrained transformers and trains SVR on the concatenated features.
npx skillsauth add wenmin-wu/ds-skills nlp-transformer-embedding-svr
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Instead of fine-tuning transformers end-to-end, extract mean-pooled embeddings from multiple frozen pretrained models, concatenate them into a single high-dimensional feature vector per text, and train a lightweight SVR on top. Because the transformers stay frozen, only forward passes are needed, which avoids GPU-intensive fine-tuning while still reaching competitive scores.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from cuml.svm import SVR  # GPU-accelerated SVR from RAPIDS cuML

def extract_embeddings(texts, model_name, batch_size=32):
    """Return mean-pooled embeddings from a frozen model, shape (n_texts, hidden_size)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval().cuda()
    texts = list(texts)  # the tokenizer expects a list of strings
    embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i+batch_size], padding=True,
                          truncation=True, max_length=512, return_tensors="pt").to("cuda")
        with torch.no_grad():  # frozen model: no gradients needed
            out = model(**batch).last_hidden_state
        # Mean-pool over real tokens only; the attention mask zeroes out padding
        mask = batch["attention_mask"].unsqueeze(-1).float()
        pooled = (out * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        embeddings.append(pooled.cpu().numpy())
    return np.vstack(embeddings)

# Concatenate embeddings from multiple models.
# texts (training texts), y_train, and model_names are supplied by the caller.
feats_train = np.hstack([extract_embeddings(texts, m) for m in model_names])
svr = SVR(C=1)
svr.fit(feats_train, y_train)
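The masked mean pooling and concatenation steps can be checked in isolation, without a GPU or any model download. The following is a minimal sketch using NumPy only: the helper name `masked_mean_pool`, the random arrays standing in for `last_hidden_state`, and the hidden sizes 3 and 5 are all assumptions made for illustration, not part of the skill itself.

```python
import numpy as np

def masked_mean_pool(hidden, mask):
    """Average token vectors over positions where mask == 1.

    hidden: (batch, seq_len, dim) token embeddings
    mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = mask[..., None].astype(hidden.dtype)      # (batch, seq_len, 1)
    summed = (hidden * mask).sum(axis=1)             # padding contributes zero
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # avoid division by zero
    return summed / counts

# Toy stand-ins for two models' token outputs (hypothetical sizes).
rng = np.random.default_rng(0)
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
hidden_a = rng.normal(size=(2, 4, 3))   # "model A": hidden size 3
hidden_b = rng.normal(size=(2, 4, 5))   # "model B": hidden size 5

pooled_a = masked_mean_pool(hidden_a, mask)
pooled_b = masked_mean_pool(hidden_b, mask)
feats = np.hstack([pooled_a, pooled_b])  # one row per text, 3 + 5 = 8 columns

# Values at padded positions must not change the pooled result.
noisy_a = hidden_a.copy()
noisy_a[mask == 0] = 1e6
assert np.allclose(masked_mean_pool(noisy_a, mask), pooled_a)
print(feats.shape)  # (2, 8)
```

The concatenated width is the sum of the individual hidden sizes, so with real models the feature dimension grows with every model added to the list.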