skills/nlp/transformer-lgbm-stacking/SKILL.md
Two-stage stacking: transformer predictions plus hand-crafted features feed into a LightGBM meta-learner.
npx skillsauth add wenmin-wu/ds-skills nlp-transformer-lgbm-stacking
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
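Fine-tune a transformer (e.g., DeBERTa) and generate out-of-fold (OOF) predictions, then use those predictions alongside hand-crafted features (text statistics, overlap metrics) as inputs to a LightGBM meta-learner. Using OOF predictions keeps the meta-learner from training on scores the transformer produced for rows it had already seen. The stacked model combines the transformer's semantic signal with surface-level patterns that the transformer may miss.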
import lightgbm as lgb

# Assumes `train` has a `fold` column, `params` is a LightGBM parameter dict,
# and `target` is the name of the label column.

# Stage 1: Get transformer OOF predictions per fold
for fold in range(n_folds):
    model = train_transformer(train[train.fold != fold])
    train.loc[train.fold == fold, "pred"] = predict(model, train[train.fold == fold])

# Stage 2: Stack with hand-crafted features
features = ["pred", "word_count", "overlap_ratio", "ner_overlap", "misspelling_count"]
boosters = []
for fold in range(n_folds):
    trn, val = train[train.fold != fold], train[train.fold == fold]
    dtrain = lgb.Dataset(trn[features], label=trn[target])
    dval = lgb.Dataset(val[features], label=val[target])
    # Stop when the validation metric has not improved for 30 rounds
    booster = lgb.train(params, dtrain, valid_sets=[dval],
                        callbacks=[lgb.early_stopping(30)])
    boosters.append(booster)
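The hand-crafted columns in the feature list are not defined in the skill itself. As a minimal sketch, two of them (`word_count` and `overlap_ratio`) could be computed as below. The tokenizer, the helper names, and the definition of overlap (share of distinct words in the text that also occur in a reference text) are assumptions for illustration, not the author's implementation.

```python
import re

# Simple word tokenizer (an assumption; any tokenizer could be substituted).
_TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return _TOKEN_RE.findall(text.lower())


def word_count(text: str) -> int:
    """Number of word tokens in the text."""
    return len(tokenize(text))


def overlap_ratio(text: str, reference: str) -> float:
    """Fraction of distinct words in `text` that also appear in `reference`."""
    text_words = set(tokenize(text))
    if not text_words:
        return 0.0  # avoid division by zero on empty input
    return len(text_words & set(tokenize(reference))) / len(text_words)


def text_features(text: str, reference: str) -> dict:
    """Hypothetical helper bundling the surface-level features for one row."""
    return {
        "word_count": word_count(text),
        "overlap_ratio": overlap_ratio(text, reference),
    }
```

With a pandas frame, these could be applied row-wise, for example `train["word_count"] = train["text"].map(word_count)`, where the column name `text` is hypothetical.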