skills/tabular/lgbm-dart-boosting/SKILL.md
Use LightGBM DART boosting (dropout on trees) with aggressive feature and bagging fractions to reduce overfitting on high-dimensional tabular data
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add wenmin-wu/ds-skills tabular-lgbm-dart-boosting
DART (Dropouts meet Multiple Additive Regression Trees) randomly drops a subset of the already-built trees at each boosting iteration, so later trees cannot over-specialize in correcting the residuals of a few early ones. Combined with aggressive feature subsampling (feature_fraction of 0.20) and row bagging (bagging_fraction of 0.50), it tends to generalize well on high-dimensional tabular data where standard GBDT overfits. The cost is slower training in exchange for better holdout scores.
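To make the dropout step concrete, here is a minimal pure-Python sketch of the DART update on 1-D regression data with decision stumps. It is not LightGBM's implementation: the helper names (`fit_stump`, `dart_fit`), the toy data, and the absence of a learning rate are all assumptions made for illustration. It follows the normalization from the DART paper: when k trees are dropped, the new tree is scaled by 1/(k+1) and the dropped trees by k/(k+1).

```python
import random

def fit_stump(xs, residuals):
    """Fit a one-split regression stump (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[1:]:  # every threshold leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x < t]
        right = [r for x, r in zip(xs, residuals) if x >= t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def stump_predict(stump, x):
    t, lv, rv = stump
    return lv if x < t else rv

def ensemble_predict(trees, x, skip=()):
    """Weighted sum of stump outputs, ignoring the tree indices in `skip`."""
    return sum(w * stump_predict(s, x) for i, (w, s) in enumerate(trees) if i not in skip)

def dart_fit(xs, ys, n_rounds=30, drop_rate=0.2, seed=0):
    rng = random.Random(seed)
    trees = []  # each entry is [weight, stump]
    for _ in range(n_rounds):
        # 1. Randomly drop some of the trees built so far.
        dropped = {i for i in range(len(trees)) if rng.random() < drop_rate}
        # 2. Fit the new tree to the residuals of the remaining ensemble.
        residuals = [y - ensemble_predict(trees, x, skip=dropped) for x, y in zip(xs, ys)]
        stump = fit_stump(xs, residuals)
        # 3. Normalize: dropped trees shrink by k/(k+1), the new tree enters at 1/(k+1).
        k = len(dropped)
        for i in dropped:
            trees[i][0] *= k / (k + 1)
        trees.append([1 / (k + 1), stump])
    return trees

# Toy data (hypothetical): y = x^2 on 20 evenly spaced points in [0, 1).
xs = [i / 20 for i in range(20)]
ys = [x * x for x in xs]
trees = dart_fit(xs, ys)

mean_y = sum(ys) / len(ys)
baseline_mse = sum((y - mean_y) ** 2 for y in ys) / len(ys)
dart_mse = sum((y - ensemble_predict(trees, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(f"baseline MSE={baseline_mse:.4f}  DART MSE={dart_mse:.4f}")
```

Because dropped trees are rescaled after every round, the weight of an early tree keeps changing throughout training, which is why a DART model cannot simply be truncated to its first N trees.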
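import lightgbm as lgb

# lgb_train and lgb_valid are assumed to be lgb.Dataset objects built beforehand.
params = {
    'objective': 'binary',
    'metric': 'binary_logloss',
    'boosting': 'dart',         # dropout on trees instead of plain GBDT
    'num_leaves': 100,
    'learning_rate': 0.01,
    'feature_fraction': 0.20,   # each tree sees 20% of the features
    'bagging_freq': 10,         # re-sample rows every 10 iterations
    'bagging_fraction': 0.50,   # each bag uses 50% of the rows
    'lambda_l2': 2,
    'min_data_in_leaf': 40,
    'n_jobs': -1,
}
model = lgb.train(
    params=params,
    train_set=lgb_train,
    num_boost_round=10500,
    valid_sets=[lgb_train, lgb_valid],
    # LightGBM disables early stopping in DART mode (the callback only logs a
    # warning), so training always runs for the full num_boost_round.
    callbacks=[lgb.log_evaluation(100)],
)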