skills/tabular/optuna-lgbm-tuning/SKILL.md
Uses Optuna with the TPE sampler for Bayesian hyperparameter optimization of LightGBM, searching key parameters such as num_leaves, max_depth, and learning_rate.
npx skillsauth add wenmin-wu/ds-skills tabular-optuna-lgbm-tuning
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
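Optuna's Tree-structured Parzen Estimator (TPE) sampler searches the LightGBM hyperparameter space by building a probabilistic model of which regions produce good scores and drawing new trials from those regions. It often finds better parameters than grid or random search in fewer trials, although the gain depends on the search space and the trial budget. The workflow is to define an objective function, let Optuna propose parameters, and evaluate each proposal with cross-validation.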
import lightgbm as lgb
import optuna
from sklearn.model_selection import cross_val_score

# X_train and y_train are assumed to be defined already (binary classification target).

def objective(trial):
    # Search space: Optuna proposes one value per hyperparameter for each trial.
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 2000),
        'num_leaves': trial.suggest_int('num_leaves', 16, 256),
        'max_depth': trial.suggest_int('max_depth', 3, 12),
        'learning_rate': trial.suggest_float('learning_rate', 1e-3, 0.3, log=True),
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
        'subsample': trial.suggest_float('subsample', 0.5, 1.0),
        # LightGBM ignores subsample unless bagging is enabled with subsample_freq > 0.
        'subsample_freq': 1,
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1.0),
        'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 10.0, log=True),
        'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 10.0, log=True),
    }
    model = lgb.LGBMClassifier(**params, random_state=42, n_jobs=-1)
    # Mean 5-fold ROC AUC is the value Optuna maximizes.
    score = cross_val_score(model, X_train, y_train, cv=5, scoring='roc_auc').mean()
    return score

study = optuna.create_study(direction='maximize', sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=100, show_progress_bar=True)

# best_params contains only the suggested values, so the fixed settings are passed again.
best_model = lgb.LGBMClassifier(**study.best_params, subsample_freq=1, random_state=42, n_jobs=-1)
best_model.fit(X_train, y_train)
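To make the "probabilistic model of which regions produce good scores" idea more concrete, here is a deliberately simplified one-dimensional TPE-style loop in plain Python. It is a sketch of the general idea, not Optuna's implementation: the function names (`gaussian_kde`, `toy_tpe_minimize`), the fixed bandwidth, and the default values are all assumptions made for illustration, and it minimizes a loss rather than maximizing a score. Observed trials are split into a "good" and a "bad" group, a density is fitted to each, and the next trial is the candidate where the good density most exceeds the bad one.

```python
import math
import random


def gaussian_kde(points, bandwidth):
    """Return a density function: an equal-weight mixture of Gaussians centred on `points`."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density


def toy_tpe_minimize(loss, low, high, n_trials=60, n_startup=10,
                     gamma=0.25, n_candidates=24, seed=0):
    """Minimize a 1-D `loss` on [low, high]; returns the list of (x, loss) trials."""
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # fixed bandwidth: a simplification
    history = []
    for t in range(n_trials):
        if t < n_startup:
            # Startup phase: plain random search to collect initial observations.
            x = rng.uniform(low, high)
        else:
            # Split past trials into the best `gamma` fraction and the rest.
            ranked = sorted(history, key=lambda pair: pair[1])
            n_good = max(1, math.ceil(gamma * len(ranked)))
            good = [x_i for x_i, _ in ranked[:n_good]]
            bad = [x_i for x_i, _ in ranked[n_good:]]
            l_density = gaussian_kde(good, bandwidth)
            g_density = gaussian_kde(bad, bandwidth)
            # Draw candidates from the "good" mixture, clipped to the bounds.
            candidates = [
                min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                for _ in range(n_candidates)
            ]
            # Pick the candidate with the largest good/bad density ratio.
            x = max(candidates, key=lambda c: l_density(c) / (g_density(c) + 1e-12))
        history.append((x, loss(x)))
    return history


history = toy_tpe_minimize(lambda x: (x - 2.0) ** 2, low=-10.0, high=10.0)
best_x, best_loss = min(history, key=lambda pair: pair[1])
print(f"best x = {best_x:.3f}, loss = {best_loss:.4f}")
```

Optuna's sampler handles many parameters at once, mixes in a prior, and adapts its bandwidths, so this toy version should be read only as an intuition aid.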
- trial.suggest_* for each hyperparameter
- optuna.visualization to inspect param importance
- optuna.integration.LightGBMPruningCallback to early-stop bad trials