skills/tabular/catboost-multirmse/SKILL.md
CatBoostRegressor with MultiRMSE loss for native multi-output regression, predicting all targets in a single model without per-target loops.
npx skillsauth add wenmin-wu/ds-skills tabular-catboost-multirmse
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Gradient boosting libraries such as XGBoost and LightGBM are commonly used with one model per output column for multi-output regression, which is slow when there are 100+ targets. CatBoost's MultiRMSE loss handles multiple outputs natively in a single model, sharing tree structure across targets. This can be 5-50x faster than per-target loops and can capture cross-target correlations. Combined with SVD target compression, it enables efficient prediction of thousands of outputs.
from catboost import CatBoostRegressor

# X_train, X_val, X_test: (n_samples, n_features)
# Y_train, Y_val: (n_samples, n_targets), one column per target
params = {
    'learning_rate': 0.1,
    'depth': 7,
    'l2_leaf_reg': 4,
    'loss_function': 'MultiRMSE',   # one model for all target columns
    'eval_metric': 'MultiRMSE',
    'iterations': 200,
    'boosting_type': 'Plain',
    'bootstrap_type': 'Bayesian',
    'allow_const_label': True,      # tolerate target columns with a single value
    'random_state': 42,
}

model = CatBoostRegressor(**params)
model.fit(X_train, Y_train, eval_set=(X_val, Y_val), verbose=50)
preds = model.predict(X_test)  # shape: (n_test, n_targets)
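To make the MultiRMSE value reported during training concrete, here is a minimal NumPy sketch of the metric as the CatBoost documentation appears to define it: squared errors are summed over all targets for each row, averaged over rows (optionally weighted), and then square-rooted. The function name `multi_rmse` and the toy arrays are illustrative and not part of CatBoost; check the definition against your installed version.

```python
import numpy as np

def multi_rmse(y_true, y_pred, sample_weight=None):
    """Hypothetical helper: MultiRMSE over (n_samples, n_targets) arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if sample_weight is None:
        sample_weight = np.ones(y_true.shape[0])
    w = np.asarray(sample_weight, dtype=float)
    # Sum squared errors across targets for each row.
    sq_err_per_row = ((y_pred - y_true) ** 2).sum(axis=1)
    # Weighted mean over rows, then square root.
    return float(np.sqrt((w * sq_err_per_row).sum() / w.sum()))

# Toy data (assumed for illustration): 2 rows, 2 targets.
y_true = np.array([[0.0, 0.0], [1.0, 1.0]])
y_pred = np.array([[3.0, 4.0], [1.0, 1.0]])
score = multi_rmse(y_true, y_pred)
```

Because errors are summed across targets rather than averaged, the value grows with the number of target columns, so scores are only comparable between runs with the same targets.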
Set loss_function='MultiRMSE' to train all targets in a single model.
Use task_type='GPU' for large datasets; MultiRMSE supports GPU training.
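The SVD target compression mentioned in the overview can be sketched as follows. This is one possible approach, assuming a plain truncated SVD of the centered target matrix; the helper names (`fit_target_svd`, `compress_targets`, `expand_targets`), the synthetic data, and the choice of 5 components are all hypothetical. The model would then be trained on the compressed targets and its predictions expanded back to the full target space.

```python
import numpy as np

def fit_target_svd(Y, n_components):
    """Return the column means and the top right-singular vectors of centered Y."""
    mean = Y.mean(axis=0)
    # Rows of Vt are orthonormal directions in target space.
    _, _, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    return mean, Vt[:n_components]

def compress_targets(Y, mean, components):
    """Project (n_samples, n_targets) onto k components -> (n_samples, k)."""
    return (Y - mean) @ components.T

def expand_targets(Z, mean, components):
    """Map (n_samples, k) back to the full (n_samples, n_targets) space."""
    return Z @ components + mean

# Synthetic targets (assumed for illustration): 300 columns driven by 5 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 300))
Y_train = latent @ mixing

mean, comps = fit_target_svd(Y_train, n_components=5)
Z_train = compress_targets(Y_train, mean, comps)   # train on this instead of Y_train
Y_back = expand_targets(Z_train, mean, comps)
max_err = np.abs(Y_back - Y_train).max()

# With a fitted MultiRMSE model, the same helpers would wrap fit and predict:
#   model.fit(X_train, Z_train)
#   preds = expand_targets(model.predict(X_test), mean, comps)
```

On real targets the reconstruction is lossy, so the number of components trades training speed against how much target variance is kept.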