skills/tabular/xgb-gpu-batch-iterator/SKILL.md
Use XGBoost DeviceQuantileDMatrix with a custom batch iterator to train on large datasets without exhausting GPU memory
npx skillsauth add wenmin-wu/ds-skills tabular-xgb-gpu-batch-iterator
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When training XGBoost on a GPU with data too large to fit in GPU memory, use DeviceQuantileDMatrix with a custom DataIter that streams batches. Each batch is loaded, quantized, and then freed, so only the quantile sketch and the compact quantized data stay in memory, never the full raw dataset. Peak GPU memory for raw input shrinks roughly in proportion to the number of batches, with minimal speed penalty.
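To make the memory argument concrete, here is a back-of-the-envelope sketch comparing the raw float32 buffer for a whole dataset with the buffer for a single batch. The dataset shape is hypothetical and `raw_buffer_bytes` is an illustrative helper, not part of XGBoost. The estimate covers only the dense raw feature values; it ignores the quantized matrix, gradients, and other training buffers.

```python
def raw_buffer_bytes(n_rows: int, n_features: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for a dense block of feature values (float32 by default)."""
    return n_rows * n_features * bytes_per_value


# Hypothetical dataset shape, chosen only for illustration.
n_rows, n_features = 50_000_000, 100
batch_size = 256 * 1024  # same default as the iterator in this skill

full = raw_buffer_bytes(n_rows, n_features)
one_batch = raw_buffer_bytes(batch_size, n_features)

print(f"full dataset: {full / 2**30:.1f} GiB")      # full dataset: 18.6 GiB
print(f"one batch:    {one_batch / 2**20:.1f} MiB")  # one batch:    100.0 MiB
```

Under these assumptions, only about 100 MiB of raw values needs to be resident at any moment instead of roughly 18.6 GiB.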
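import numpy as np
import xgboost as xgb


class BatchIterator(xgb.core.DataIter):
    """Stream a DataFrame to XGBoost in fixed-size row batches."""

    def __init__(self, df, features, target, batch_size=256 * 1024):
        self.df = df
        self.features = features
        self.target = target
        self.batch_size = batch_size
        self.it = 0  # index of the next batch to serve
        self.n_batches = int(np.ceil(len(df) / batch_size))
        super().__init__()

    def reset(self):
        # Called by XGBoost to restart iteration from the first batch.
        self.it = 0

    def next(self, input_data):
        # Return 0 once every batch has been served, 1 otherwise.
        if self.it == self.n_batches:
            return 0
        a = self.it * self.batch_size
        b = min(a + self.batch_size, len(self.df))
        batch = self.df.iloc[a:b]
        input_data(data=batch[self.features], label=batch[self.target])
        self.it += 1
        return 1


# Usage: train_df, features, and params are assumed to be defined already;
# params should select a GPU tree method.
it = BatchIterator(train_df, features, 'target', batch_size=256 * 1024)
# DeviceQuantileDMatrix is deprecated since XGBoost 1.7 in favor of xgb.QuantileDMatrix.
dtrain = xgb.DeviceQuantileDMatrix(it, max_bin=256)
model = xgb.train(params, dtrain, num_boost_round=2000)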
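The slicing arithmetic used in `next()` can be checked without XGBoost or a GPU. The sketch below reproduces it in a hypothetical helper, `batch_bounds`, which is not part of the skill or of XGBoost. It shows that the last batch is shortened rather than padded and that every row is covered exactly once.

```python
import math


def batch_bounds(n_rows: int, batch_size: int) -> list[tuple[int, int]]:
    """Return the half-open (start, stop) row ranges the iterator would serve."""
    n_batches = math.ceil(n_rows / batch_size)
    bounds = []
    for i in range(n_batches):
        a = i * batch_size
        b = min(a + batch_size, n_rows)  # clamp the final batch to the data length
        bounds.append((a, b))
    return bounds


bounds = batch_bounds(n_rows=10, batch_size=4)
print(bounds)  # [(0, 4), (4, 8), (8, 10)]
```

An empty input yields no batches, so an iterator built this way would return 0 on its first `next()` call.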
next() to avoid CPU-GPU transfer