skills/tabular/per-type-model-training/SKILL.md
Trains separate models for each discrete category (e.g., molecule type, product class) to capture type-specific patterns.
npx skillsauth add wenmin-wu/ds-skills tabular-per-type-model-training
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When a dataset contains distinct subgroups with different distributions (e.g., molecule coupling types, product categories), train a separate model per type. Each model sees only its type's data, so it can learn type-specific feature importances and can be tuned with its own hyperparameters. This often outperforms a single global model that takes type as a feature, provided each type has enough rows to train on.
import lightgbm as lgb
import numpy as np
from sklearn.model_selection import KFold


def train_per_type(df_train, df_test, target_col, type_col, features, params):
    """Train separate models per category type."""
    predictions = df_test[[type_col]].copy()
    predictions['pred'] = 0.0
    scores = {}
    for t in df_train[type_col].unique():
        # Restrict train and test to the rows of this type.
        mask_train = df_train[type_col] == t
        mask_test = df_test[type_col] == t
        X_t = df_train.loc[mask_train, features]
        y_t = df_train.loc[mask_train, target_col]
        X_test_t = df_test.loc[mask_test, features]

        folds = KFold(n_splits=5, shuffle=True, random_state=42)
        oof = np.zeros(len(X_t))        # out-of-fold predictions
        pred = np.zeros(len(X_test_t))  # fold-averaged test predictions
        for train_idx, val_idx in folds.split(X_t):
            model = lgb.LGBMRegressor(**params)
            model.fit(X_t.iloc[train_idx], y_t.iloc[train_idx],
                      eval_set=[(X_t.iloc[val_idx], y_t.iloc[val_idx])],
                      callbacks=[lgb.early_stopping(200)])
            oof[val_idx] = model.predict(X_t.iloc[val_idx])
            if len(X_test_t):  # a type may be absent from the test set
                pred += model.predict(X_test_t) / folds.n_splits

        predictions.loc[mask_test, 'pred'] = pred
        scores[t] = np.mean(np.abs(y_t - oof))  # per-type out-of-fold MAE
    return predictions, scores
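`train_per_type` takes a single `params` dict and reuses it for every type. One possible way to give each type its own hyperparameters is to keep a shared base dict plus per-type overrides. The helper `params_for_type`, the type labels, and all parameter values below are hypothetical and purely illustrative; only the parameter names (`n_estimators`, `learning_rate`, `num_leaves`) are real `LGBMRegressor` arguments.

```python
def params_for_type(base_params, overrides, type_value):
    """Return a copy of base_params updated with any overrides for this type."""
    merged = dict(base_params)  # copy so the shared base dict is never mutated
    merged.update(overrides.get(type_value, {}))
    return merged


# Illustrative values only (assumption): tune these per dataset.
base_params = {"n_estimators": 2000, "learning_rate": 0.05, "num_leaves": 63}
overrides = {
    "type_a": {"num_leaves": 127},     # hypothetical type with plenty of rows
    "type_b": {"learning_rate": 0.1},  # hypothetical small type
}

params_a = params_for_type(base_params, overrides, "type_a")
params_c = params_for_type(base_params, overrides, "type_c")  # no override: base values
```

Inside the per-type loop, `lgb.LGBMRegressor(**params_for_type(params, overrides, t))` could then replace `lgb.LGBMRegressor(**params)`, with `overrides` passed in as an extra argument.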