skills/tabular/pairwise-te-logit-stacking/SKILL.md
Generates all C(n,2) pairwise feature combinations, target-encodes each pair with cuML TargetEncoder, then applies logit polynomial expansion (z, z^2, z^3) for stacking with cuML LogisticRegression.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add wenmin-wu/ds-skills tabular-pairwise-te-logit-stacking
For n categorical/discretized features, generate all n*(n-1)/2 pairwise combinations. Target-encode each pair to get a probability estimate (for a binary target, a smoothed estimate of P(y=1) given that pair of values), then transform it through logit(p), logit(p)^2, logit(p)^3 to create polynomial features in logit space. These features feed into a cuML LogisticRegression, whose predictions tend to differ markedly from those of tree-based models, which makes it a useful ensemble member.
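To get a feel for how quickly the feature count grows, the sketch below enumerates the pairs and builds the combined string key for a single row using only the standard library. The column names and the sample row are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import comb

# Hypothetical categorical column names, for illustration only.
feature_cols = ["city", "device", "plan", "channel"]

pairs = list(combinations(feature_cols, 2))
n = len(feature_cols)

# n * (n - 1) / 2 pairs, and three logit features (z, z^2, z^3) per pair.
n_pairs = comb(n, 2)
n_features = 3 * n_pairs

# Combined key for one hypothetical row: the two values joined with "_".
row = {"city": "NYC", "device": "ios", "plan": "pro", "channel": "web"}
pair_keys = {f"{f1}__{f2}": f"{row[f1]}_{row[f2]}" for f1, f2 in pairs}

print(n_pairs, n_features)        # 6 18
print(pair_keys["city__device"])  # NYC_ios
```

Because the pair count grows quadratically (20 features already give 190 pairs and 570 logit features), it may be worth limiting `feature_cols` to the most informative columns.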
from cuml.preprocessing import TargetEncoder
from cuml.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from itertools import combinations
import numpy as np
import cudf
# train and test (cudf DataFrames), feature_cols and target are assumed to be defined.
features = list(combinations(feature_cols, 2))  # C(n, 2) pairs
te_train = cudf.DataFrame()
te_test = cudf.DataFrame()
for f1, f2 in features:
    col_name = f'{f1}__{f2}'
    # Build a combined string key for the pair
    train[col_name] = train[f1].astype(str) + '_' + train[f2].astype(str)
    test[col_name] = test[f1].astype(str) + '_' + test[f2].astype(str)
    # Fold-based target encoding on train; test is transformed with the fitted encoder
    te = TargetEncoder(n_folds=5, smooth=10)
    te_train[col_name] = te.fit_transform(train[col_name], train[target])
    te_test[col_name] = te.transform(test[col_name])
# Logit polynomial expansion
def logit_expand(p, eps=1e-6):
    p = np.clip(p, eps, 1 - eps)
    z = np.log(p / (1 - p))
    return np.column_stack([z, z**2, z**3])

X_logit_train = logit_expand(te_train.to_numpy())
X_logit_test = logit_expand(te_test.to_numpy())
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_logit_train)
X_test_scaled = scaler.transform(X_logit_test)
y_train = train[target].to_numpy()  # labels as a host array, matching the NumPy features
model = LogisticRegression(C=0.5, max_iter=1000)
model.fit(X_scaled, y_train)
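The logit expansion can be checked on a few scalar values without a GPU. The following is a minimal standard-library sketch of the same transform applied to one probability at a time; `logit_features` is a hypothetical helper name, not part of cuML or the skill itself.

```python
import math

def logit_features(p, eps=1e-6):
    """Return (z, z^2, z^3) for one target-encoded probability p."""
    # Clip so that p = 0 or p = 1 does not produce an infinite logit.
    p = min(max(p, eps), 1.0 - eps)
    z = math.log(p / (1.0 - p))
    return z, z ** 2, z ** 3

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    z, z2, z3 = logit_features(p)
    print(f"p={p:.1f}  z={z:+.3f}  z^2={z2:.3f}  z^3={z3:+.3f}")
```

The odd powers keep the sign of z while z^2 does not, so the linear model can fit both symmetric and asymmetric curvature around p = 0.5. Note that when `np.column_stack` receives 2-D arrays it concatenates them side by side, so the columns of the expanded matrix are ordered as all z, then all z^2, then all z^3, rather than grouped per pair.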
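1. Generate all C(n, 2) pairwise string combinations
2. Target-encode each pair with cuml.preprocessing.TargetEncoder(n_folds=5)
3. Clip the encoded probabilities to [eps, 1-eps], compute z = logit(p)
4. Expand to [z, z^2, z^3] for each pair -- total features = 3 * C(n, 2)
5. Standardize and fit cuml.linear_model.LogisticRegression(C=0.5)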
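The fold-based encoding in the target-encoding step matters because each training row should be encoded without seeing its own label. Below is a rough standard-library sketch of out-of-fold, smoothed target encoding for one pair column. The interleaved fold assignment and the shrinkage formula are assumptions chosen for clarity and may not match cuML's internals exactly; `oof_target_encode` is a hypothetical name.

```python
from collections import defaultdict

def oof_target_encode(keys, y, n_folds=5, smooth=10.0):
    """Out-of-fold smoothed mean of y per key (illustrative, not cuML's code)."""
    n = len(keys)
    encoded = [0.0] * n
    for fold in range(n_folds):
        # Interleaved fold assignment; an assumption made for simplicity.
        held_out = [i for i in range(n) if i % n_folds == fold]
        fit_rows = [i for i in range(n) if i % n_folds != fold]
        if not fit_rows:
            continue
        prior = sum(y[i] for i in fit_rows) / len(fit_rows)
        sums, counts = defaultdict(float), defaultdict(int)
        for i in fit_rows:
            sums[keys[i]] += y[i]
            counts[keys[i]] += 1
        for i in held_out:
            s, c = sums.get(keys[i], 0.0), counts.get(keys[i], 0)
            # Shrink the per-key mean toward the prior; unseen keys get the prior.
            denom = c + smooth
            encoded[i] = (s + smooth * prior) / denom if denom > 0 else prior
    return encoded

# Hypothetical pair keys and binary labels.
keys = ["a_x", "a_x", "a_x", "a_x", "b_y", "b_y"]
y = [1, 1, 0, 1, 0, 0]
encoded = oof_target_encode(keys, y, n_folds=2, smooth=1.0)
print([round(v, 4) for v in encoded])
```

Rows with the same key can receive different encodings because each one is computed from the other folds only. That is the intended behaviour: it keeps the stacked logistic regression from simply memorising the training labels.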