skills/tabular/svd-target-reconstruction/SKILL.md
Compresses high-dimensional targets with TruncatedSVD, trains on the reduced space, then reconstructs full predictions via the components matrix.
npx skillsauth add wenmin-wu/ds-skills tabular-svd-target-reconstruction
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When predicting thousands of outputs (e.g., 23,418 gene expressions), training directly on the full target matrix is slow and noisy. TruncatedSVD compresses the targets to 64-512 components that capture the dominant variance. The model trains on this compact representation; at inference, its predictions are projected back to full dimensionality by multiplying them with the SVD components matrix. Depending on the model and data, this can cut training time by 10-100x while still preserving most of the target variance (95%+ in favorable cases), and it acts as implicit regularization because the noise carried by low-variance components is discarded.
import numpy as np
from sklearn.decomposition import TruncatedSVD
import scipy.sparse
# Y_train shape: (n_samples, 23418) — can be sparse
svd = TruncatedSVD(n_components=128, random_state=42)
Y_reduced = svd.fit_transform(Y_train) # (n_samples, 128)
print(f"Variance retained: {svd.explained_variance_ratio_.sum():.3f}")
# Train model on reduced targets (model: any multi-output regressor, defined elsewhere)
model.fit(X_train, Y_reduced)
# Reconstruct full predictions at inference
preds_reduced = model.predict(X_test) # (n_test, 128)
preds_full = preds_reduced @ svd.components_ # (n_test, 23418)
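The snippet above leaves X_train, Y_train and model undefined. A minimal end-to-end sketch of the same workflow follows, under stated assumptions: the data is synthetic (low-rank targets plus noise), the sizes are arbitrary, and Ridge is a stand-in regressor chosen only because it handles multi-output targets natively. None of these choices come from the original skill.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical synthetic data: targets are a low-rank linear function of X plus noise.
n_train, n_test, n_features, n_targets, rank = 200, 50, 20, 300, 8
X = rng.normal(size=(n_train + n_test, n_features))
W = rng.normal(size=(n_features, rank))
B = rng.normal(size=(rank, n_targets))
Y = X @ W @ B + 0.01 * rng.normal(size=(n_train + n_test, n_targets))

X_train, X_test = X[:n_train], X[n_train:]
Y_train, Y_test = Y[:n_train], Y[n_train:]

# Compress the targets: (n_train, 300) -> (n_train, 16).
svd = TruncatedSVD(n_components=16, random_state=42)
Y_reduced = svd.fit_transform(Y_train)

# Train on the reduced targets (Ridge is an assumed stand-in model).
model = Ridge(alpha=1.0)
model.fit(X_train, Y_reduced)

# Predict in the reduced space, then project back to all 300 targets.
preds_reduced = model.predict(X_test)          # (n_test, 16)
preds_full = preds_reduced @ svd.components_   # (n_test, 300)

rmse = float(np.sqrt(np.mean((preds_full - Y_test) ** 2)))
print(f"Variance retained: {svd.explained_variance_ratio_.sum():.3f}")
print(f"Test RMSE in full target space: {rmse:.3f}")
```

TruncatedSVD does not center the data, so no mean has to be added back after the matrix product; svd.inverse_transform(preds_reduced) computes the same projection.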
- Multiply the reduced predictions by svd.components_ to reconstruct full-dimensional outputs
- Check explained_variance_ratio_.sum(); aim for >0.90
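One way to act on the >0.90 guideline is to fit once with a generous component budget and read off the smallest count whose cumulative explained variance crosses the threshold. The sketch below is illustrative only: pick_n_components, its parameters, and the synthetic target matrix are hypothetical names and data, not part of the original skill.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

def pick_n_components(Y, max_components=32, threshold=0.90, random_state=42):
    """Return (k, cumulative) where k is the smallest number of components
    whose cumulative explained variance ratio reaches `threshold`,
    or None if `max_components` is not enough."""
    svd = TruncatedSVD(n_components=max_components, random_state=random_state)
    svd.fit(Y)
    cumulative = np.cumsum(svd.explained_variance_ratio_)
    if cumulative[-1] < threshold:
        return None, cumulative
    # First index where the running total is >= threshold, converted to a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return k, cumulative

# Hypothetical target matrix: rank-5 signal plus a little noise.
rng = np.random.default_rng(0)
Y_demo = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 100))
Y_demo += 0.01 * rng.normal(size=Y_demo.shape)

k, cumulative = pick_n_components(Y_demo, max_components=32, threshold=0.90)
print(f"Components needed for 90% variance: {k}")
```

If the function returns None, the budget was too small for the chosen threshold; raising max_components (it cannot exceed the number of target columns) or accepting a lower threshold are the two options.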