skills/tabular/collinear-feature-removal/SKILL.md
Removes redundant features by iterating pairwise Pearson correlations and dropping one member of each pair exceeding a threshold.
npx skillsauth add wenmin-wu/ds-skills tabular-collinear-feature-removal
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Highly correlated features add redundancy without improving model performance. They can destabilize linear models and increase overfitting in tree models on wide datasets. This technique computes the full pairwise correlation matrix, identifies pairs whose absolute correlation exceeds a threshold (typically 0.8–0.95), and drops one member of each pair. Unlike VIF or PCA, it is interpretable: you know exactly which features were removed and why.
import numpy as np
import pandas as pd


def remove_collinear(df, threshold=0.8):
    """Remove one feature from each pair with |correlation| > threshold."""
    corr_matrix = df.corr().abs()
    # Keep only the strict upper triangle so each pair is considered once.
    upper = corr_matrix.where(
        np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)
    )
    # A column is dropped if it correlates above the threshold with any earlier column.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Decide which columns to drop on the training set only, then apply the same drops to test.
train_reduced, dropped = remove_collinear(train[numeric_cols], threshold=0.8)
test_reduced = test[numeric_cols].drop(columns=dropped)
print(f"Removed {len(dropped)} collinear features")
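To make the "which features were removed and why" claim concrete, the following is a minimal sketch that lists each offending pair together with its correlation. The helper name `correlated_pairs` and the synthetic frame `demo` are hypothetical and introduced only for illustration. The sketch assumes the same convention as `remove_collinear`, namely that the later column of each pair is the one dropped.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df, threshold=0.8):
    """Return (earlier_col, later_col, |r|) for each pair with |r| > threshold.

    Hypothetical helper for illustration; not part of the original skill.
    """
    corr = df.corr().abs()
    cols = list(corr.columns)
    pairs = []
    # Visit the strict upper triangle so every pair is reported once.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if r > threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


# Synthetic data: "b" is a noisy copy of "a", "c" is independent noise.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
demo = pd.DataFrame({
    "a": x,
    "b": x + 0.05 * rng.normal(size=n),
    "c": rng.normal(size=n),
})

pairs = correlated_pairs(demo, threshold=0.8)
# Drop the later member of each reported pair, mirroring remove_collinear.
to_drop = sorted({later for _, later, _ in pairs})
reduced = demo.drop(columns=to_drop)

for earlier, later, r in pairs:
    print(f"drop {later!r}: |r| = {r:.3f} with {earlier!r}")
```

Because the later column is always the one removed, column order decides which member of a pair survives. If one feature is preferred, for example the one with fewer missing values, placing it earlier in the frame keeps it.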