skills/tabular/anonymized-target-reverse-engineering/SKILL.md
Probe an anonymized regression target by testing whether simple invertible transforms (2**y, exp(y), log(y), affine rescale) produce a distribution with recognizable structure — round numbers, integer histograms, or a familiar finance/retail range — and use the recovered semantics to motivate features and loss choices the host's bland description would never suggest
npx skillsauth add wenmin-wu/ds-skills tabular-anonymized-target-reverse-engineering
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When a competition host hides the target ("loyalty score, normalized"), your first move is not to build a model but to figure out what the number actually means. Histogram it, then try 2**y, np.exp(y), and y * std + mean for various candidate scales. If a transform makes the distribution snap onto integer values, dollar amounts, or a known KPI range, you have unlocked feature engineering the host did not intend to share. On Elo, applying 2**target revealed an integer-like structure consistent with a churn-month count; the discovery motivated month-recency features that pushed solutions up the leaderboard (LB). The technique is dual-use: it also detects leakage, because a transform that recovers a column you already have means the target can be derived from your features.
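import numpy as np
import matplotlib.pyplot as plt

# `train` is the competition training DataFrame with a numeric `target` column
y = train.target.to_numpy(dtype=float)

# 1) raw histogram
plt.hist(y, bins=200); plt.title('raw'); plt.show()

# 2) try common inverse transforms; overflow and division by zero
#    produce inf/nan, which the isfinite filter below discards
with np.errstate(over='ignore', divide='ignore', invalid='ignore'):
    candidates = [('2**y', 2**y), ('exp(y)', np.exp(y)),
                  ('y**2', y**2), ('1/(1-y)', 1/(1-y))]
for name, t in candidates:
    vals = t[np.isfinite(t)]
    plt.hist(vals, bins=200); plt.title(name); plt.show()
    # frac_int: share of values within 1e-3 of an integer
    print(name, 'min=', vals.min(), 'max=', vals.max(),
          'frac_int=', np.mean(np.isclose(vals, np.round(vals), atol=1e-3)))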
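The affine case (y * std + mean) is not covered by the loop above, so here is a minimal sketch of one way to search for it. It assumes the hidden quantity is integer-valued and that the host applied a plain standardization. The helper names `integer_fraction` and `best_affine_rescale`, the grid ranges, and the synthetic data are all illustrative, not part of the original skill.

```python
import numpy as np

def integer_fraction(values, atol=1e-3):
    """Share of finite values lying within atol of an integer."""
    values = np.asarray(values, dtype=float)
    values = values[np.isfinite(values)]
    if values.size == 0:
        return 0.0
    return float(np.mean(np.abs(values - np.round(values)) <= atol))

def best_affine_rescale(y, scales, shifts, atol=1e-3):
    """Grid-search y * scale + shift for the most integer-like result.

    Ties keep the first (smallest) scale, because any integer multiple
    of a correct scale also yields integers.
    """
    y = np.asarray(y, dtype=float)
    best = (-1.0, None, None)
    for scale in scales:
        for shift in shifts:
            score = integer_fraction(y * scale + shift, atol=atol)
            if score > best[0]:
                best = (score, scale, shift)
    return best

# Synthetic check: hide integer counts behind a standardization,
# then see whether the grid search finds the scale again.
rng = np.random.default_rng(0)
hidden = rng.integers(0, 40, size=5000).astype(float)
true_std, true_mean = 7.5, 12.0  # made-up "host" constants
y_demo = (hidden - true_mean) / true_std

score, scale, shift = best_affine_rescale(
    y_demo,
    scales=np.arange(0.5, 20.5, 0.5),
    shifts=np.arange(0.0, 1.0, 0.25),
)
print(score, scale, shift)
```

The shift is only identifiable modulo 1 here, so the search recovers the scale and the fractional offset but not the original mean. Pinning down the absolute level still needs a semantic anchor, such as a known minimum of zero.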
Transforms to try: exp, log, 2**, **2, and affine rescales using train.describe() percentiles as anchors.
Spot check: if 2**y == months_since_last_purchase for 5 random cards, you've recovered the target.
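The spot check against an existing column, which doubles as the leakage test, can be sketched as follows. The functions `match_rate` and `find_matching_columns`, the 0.99 threshold, and the toy frame are hypothetical illustrations. The column name `months_since_last_purchase` comes from the text above and is not guaranteed to exist in any real dataset.

```python
import numpy as np
import pandas as pd

def match_rate(recovered, column, atol=1e-6):
    """Share of rows where the recovered target equals an existing column."""
    recovered = np.asarray(recovered, dtype=float)
    column = np.asarray(column, dtype=float)
    ok = np.isfinite(recovered) & np.isfinite(column)
    if not ok.any():
        return 0.0
    return float(np.mean(np.isclose(recovered[ok], column[ok], rtol=0.0, atol=atol)))

def find_matching_columns(df, target_col, transform, threshold=0.99):
    """Return numeric columns that the transformed target reproduces."""
    recovered = transform(df[target_col].to_numpy(dtype=float))
    hits = {}
    for col in df.select_dtypes(include="number").columns:
        if col == target_col:
            continue
        rate = match_rate(recovered, df[col].to_numpy(dtype=float))
        if rate >= threshold:
            hits[col] = rate
    return hits

# Toy frame in which the target is log2 of an existing column (made-up data).
demo = pd.DataFrame({
    "months_since_last_purchase": [1, 2, 4, 8, 16],
    "unrelated": [3.2, 0.1, 7.7, 5.0, 2.4],
})
demo["target"] = np.log2(demo["months_since_last_purchase"])

hits = find_matching_columns(demo, "target", lambda y: 2.0 ** y)
print(hits)
```

On real data it is usually enough to run this on a handful of sampled rows first (for example `df.sample(5)`) and only scan the full frame once a candidate transform looks promising. A match rate near 1 for any feature column is a leakage signal worth reporting.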