skills/tabular/authorized-flag-table-split/SKILL.md
Split a transaction table by a binary status flag (authorized vs. declined, paid vs. refunded) into two parallel sub-tables, then build the same aggregate feature pipeline on each — the declined-transaction features are usually as predictive as the authorized ones because they encode risk and friction the authorized stream alone hides
npx skillsauth add wenmin-wu/ds-skills tabular-authorized-flag-table-splitInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
A common mistake on transaction panels is to filter authorized_flag == 'Y' and discard the rest. The declined rows are not noise — they encode declined-rate, declined-amount, declined-merchant-diversity, and other friction signals that strongly correlate with churn, fraud, and loyalty score. Top Elo Merchant solutions all built three parallel feature blocks: (1) full table, (2) authorized-only sub-table, (3) non-authorized sub-table — each through the same groupby(card_id).agg(...) pipeline. The model learns from the contrast: auth_amt_mean / nonauth_amt_mean, auth_count / total_count. Generalize to any binary flag — paid/refunded, success/failure, mobile/web.
auth = tx[tx.authorized_flag == 'Y']
nonauth = tx[tx.authorized_flag == 'N']
def agg_block(df, prefix):
g = df.groupby('card_id').agg(
amt_sum=('purchase_amount', 'sum'),
amt_mean=('purchase_amount', 'mean'),
cnt=('purchase_amount', 'size'),
nuniq_merch=('merchant_id', 'nunique'),
).reset_index()
g.columns = ['card_id'] + [f'{prefix}_{c}' for c in g.columns[1:]]
return g
feat_all = agg_block(tx, 'all')
feat_auth = agg_block(auth, 'auth')
feat_non = agg_block(nonauth, 'nonauth')
cards = feat_all.merge(feat_auth, on='card_id', how='left')\
.merge(feat_non, on='card_id', how='left')
cards['auth_rate'] = cards['auth_cnt'] / cards['all_cnt']
authorized_flag, paid, is_success)auth_*, nonauth_*) so feature importance is self-explanatoryauth_rate, auth_amt_share)auth_rate (declined fraction) is usually a top-10 feature on its own.tx, not pd.concat([auth, nonauth]) — they are identical but the latter is slower.data-ai
Scaled Pinball Loss (SPL) metric for evaluating quantile forecasts, normalized by mean absolute successive differences of training data
data-ai
Walk backward through a time series and multiplicatively rescale segments when jumps exceed a fraction of the running mean to correct data collection anomalies
testing
Transform forecasting target to next/current ratio minus one so that optimizing MAE or squared error implicitly minimizes SMAPE
tools
Convert point forecasts to prediction intervals by scaling with logit-transformed quantile ratios passed through a Normal CDF