skills/tabular/confidence-weighted-rate-encoding/SKILL.md
Encodes categorical groups by their target rate scaled by a log-confidence factor, smoothing unreliable rates from low-frequency groups toward zero.
npx skillsauth add wenmin-wu/ds-skills tabular-confidence-weighted-rate-encoding
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Target/mean encoding maps each categorical value to its average target rate, but low-frequency categories produce noisy estimates. Confidence-weighted rate encoding multiplies the raw rate by min(1, log(count) / log(threshold)), which shrinks the rates of rare groups toward zero and leaves groups at or above the threshold count unchanged. A group observed only once gets a factor of log(1) = 0, so its encoded value is exactly zero. This is a lightweight alternative to Bayesian target encoding that avoids specifying a prior distribution: the log-confidence factor acts as an implicit regularizer.
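To make the shape of the confidence factor concrete, here is a minimal sketch that evaluates it at a few group sizes. The helper name `confidence_factor` and its `threshold` parameter are illustrative and not part of the skill; the default of 100,000 is an assumption chosen to match the snippet's constant.

```python
import numpy as np


def confidence_factor(count, threshold=100_000):
    """Return min(1, log(count) / log(threshold)) for a positive count.

    Hypothetical helper for illustration only.
    """
    return min(1.0, np.log(count) / np.log(threshold))


# The factor grows with the logarithm of the group size and is capped at 1.
for n in (1, 10, 1_000, 100_000, 1_000_000):
    print(n, round(confidence_factor(n), 3))
```

With this threshold the factor is 0 for a single observation, 0.2 at 10, 0.6 at 1,000, and 1 from 100,000 upward, so even moderately sized groups keep most of their raw rate.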
import numpy as np
import pandas as pd

LOG_GROUP = np.log(100000)  # confidence saturates at 100k observations


def confidence_weighted_rate(x):
    """Rate × min(1, log(count)/log(threshold))."""
    rate = x.sum() / float(x.count())
    confidence = min(1.0, np.log(x.count()) / LOG_GROUP)
    return rate * confidence


# Single group
df['ip_conf_rate'] = df.groupby('ip')['target'] \
    .transform(confidence_weighted_rate)

# Multi-key groups
GROUP_SPECS = [
    ['ip'],
    ['ip', 'app'],
    ['ip', 'device', 'os'],
    ['app', 'channel'],
]
for cols in GROUP_SPECS:
    feat_name = '_'.join(cols) + '_confRate'
    df[feat_name] = df.groupby(cols)['target'] \
        .transform(confidence_weighted_rate)
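Passing a Python function to `transform` calls it once per group, which can be slow when there are many distinct groups. The sketch below computes the same quantity with built-in `count` and `sum` aggregations instead. The function name `add_conf_rate`, its parameters, and the toy frame are assumptions made for illustration, not part of the skill.

```python
import numpy as np
import pandas as pd


def add_conf_rate(df, cols, target="target", threshold=100_000):
    """Vectorized confidence-weighted rate for the groups defined by `cols`.

    Hypothetical helper: returns one value per row, aligned with df.index.
    """
    grouped = df.groupby(cols)[target]
    count = grouped.transform("count")  # non-null targets per group
    total = grouped.transform("sum")    # sum of targets per group
    rate = total / count
    confidence = np.minimum(1.0, np.log(count) / np.log(threshold))
    return rate * confidence


# Toy data; threshold=9 keeps the arithmetic easy to follow.
toy = pd.DataFrame({"ip": [1, 1, 1, 2], "target": [1, 0, 1, 1]})
toy["ip_confRate"] = add_conf_rate(toy, ["ip"], threshold=9)
print(toy)
```

In the toy frame, `ip` 1 has three rows with rate 2/3 and confidence log(3)/log(9) = 0.5, giving 1/3, while `ip` 2 has a single row and is encoded as 0.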