skills/tabular/type-weighted-covisitation-matrix/SKILL.md
Build an item co-visitation matrix from session pairs within a time window, weighting each co-occurrence by interaction type (click/cart/order) via a GPU self-join
npx skillsauth add wenmin-wu/ds-skills tabular-type-weighted-covisitation-matrix
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
For session-based recommendation, build a co-visitation matrix by self-joining session events within a time window. Weight each co-occurrence by interaction type (e.g. clicks=1, carts=6, orders=3) so high-intent signals dominate. Use RAPIDS cuDF for GPU-accelerated computation on large datasets. Extract top-K co-visited items per anchor for candidate generation.
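To make the weighting concrete, here is a minimal pure-Python sketch of how a pair's score accumulates across sessions. The toy pair list and the names `TYPE_WEIGHT`, `pair_events`, and `scores` are illustrative assumptions, and it assumes type codes 0, 1, 2 stand for click, cart, order, as the example weights suggest.

```python
from collections import defaultdict

# Assumed type codes: 0 = click, 1 = cart, 2 = order (weights from the text above).
TYPE_WEIGHT = {0: 1, 1: 6, 2: 3}

# Hypothetical toy data, already de-duplicated per session:
# (session, anchor_item, co_visited_item, type_of_co_visited_event)
pair_events = [
    (1, "A", "B", 0),  # session 1: B was clicked alongside A
    (2, "A", "B", 1),  # session 2: B was carted alongside A
    (3, "A", "B", 2),  # session 3: B was ordered alongside A
    (3, "A", "C", 0),  # session 3: C was clicked alongside A
]

# Sum the type weight of every co-occurrence for each (anchor, item) pair.
scores = defaultdict(float)
for _session, anchor, other, event_type in pair_events:
    scores[(anchor, other)] += TYPE_WEIGHT[event_type]

print(dict(scores))  # {('A', 'B'): 10.0, ('A', 'C'): 1.0}
```

Pair (A, B) scores 1 + 6 + 3 = 10, so it outranks (A, C) when the top-K items for anchor A are selected.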
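import cudf

def build_covisitation_matrix(df, type_weight=None,
                              window_hours=24, top_k=20):
    """Build type-weighted co-visitation from session events.

    Args:
        df: cudf.DataFrame with columns [session, aid, ts, type];
            ts must be in seconds, since it is compared against window_sec
        type_weight: {event_type: weight} mapping; defaults to {0: 1, 1: 6, 2: 3}
        window_hours: time window for co-occurrence (hours)
        top_k: top co-visited items to keep per anchor
    """
    if type_weight is None:  # avoid a mutable default argument
        type_weight = {0: 1, 1: 6, 2: 3}
    window_sec = window_hours * 3600
    # Self-join on session to enumerate every ordered pair of events in a session
    df = df.merge(df, on='session')
    # Keep pairs of distinct items that fall within the time window
    df = df.loc[
        ((df.ts_x - df.ts_y).abs() < window_sec) &
        (df.aid_x != df.aid_y)
    ]
    # Count each (anchor, co-visited item) pair at most once per session
    df = df[['session', 'aid_x', 'aid_y', 'type_y']].drop_duplicates(
        ['session', 'aid_x', 'aid_y'])
    # Weight each pair by the interaction type of the co-visited item
    df['wgt'] = df.type_y.map(type_weight).astype('float32')
    # Aggregate and keep top-K per anchor
    pairs = df.groupby(['aid_x', 'aid_y']).wgt.sum().reset_index()
    pairs = pairs.sort_values(['aid_x', 'wgt'], ascending=[True, False])
    pairs['n'] = pairs.groupby('aid_x').aid_y.cumcount()
    return pairs.loc[pairs.n < top_k].drop('n', axis=1)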
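The steps above can be tried without a GPU. The sketch below mirrors the same logic in pandas on a tiny hand-made event log, then turns the result into a lookup table for candidate generation. pandas is used here only as a CPU stand-in for cuDF. The function name `build_covisitation_pandas`, the toy `events` frame, and the `candidates` dictionary are assumptions for illustration and not part of the skill.

```python
import pandas as pd

def build_covisitation_pandas(df, type_weight=None, window_hours=24, top_k=20):
    """CPU stand-in for the cuDF function; assumes `ts` is in seconds."""
    if type_weight is None:
        type_weight = {0: 1, 1: 6, 2: 3}
    window_sec = window_hours * 3600
    # Self-join on session, then keep distinct items inside the time window
    pairs = df.merge(df, on="session")
    pairs = pairs.loc[
        ((pairs.ts_x - pairs.ts_y).abs() < window_sec)
        & (pairs.aid_x != pairs.aid_y)
    ]
    # One contribution per (session, anchor, co-visited item)
    pairs = pairs[["session", "aid_x", "aid_y", "type_y"]].drop_duplicates(
        ["session", "aid_x", "aid_y"]
    )
    pairs["wgt"] = pairs.type_y.map(type_weight).astype("float32")
    # Sum weights per pair, rank within each anchor, keep the top_k rows
    pairs = pairs.groupby(["aid_x", "aid_y"]).wgt.sum().reset_index()
    pairs = pairs.sort_values(["aid_x", "wgt"], ascending=[True, False])
    pairs["n"] = pairs.groupby("aid_x").cumcount()
    return pairs.loc[pairs.n < top_k].drop(columns="n")

# Hypothetical toy log: two sessions, type codes 0 = click, 1 = cart, 2 = order
events = pd.DataFrame({
    "session": [1, 1, 1, 2, 2],
    "aid":     [10, 20, 30, 10, 20],
    "ts":      [0, 60, 120, 0, 30],
    "type":    [0, 1, 2, 0, 1],
})

result = build_covisitation_pandas(events, top_k=2)

# Anchor item -> ranked list of co-visited items, usable as candidates
candidates = result.groupby("aid_x").aid_y.apply(list).to_dict()
print(candidates)  # {10: [20, 30], 20: [30, 10], 30: [20, 10]}
```

Item 20 is carted alongside item 10 in both sessions, so the pair (10, 20) scores 6 + 6 = 12 and ranks first for anchor 10. The self-join grows quadratically with session length, so on real data the cuDF version is the one intended for large inputs.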