skills/timeseries/keystroke-pause-bucket-features/SKILL.md
Bucket inter-keystroke latencies into pause-duration ranges (0.5-1s, 1-1.5s, 1.5-2s, 2-3s, >3s) and count per session as hesitation features
npx skillsauth add wenmin-wu/ds-skills timeseries-keystroke-pause-bucket-features
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
In keystroke / writing-process data, raw event-latency distributions are noisy but their binned shapes are highly predictive of quality. Pauses under 0.5s are within-word; 0.5-2s are within-sentence hesitations; >3s are plan/read pauses. Counting events in each bucket per session gives five features that cleanly separate "fast drafter" from "slow reviser" from "struggling" writers, and they are more stable than min/mean/max because outliers fall into the tail bucket instead of distorting the moments.
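The stability claim can be checked on a toy example. The sketch below uses invented gap values (an assumption for illustration, not real keystroke data) and compares how one extreme idle gap affects the mean versus the five bucket counts. Half-open buckets `[lo, hi)` are an assumption here; the description does not say which side of each edge is inclusive.

```python
import numpy as np

# Hypothetical idle gaps in seconds for one session (made-up values).
gaps = np.array([0.12, 0.20, 0.65, 0.80, 1.20, 1.70, 2.40, 3.50])

# Same session plus one extreme outlier, e.g. the writer left for 10 minutes.
gaps_outlier = np.append(gaps, 600.0)

# Lower edges of the five buckets; everything at or above 3.0 goes to the tail.
edges = np.array([0.5, 1.0, 1.5, 2.0, 3.0])


def bucket_counts(x):
    """Count gaps per bucket: [0.5,1), [1,1.5), [1.5,2), [2,3), [3,inf)."""
    # side="right" returns i with edges[i-1] <= x < edges[i]; 0 means below 0.5 s.
    idx = np.searchsorted(edges, x, side="right")
    # Drop slot 0 (within-word gaps under 0.5 s), keep the five pause buckets.
    return np.bincount(idx, minlength=len(edges) + 1)[1:]


counts_clean = bucket_counts(gaps)
counts_outlier = bucket_counts(gaps_outlier)

print("mean:", gaps.mean(), "->", gaps_outlier.mean())
print("buckets:", counts_clean, "->", counts_outlier)
```

The outlier moves the mean by well over an order of magnitude, while the bucket vector changes only by one extra count in the tail bucket.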
# pandas version
import pandas as pd

# Previous key's release time within each session; the first event falls back to its own down_time (gap = 0)
logs['up_time_lagged'] = logs.groupby('id')['up_time'].shift(1).fillna(logs['down_time'])
logs['time_diff'] = (logs['down_time'] - logs['up_time_lagged']).abs() / 1000  # idle gap in seconds
g = logs.groupby('id')['time_diff']
feats = pd.DataFrame({
    'initial_pause':   logs.groupby('id')['down_time'].first() / 1000,  # seconds before the first key
    # Half-open buckets [lo, hi): a gap landing exactly on an edge is counted once instead of being dropped
    'pauses_half_sec': g.apply(lambda x: ((x >= 0.5) & (x < 1  )).sum()),
    'pauses_1_sec':    g.apply(lambda x: ((x >= 1  ) & (x < 1.5)).sum()),
    'pauses_1_5_sec':  g.apply(lambda x: ((x >= 1.5) & (x < 2  )).sum()),
    'pauses_2_sec':    g.apply(lambda x: ((x >= 2  ) & (x < 3  )).sum()),
    'pauses_3_sec':    g.apply(lambda x: (x >= 3).sum()),  # unbounded tail bucket
})
- time_diff = down_time - lag(up_time) per session: this is the "idle gap" between keystrokes.
- Keep initial_pause (time before the first key) as a separate feature: it captures planning time.
- The >3s bucket must be unbounded: capping it loses the "stuck writer" signal.
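The three points above can also be sketched with a single binning step. This is an alternative formulation, not the skill's own code: it swaps the per-bucket lambdas for `pd.cut`, uses a made-up two-session log, and assumes half-open bins `[lo, hi)`. Unlike the snippet above, it leaves the first event's gap as NaN rather than 0; either way the first event lands in no bucket.

```python
import numpy as np
import pandas as pd

# Toy log with two sessions; column names follow the snippet above,
# the millisecond values are invented for illustration.
logs = pd.DataFrame({
    "id":        ["a", "a", "a", "a", "b", "b", "b", "b"],
    "down_time": [4000, 4300, 5500, 9500, 1000, 1200, 3100, 4200],
    "up_time":   [4100, 4400, 5600, 9600, 1100, 1300, 3200, 4300],
})

# Idle gap: this key's down_time minus the previous key's up_time, per session.
prev_up = logs.groupby("id")["up_time"].shift(1)
logs["time_diff"] = (logs["down_time"] - prev_up).abs() / 1000  # seconds, NaN for first key

# Half-open bins [lo, hi); the last edge is +inf so the tail bucket stays unbounded.
edges = [0.5, 1.0, 1.5, 2.0, 3.0, np.inf]
labels = ["pauses_half_sec", "pauses_1_sec", "pauses_1_5_sec",
          "pauses_2_sec", "pauses_3_sec"]
bucket = pd.cut(logs["time_diff"], bins=edges, labels=labels, right=False)

# One count column per bucket; gaps under 0.5 s and NaN match no label.
feats = pd.DataFrame({
    name: (bucket == name).groupby(logs["id"]).sum() for name in labels
})

# Planning time before the first key, kept as its own feature.
feats.insert(0, "initial_pause", logs.groupby("id")["down_time"].first() / 1000)

print(feats)
```

Session `b` contains a gap of exactly 1.0 s, which lands in `pauses_1_sec` under the half-open convention; with strict inequalities on both sides it would be counted nowhere.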