skills/nlp/keystroke-essay-reconstruction/SKILL.md
Replay keystroke activity logs (Input/Replace/Paste/Remove/Move) against a string buffer to reconstruct the evolving essay text
npx skillsauth add wenmin-wu/ds-skills nlp-keystroke-essay-reconstruction
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When a dataset gives you only keystroke events (activity, cursor_position, text_change) and not the final text, you can deterministically rebuild the essay, or at least its anonymized structure, by replaying each event against a running string buffer. This unlocks a second feature family: sentence, paragraph, and word-length statistics that cannot be read directly off the individual event rows. The technique was used in the Kaggle "Linking Writing Processes to Writing Quality" competition to add roughly 20-40 text-shape features on top of event-based ones.
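To make "text-shape features" concrete, here is a minimal sketch of the kind of statistics that become available once an essay string exists. The helper name `text_shape_features`, the feature names, and the splitting rules (newline for paragraphs, `.?!` for sentences) are illustrative assumptions, not the competition's feature set. It also assumes punctuation and line breaks survive anonymization, so they can still act as delimiters.

```python
import re
import statistics

def text_shape_features(essay: str) -> dict:
    """Hypothetical helper: shape statistics from a reconstructed essay string."""
    # Assumption: paragraphs are separated by newlines, sentences by . ? or !
    paragraphs = [p for p in essay.split("\n") if p.strip()]
    sentences = [s for s in re.split(r"[.?!]+", essay) if s.strip()]
    words = essay.split()
    word_lens = [len(w) for w in words]            # punctuation stays attached
    sent_lens = [len(s.split()) for s in sentences]  # words per sentence
    return {
        "n_paragraphs": len(paragraphs),
        "n_sentences": len(sentences),
        "n_words": len(words),
        "word_len_mean": statistics.fmean(word_lens) if word_lens else 0.0,
        "word_len_max": max(word_lens, default=0),
        "sent_len_mean": statistics.fmean(sent_lens) if sent_lens else 0.0,
    }

# Anonymized text keeps its shape even though every letter is a 'q'.
features = text_shape_features("qqq qq qqqq. qq qqq!\nqqqq qq.")
```

Because the features depend only on delimiters and token lengths, they work the same on anonymized text as they would on the real essay.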
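def reconstruct_essay(df):
    """Replay keystroke events per writer; returns {id: reconstructed essay}.

    Assumes the rows of each id are already in event order.
    """
    # Nonproduction events (clicks, focus changes) never change the text.
    df = df[df['activity'] != 'Nonproduction']
    out = {}
    for uid, g in df.groupby('id'):
        essay = ""
        for act, cur, txt in g[['activity', 'cursor_position', 'text_change']].values:
            if act == 'Replace':
                # text_change is "old => new"; cur is the cursor after the edit.
                old, new = txt.split(' => ', 1)
                start = cur - len(new)
                essay = essay[:start] + new + essay[start + len(old):]
            elif act == 'Paste':
                start = cur - len(txt)
                essay = essay[:start] + txt + essay[start:]
            elif act == 'Remove/Cut':
                # The removed text started at the post-event cursor position.
                essay = essay[:cur] + essay[cur + len(txt):]
            elif act.startswith('Move'):  # Move From [a,b] To [c,d]
                pass  # splice slices per parsed offsets
            else:  # Input
                start = cur - len(txt)
                essay = essay[:start] + txt + essay[start:]
        out[uid] = essay
    return out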
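The Move branch above is left as a stub. A minimal sketch of one way to fill it in follows. It assumes the activity string has the form `Move From [a, b] To [c, d]`, that `[a, b]` is the source span before the move, and that `[c, d]` is where the same block sits afterwards. The name `apply_move` and the regular expression are illustrative, so check the interpretation against your own data before relying on it.

```python
import re

# Assumed activity format: "Move From [a, b] To [c, d]"
MOVE_RE = re.compile(r"Move From \[(\d+), ?(\d+)\] To \[(\d+), ?(\d+)\]")

def apply_move(essay: str, activity: str) -> str:
    """Hypothetical helper: relocate essay[a:b] so that it ends up at [c:d]."""
    m = MOVE_RE.fullmatch(activity)
    if m is None:
        return essay  # not a Move event; leave the buffer untouched
    a, b, c, d = map(int, m.groups())
    if a == c:
        return essay  # block did not actually move
    if a < c:
        # Moving right: the text between the block and its target shifts left.
        return essay[:a] + essay[b:d] + essay[a:b] + essay[d:]
    # Moving left: the text between the target and the block shifts right.
    return essay[:c] + essay[a:b] + essay[c:a] + essay[b:]

moved = apply_move("abcdefgh", "Move From [2, 4] To [5, 7]")
restored = apply_move(moved, "Move From [5, 7] To [2, 4]")
```

In the replay loop, the stub could then become `essay = apply_move(essay, act)`. The round trip in the usage lines is a cheap sanity check that the two directions are inverses of each other.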
- Drop Nonproduction rows (mouse clicks, focus changes); they don't modify text.
- Derive each edit's start offset from cursor_position using the text_change length.
- The output is anonymized: q's and whitespace, same shape as the real essay.
- Replace events carry both the old and the new text, separated by ' => '. Parse them to compute the delta correctly, otherwise offsets drift.
- For Move events, parse From [a,b] To [c,d] and splice three slices. Skip it if rare: most sessions have <1% Move events.
- Simply concatenating text_change values misses deletions and the final layout; only replay recovers structure.
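The difference between replaying events and concatenating text_change values can be seen on a toy log. This is a simplified sketch: the `replay` function handles only Input and Remove/Cut, and the event tuples are made-up illustrations, not rows from the competition data.

```python
def replay(events):
    """Tiny replay over (activity, cursor_position, text_change) tuples."""
    essay = ""
    for act, cur, txt in events:
        if act == "Remove/Cut":
            # Deleted text started at the post-event cursor position.
            essay = essay[:cur] + essay[cur + len(txt):]
        else:  # Input: cursor_position is where the cursor sits after typing
            start = cur - len(txt)
            essay = essay[:start] + txt + essay[start:]
    return essay

# Made-up log: type three characters, delete one, then insert a space mid-word.
events = [
    ("Input", 1, "q"),
    ("Input", 2, "q"),
    ("Input", 3, "q"),
    ("Remove/Cut", 2, "q"),
    ("Input", 2, " "),
]

replayed = replay(events)                                # "q q"
concatenated = "".join(txt for _, _, txt in events)      # "qqqq "
```

Concatenation counts the deleted character as if it were still there and appends the space at the end, while replay puts the space where the cursor actually was.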