skills/nlp/head-tail-truncation/SKILL.md
Preserves both the start and end of long text sequences when truncating for transformer input limits.
npx skillsauth add wenmin-wu/ds-skills nlp-head-tail-truncation
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Standard BERT truncation drops tokens from the end, losing potentially important closing content. Head-tail truncation keeps the first N and last M tokens, where N + M equals the token budget, and discards the middle. This preserves both the introduction and conclusion of long documents, which often carry the most signal.
def head_tail_truncate(token_ids, max_length, head_ratio=0.5):
    """Keep head and tail tokens, drop the middle.

    Args:
        token_ids: list of token IDs (without special tokens)
        max_length: maximum allowed tokens (excluding [CLS]/[SEP])
        head_ratio: fraction of budget allocated to head

    Returns:
        truncated token ID list
    """
    if len(token_ids) <= max_length:
        return token_ids
    head_len = int(max_length * head_ratio)
    tail_len = max_length - head_len
    # Slice the tail from an explicit start index: token_ids[-0:] would
    # return the whole list when tail_len is 0 (e.g. head_ratio=1.0).
    return token_ids[:head_len] + token_ids[len(token_ids) - tail_len:]
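
The function above works on token IDs without special tokens, so the caller has to reserve room for them. A minimal usage sketch follows, assuming a BERT-style input of the form [CLS] ... [SEP]. The helper name `build_model_input`, the ID values 101 and 102 (the [CLS] and [SEP] IDs in the `bert-base-uncased` vocabulary), and the tiny `model_max_length` are illustrative assumptions, not part of the skill. In practice, read the IDs from your tokenizer.

```python
from typing import List

# Assumed special-token IDs (bert-base-uncased); take these from your tokenizer.
CLS_ID = 101
SEP_ID = 102


def head_tail_truncate(token_ids: List[int], max_length: int,
                       head_ratio: float = 0.5) -> List[int]:
    """Compact copy of the skill's function so this sketch runs on its own."""
    if len(token_ids) <= max_length:
        return list(token_ids)
    head_len = int(max_length * head_ratio)
    tail_len = max_length - head_len
    return token_ids[:head_len] + token_ids[len(token_ids) - tail_len:]


def build_model_input(token_ids: List[int], model_max_length: int = 512,
                      head_ratio: float = 0.5) -> List[int]:
    """Hypothetical helper: truncate, then wrap with [CLS] and [SEP]."""
    budget = model_max_length - 2  # reserve two slots for [CLS] and [SEP]
    body = head_tail_truncate(token_ids, budget, head_ratio)
    return [CLS_ID] + body + [SEP_ID]


# Toy document of 100 "tokens"; a real one would come from a tokenizer.
tokens = list(range(100))
ids = build_model_input(tokens, model_max_length=10, head_ratio=0.25)
print(ids)  # [101, 0, 1, 94, 95, 96, 97, 98, 99, 102]
```

With a budget of 8 and `head_ratio=0.25`, two tokens come from the start and six from the end. A lower `head_ratio` favors the conclusion of the document, and a higher one favors the introduction. Which works better depends on the task and is worth validating on held-out data.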