skills/nlp/spell-correction-preprocessing/SKILL.md
Applies domain-aware spelling correction before transformer input to separate spelling errors from content quality.
npx skillsauth add wenmin-wu/ds-skills nlp-spell-correction-preprocessing
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When evaluating text quality (e.g., student essays), misspellings can confuse transformer models, which expect well-formed tokens. Autocorrect the text before encoding, but first augment the spellchecker dictionary with domain vocabulary (e.g., prompt-specific terms) so that valid domain words are not "corrected". Track the misspelling count as a separate feature so that spelling quality stays available to the model alongside the corrected text.
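from autocorrect import Speller          # pip package: autocorrect
from spellchecker import SpellChecker    # pip package: pyspellchecker


class SpellCorrector:
    def __init__(self):
        self.speller = Speller(lang="en")    # performs the correction
        self.spellchecker = SpellChecker()   # holds the vocabulary used for detection

    def add_domain_vocab(self, tokens):
        """Add domain terms so they aren't autocorrected."""
        self.spellchecker.word_frequency.load_words(tokens)
        # Give each term a nonzero frequency so autocorrect treats it as a known word.
        self.speller.nlp_data.update({t: 1000 for t in tokens})

    def correct(self, text):
        return self.speller(text)


# Usage: assumes `df` is a pandas DataFrame with a "text" column and
# `prompt_tokens` is an iterable of domain terms.
corrector = SpellCorrector()
corrector.add_domain_vocab(prompt_tokens)
df["fixed_text"] = df["text"].apply(corrector.correct)
# Count the distinct whitespace-separated tokens that correction changed.
df["misspelling_count"] = df.apply(
    lambda r: len(set(r["text"].split()) - set(r["fixed_text"].split())), axis=1
)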
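
The set difference used for misspelling_count counts each distinct misspelled token once, so an essay that writes "teh" five times scores the same as one that writes it once. If a per-occurrence count is preferred, one possible variant compares the two texts position by position. This is a minimal sketch using only the standard library. The name count_changed_tokens and the tokenizer regex are illustrative assumptions, not part of the skill, and the function falls back to the set difference when correction changes the number of tokens.

```python
import re

# Assumed tokenizer: runs of letters and apostrophes, so trailing
# punctuation ("mat.") does not affect the comparison.
TOKEN_RE = re.compile(r"[A-Za-z']+")


def count_changed_tokens(original: str, corrected: str) -> int:
    """Count word positions whose spelling differs between the two texts."""
    before = TOKEN_RE.findall(original.lower())
    after = TOKEN_RE.findall(corrected.lower())
    if len(before) != len(after):
        # Correction split or merged a token, so positions no longer line up.
        # Fall back to counting distinct changed words.
        return len(set(before) - set(after))
    return sum(b != a for b, a in zip(before, after))
```

With the DataFrame from the usage snippet, this could replace the lambda, e.g. by applying count_changed_tokens to each row's "text" and "fixed_text" values.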
autocorrect for correction, spellchecker for detection
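
The split between detection and correction can be illustrated without either library. In the sketch below, a known-word set plays the detector role and a callable plays the corrector role. Both are toy stand-ins, and correct_unknown_only, toy_known, and toy_fix are hypothetical names introduced only for this example. Only words the detector flags are passed to the corrector, so domain terms added to the known set pass through untouched.

```python
import re
from typing import Callable, Iterable, Tuple

WORD_RE = re.compile(r"[A-Za-z']+")


def correct_unknown_only(
    text: str,
    known_words: Iterable[str],
    domain_terms: Iterable[str],
    fix_word: Callable[[str], str],
) -> Tuple[str, int]:
    """Correct only words outside the known vocabulary; return (text, flagged count)."""
    known = {w.lower() for w in known_words} | {t.lower() for t in domain_terms}
    flagged = 0

    def repl(match: "re.Match[str]") -> str:
        nonlocal flagged
        word = match.group(0)
        if word.lower() in known:
            return word          # detector says the word is fine: leave it alone
        flagged += 1
        return fix_word(word)    # detector flagged it: hand it to the corrector

    return WORD_RE.sub(repl, text), flagged


# Toy stand-ins (assumptions): a tiny dictionary and an over-eager corrector
# that would mangle the domain word "mitosis" if allowed to see it.
toy_known = {"the", "cell", "divides", "during"}
toy_fixes = {"teh": "the", "mitosis": "mitts"}


def toy_fix(word: str) -> str:
    return toy_fixes.get(word.lower(), word)


text = "Teh cell divides during mitosis"
without_vocab = correct_unknown_only(text, toy_known, [], toy_fix)
with_vocab = correct_unknown_only(text, toy_known, ["mitosis"], toy_fix)
```

Here the flagged count doubles as a misspelling feature, and it drops once the domain term is registered, which is the behavior the domain-vocabulary step is meant to achieve.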