skills/cv/numeric-categorical-auto-detection/SKILL.md
Auto-detect whether a generated data series is numeric or categorical by measuring the fraction of digit characters in the concatenated values
npx skillsauth add wenmin-wu/ds-skills cv-numeric-categorical-auto-detection
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When a generative model extracts data series from charts, every value comes back as a string, but downstream processing depends on the type: numeric values need float parsing and RMSE scoring, while categorical values need string matching and Levenshtein scoring. Auto-detect the type by computing the fraction of digit characters in the concatenated series. If at least 50% of the characters are digits, treat the series as numeric and parse it; otherwise keep the values as categorical strings. An empty series is treated as categorical.
import re

def detect_series_type(values):
    # Measure the digit fraction over the whole series, not per value.
    all_chars = "".join(values)
    if len(all_chars) == 0:
        return "categorical"  # empty series: nothing to parse
    digit_chars = len(re.sub(r"[^\d]", "", all_chars))
    frac_numeric = digit_chars / len(all_chars)
    return "numeric" if frac_numeric >= 0.5 else "categorical"

def process_series(values):
    if detect_series_type(values) == "numeric":
        # clean_numeric is not defined in this snippet; it should parse to float/int
        return clean_numeric(values)
    else:
        return [s.strip() for s in values]  # keep as strings

x_series = ["2020", "2021", "2022"]
y_series = ["Apple", "Banana", "Cherry"]
detect_series_type(x_series)  # "numeric"
detect_series_type(y_series)  # "categorical"
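The snippet above calls `clean_numeric` without defining it. One possible shape for that helper is sketched here; the name comes from the snippet, but the body is an assumption, not the skill author's implementation. It pulls the first number-like token out of each string and falls back to NaN when none is found.

```python
import re

# Assumed pattern for a number: optional sign, digits, optional decimal part,
# optional exponent. This is an illustrative choice, not part of the skill.
_NUMBER = re.compile(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?")

def clean_numeric(values):
    """Hypothetical helper: parse each string to a float, NaN when no number is found."""
    parsed = []
    for s in values:
        # Remove thousands separators so "1,234.5" is read as one number.
        match = _NUMBER.search(s.replace(",", ""))
        parsed.append(float(match.group()) if match else float("nan"))
    return parsed
```

Returning NaN instead of raising keeps the output the same length as the input, which is convenient when a series classified as numeric still contains a stray non-numeric token.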
re.sub(r"[^\d]", "", text)
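Because signs, decimal points, and spaces all count against the digit fraction, short series can land on or near the 0.5 threshold. The sketch below prints the raw fraction for a few inputs so the boundary behaviour can be inspected; `digit_fraction` and the sample series are hypothetical names introduced only for illustration.

```python
import re

def digit_fraction(values):
    """Fraction of digit characters in the concatenated strings (0.0 if empty)."""
    text = "".join(values)
    if not text:
        return 0.0
    return len(re.sub(r"[^\d]", "", text)) / len(text)

# Illustrative inputs, not taken from any dataset.
samples = {
    "years": ["2020", "2021", "2022"],
    "decimals": ["1.5", "2.5"],
    "leading_dot": ["-.5"],
    "quarters": ["Q1", "Q2"],
}
for name, series in samples.items():
    print(name, round(digit_fraction(series), 2))
```

With a threshold of 0.5, `["-.5"]` falls below it (one digit out of three characters) and would be kept as a string, while `["Q1", "Q2"]` sits exactly at 0.5 and would be treated as numeric. If such series matter in practice, the threshold or the set of characters counted as numeric may need adjusting.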