skills/nlp/hierarchical-label-encoding/SKILL.md
Concatenate multi-level categorical fields into a compound label (Category:Subcategory) for flat multiclass classification
npx skillsauth add wenmin-wu/ds-skills nlp-hierarchical-label-encoding
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When labels have a hierarchical structure (Category → Subcategory → Specific), concatenate the levels into a single compound string and encode it with LabelEncoder for flat multiclass classification. This is simpler than training hierarchical classifiers, and the model can learn the structure implicitly. At inference, decode predictions back to the original levels by splitting the compound string on the separator.
from sklearn.preprocessing import LabelEncoder
import numpy as np
# Encode: create compound labels
df['compound_label'] = df['Category'] + ':' + df['Subcategory']
le = LabelEncoder()
df['label'] = le.fit_transform(df['compound_label'])
n_classes = len(le.classes_)
# Train model with n_classes output neurons...
# Decode: convert top-K predictions back to original labels
def decode_top_k(probs, le, k=3):
    """Decode top-K softmax predictions to original compound labels."""
    top_k = np.argsort(-probs, axis=1)[:, :k]
    decoded = le.inverse_transform(top_k.flatten())
    return decoded.reshape(top_k.shape)
top3_labels = decode_top_k(probs, le, k=3)
# Each row: ["Category_A:Sub_1", "Category_B:Sub_3", "Category_A:Sub_2"]
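The description mentions decoding back to the original levels via string splitting, but the snippet above stops at compound labels. The following is a minimal sketch of that last step, not the skill author's code: the toy DataFrame, the `SEP` constant, the `split_levels` helper, and the `probs` array are all hypothetical and exist only for illustration. It assumes a two-level hierarchy and that the separator never occurs inside a level value, which it checks before encoding because such a value would make the split ambiguous.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

SEP = ":"  # assumed separator; must not occur inside any level value

# Hypothetical toy data standing in for the real DataFrame
df = pd.DataFrame({
    "Category": ["A", "A", "B", "B"],
    "Subcategory": ["x", "y", "x", "z"],
})

# Guard: a separator inside a level value would make the split ambiguous
for col in ["Category", "Subcategory"]:
    if df[col].str.contains(SEP, regex=False).any():
        raise ValueError(f"separator {SEP!r} found in column {col!r}")

le = LabelEncoder()
df["label"] = le.fit_transform(df["Category"] + SEP + df["Subcategory"])


def split_levels(compound_labels, sep=SEP):
    """Split compound labels into parallel lists (categories, subcategories)."""
    pairs = [str(label).split(sep, 1) for label in compound_labels]
    categories = [pair[0] for pair in pairs]
    subcategories = [pair[1] for pair in pairs]
    return categories, subcategories


# Hypothetical softmax output: two samples over the four compound classes
probs = np.array([
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.05, 0.20, 0.70],
])

# Top-1 class index -> compound label -> original levels
top1 = le.inverse_transform(probs.argmax(axis=1))
categories, subcategories = split_levels(top1)
```

Splitting with a maximum of one split keeps everything after the first separator in the second level. For a three-level hierarchy the same idea applies with two splits, under the same assumption that the separator is reserved.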