skills/nlp/chrf-bleu-geometric-mean-metric/SKILL.md
Geometric mean of chrF and BLEU as a balanced composite translation evaluation metric
npx skillsauth add wenmin-wu/ds-skills nlp-chrf-bleu-geometric-mean-metric
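chrF is a character n-gram F-score, so it gives partial credit for inflected or slightly misspelled word forms, which matters for morphologically rich languages. BLEU measures word-level n-gram precision with a brevity penalty. With `word_order=2`, sacrebleu's chrF also counts word unigrams and bigrams (the chrF++ variant). Their geometric mean rewards systems that do well on both: the composite is pulled toward the weaker of the two scores and is 0 if either is 0, so a checkpoint cannot win on one metric alone. This tends to make it a more robust model-selection metric during training than either score by itself.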
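To make the morphology point concrete, here is a minimal, dependency-free sketch that compares a simplified character n-gram F-score (assumed settings: orders 1 to 6, β = 2, spaces removed) with plain word-unigram precision. It only illustrates the idea; it is not sacrebleu's chrF implementation, which differs in details such as how n-gram orders are averaged and smoothed. The function names and the example sentences are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Character n-grams of `text` with spaces removed (chrF ignores whitespace by default)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def simple_chrf(hyp: str, ref: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF on a 0-1 scale: F-beta of char n-gram precision/recall averaged over n."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if not h or not r:  # string too short for this order; skip it
            continue
        overlap = sum((h & r).values())  # clipped matches (multiset intersection)
        precisions.append(overlap / sum(h.values()))
        recalls.append(overlap / sum(r.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    rc = sum(recalls) / len(recalls)
    if p + rc == 0:
        return 0.0
    # beta > 1 weights recall more heavily than precision
    return (1 + beta**2) * p * rc / (beta**2 * p + rc)


def word_unigram_precision(hyp: str, ref: str) -> float:
    """Share of hypothesis words that also occur in the reference (clipped counts)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    return sum((h & r).values()) / max(sum(h.values()), 1)


# Hypothetical example: every content word is inflected differently from the reference.
hyp = "the house was painting"
ref = "the houses were painted"
print("char n-gram F2:", round(simple_chrf(hyp, ref), 3))
print("word unigram precision:", word_unigram_precision(hyp, ref))  # 0.25, only "the" matches
```

At the word level only "the" matches, while the character n-grams still credit "house"/"houses" and "paint"/"painted", so the character-level score comes out well above 0.25.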
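```python
import numpy as np
import sacrebleu


def chrf_bleu_geomean(predictions, references):
    # word_order=2 adds word unigrams/bigrams to chrF, i.e. chrF++
    chrf = sacrebleu.corpus_chrf(predictions, [references], word_order=2)
    bleu = sacrebleu.corpus_bleu(predictions, [references])
    c, b = chrf.score, bleu.score
    # Geometric mean of the two 0-100 scores; 0 if either score is 0
    geo = (c * b) ** 0.5 if c > 0 and b > 0 else 0.0
    return {"chrf": c, "bleu": b, "geo_mean": geo}


# Hugging Face Trainer integration: assumes Seq2SeqTrainer with
# predict_with_generate=True and a `tokenizer` defined in the enclosing scope.
def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):  # some models return (token_ids, ...)
        preds = preds[0]
    # -100 marks ignored/padded positions; replace it so it can be decoded
    preds = np.where(preds == -100, tokenizer.pad_token_id, preds)
    labels = np.where(labels == -100, tokenizer.pad_token_id, labels)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    return chrf_bleu_geomean(decoded_preds, decoded_labels)
```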
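Why the geometric mean rather than a plain average matters for model selection can be sketched without any dependencies. The (chrF, BLEU) pairs below are made-up numbers for three hypothetical checkpoints, and `geo_mean`, `arith_mean` and `pick_best` are illustrative helpers, not part of any library. In an actual Trainer run this selection would typically be delegated to `load_best_model_at_end=True` with `metric_for_best_model="geo_mean"`.

```python
import math


def geo_mean(chrf: float, bleu: float) -> float:
    """Geometric mean of two scores; 0.0 if either is non-positive."""
    return math.sqrt(chrf * bleu) if chrf > 0 and bleu > 0 else 0.0


def arith_mean(chrf: float, bleu: float) -> float:
    """Plain average, for comparison."""
    return (chrf + bleu) / 2


def pick_best(checkpoints: dict[str, tuple[float, float]], combine) -> str:
    """Name of the checkpoint whose (chrF, BLEU) pair maximizes combine(chrF, BLEU)."""
    return max(checkpoints, key=lambda name: combine(*checkpoints[name]))


# Hypothetical (chrF, BLEU) eval scores for three checkpoints; illustrative only.
scores = {
    "ckpt-a": (62.0, 18.0),  # strong chrF, weak BLEU
    "ckpt-b": (48.0, 31.0),  # more balanced
    "ckpt-c": (40.0, 36.0),
}

print("arithmetic mean picks", pick_best(scores, arith_mean))  # ckpt-a (40.0 vs 39.5 vs 38.0)
print("geometric mean picks", pick_best(scores, geo_mean))     # ckpt-b (~38.6 vs ~33.4 vs ~37.9)
```

The arithmetic mean is fooled by the lopsided checkpoint, while the geometric mean prefers the balanced one: for a fixed sum, the product of two scores is largest when they are equal, so imbalance is penalized.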