skills/timeseries/gaussian-log-likelihood-metric/SKILL.md
Evaluate probabilistic forecasts using normalized Gaussian log-likelihood relative to naive and oracle baselines, scoring both mean accuracy and uncertainty calibration
npx skillsauth add wenmin-wu/ds-skills timeseries-gaussian-log-likelihood-metric
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
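For regression tasks that require uncertainty estimates, score predictions with a Gaussian log-likelihood normalized between a naive baseline and an oracle. The metric rewards both accurate mean predictions and well-calibrated uncertainty (sigma). Overconfident predictions (small sigma, wrong mean) are penalized heavily, because the squared error is divided by sigma squared. The score is 0 when the predictions match the naive baseline and 1 when they match the oracle. The implementation clips the result to [0, 1], so predictions that are worse than the naive baseline also score 0.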
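To make the overconfidence penalty concrete, here is a minimal sketch that evaluates the Gaussian log-density by hand for one point. The helper name `gaussian_logpdf` and the numbers are illustrative assumptions, not part of the skill; the formula is the standard normal log-density.

```python
import math

def gaussian_logpdf(y, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at y (hypothetical helper)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)

# Same wrong mean (error of 1.0), two different claimed uncertainties.
y_true, y_pred = 1.0, 0.0
calibrated = gaussian_logpdf(y_true, y_pred, sigma=1.0)     # sigma matches the error scale
overconfident = gaussian_logpdf(y_true, y_pred, sigma=0.1)  # sigma is 10x too small

print(f"calibrated:    {calibrated:.3f}")
print(f"overconfident: {overconfident:.3f}")
```

With the same mean error, shrinking sigma by a factor of 10 multiplies the squared-error term by 100, which is why the overconfident log-likelihood is far lower.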
import numpy as np
from scipy.stats import norm


def gaussian_log_likelihood_score(y_true, y_pred, sigma_pred,
                                  naive_mean, naive_sigma, sigma_true):
    """Normalized Gaussian log-likelihood metric.

    Args:
        y_true: ground truth values (n_samples, n_targets)
        y_pred: predicted means (n_samples, n_targets)
        sigma_pred: predicted uncertainties (n_samples, n_targets)
        naive_mean: baseline mean prediction (n_targets,)
        naive_sigma: baseline uncertainty (n_targets,)
        sigma_true: oracle uncertainty (n_samples, n_targets)

    Returns:
        score in [0, 1] where 1 is perfect
    """
    # Guard against zero or negative sigma, which would break logpdf.
    sigma_pred = np.clip(sigma_pred, 1e-15, None)
    # Total log-likelihood of the submitted predictions.
    gll_pred = np.sum(norm.logpdf(y_true, loc=y_pred, scale=sigma_pred))
    # Oracle: exact mean with the true uncertainty (upper reference).
    gll_oracle = np.sum(norm.logpdf(y_true, loc=y_true, scale=sigma_true))
    # Naive: constant per-target mean and sigma (lower reference).
    gll_naive = np.sum(norm.logpdf(y_true, loc=naive_mean, scale=naive_sigma))
    # Rescale so that naive maps to 0 and oracle maps to 1, then clip.
    score = (gll_pred - gll_naive) / (gll_oracle - gll_naive)
    return float(np.clip(score, 0.0, 1.0))
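As a sanity check on the normalization, the sketch below scores three synthetic submissions: the naive baseline itself, the oracle, and an overconfident forecast. It is NumPy-only, so it writes the Gaussian log-density in closed form instead of calling `norm.logpdf`. The names `_logpdf` and `normalized_gll`, the synthetic data, and the choice of `sigma_true = 0.1` are all assumptions made for illustration.

```python
import numpy as np

def _logpdf(y, mu, sigma):
    # Closed-form Gaussian log-density, elementwise with broadcasting.
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)

def normalized_gll(y_true, y_pred, sigma_pred, naive_mean, naive_sigma, sigma_true):
    # Same normalization as the metric: naive maps to 0, oracle to 1, clipped.
    sigma_pred = np.clip(sigma_pred, 1e-15, None)
    gll_pred = np.sum(_logpdf(y_true, y_pred, sigma_pred))
    gll_oracle = np.sum(_logpdf(y_true, y_true, sigma_true))
    gll_naive = np.sum(_logpdf(y_true, naive_mean, naive_sigma))
    score = (gll_pred - gll_naive) / (gll_oracle - gll_naive)
    return float(np.clip(score, 0.0, 1.0))

# Synthetic data: 100 samples, 2 targets (illustrative only).
rng = np.random.default_rng(0)
y_true = rng.normal(size=(100, 2))
sigma_true = np.full_like(y_true, 0.1)   # assumed oracle uncertainty
naive_mean = y_true.mean(axis=0)         # per-target constant prediction
naive_sigma = y_true.std(axis=0)         # per-target constant uncertainty
refs = (naive_mean, naive_sigma, sigma_true)

# 1) Submitting the naive baseline itself should score 0.
naive_pred = np.broadcast_to(naive_mean, y_true.shape)
naive_sig = np.broadcast_to(naive_sigma, y_true.shape)
score_naive = normalized_gll(y_true, naive_pred, naive_sig, *refs)

# 2) Submitting the oracle (exact means, true sigma) should score 1.
score_oracle = normalized_gll(y_true, y_true, sigma_true, *refs)

# 3) Naive means with a far-too-small sigma falls below naive and is clipped to 0.
score_overconfident = normalized_gll(
    y_true, naive_pred, np.full_like(y_true, 0.01), *refs)

print(score_naive, score_oracle, score_overconfident)
```

Because of the final clip, a score of 0 cannot distinguish "as good as naive" from "much worse than naive". Inspecting the unclipped ratio can help when debugging a model that scores 0.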
data-ai
Scaled Pinball Loss (SPL) metric for evaluating quantile forecasts, normalized by mean absolute successive differences of training data
data-ai
Walk backward through a time series and multiplicatively rescale segments when jumps exceed a fraction of the running mean to correct data collection anomalies
testing
Transform forecasting target to next/current ratio minus one so that optimizing MAE or squared error implicitly minimizes SMAPE
tools
Convert point forecasts to prediction intervals by scaling with logit-transformed quantile ratios passed through a Normal CDF