skills/cv/differentiable-soft-f1-loss/SKILL.md
Use a soft macro-F1 loss `1 − mean(2·tp / (2·tp + fp + fn))` as a direct training objective for multi-label classification. Compute it from raw sigmoid probabilities, with no thresholding, and optionally combine it with BCE. This narrows the gap between the training surrogate and the F1 metric the leaderboard scores.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

`npx skillsauth add wenmin-wu/ds-skills cv-differentiable-soft-f1-loss`
When the eval metric is macro-F1, BCE is a poor proxy. BCE weights every (sample, label) entry equally regardless of base rate, so a rare class contributes only a handful of positive terms, whereas macro-F1 gives every class the same weight. BCE also never pushes the model to actually cross a decision threshold. The fix is a differentiable macro-F1 surrogate computed directly from sigmoid probabilities, with no thresholding and no rounding: sum probability-weighted TP/FP/FN per class over the batch, compute per-class F1, average across classes, and return 1 − mean_F1. The loss is well-behaved as long as you mask NaNs from empty classes. In practice it is most effective as a co-loss, α·BCE + (1−α)·F1_soft. Pure F1 loss has weak gradients early, when no class is being predicted at all, so BCE bootstraps confidence and F1 then sharpens it.
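Written out for a single class c over a batch of N samples, the soft counts and the identity that links the summary formula to the precision/recall form used in the code are as follows. The symbols (y_{ic} for labels, \hat y_{ic} for sigmoid outputs, C classes) are notation chosen for this sketch, not taken from the source.

$$
\mathrm{TP}_c = \sum_{i=1}^{N} y_{ic}\,\hat y_{ic}, \qquad
\mathrm{FP}_c = \sum_{i=1}^{N} (1 - y_{ic})\,\hat y_{ic}, \qquad
\mathrm{FN}_c = \sum_{i=1}^{N} y_{ic}\,(1 - \hat y_{ic})
$$

$$
P_c = \frac{\mathrm{TP}_c}{\mathrm{TP}_c + \mathrm{FP}_c}, \qquad
R_c = \frac{\mathrm{TP}_c}{\mathrm{TP}_c + \mathrm{FN}_c}, \qquad
F1_c = \frac{2 P_c R_c}{P_c + R_c} = \frac{2\,\mathrm{TP}_c}{2\,\mathrm{TP}_c + \mathrm{FP}_c + \mathrm{FN}_c}
$$

$$
\mathcal{L}_{F1} = 1 - \frac{1}{C}\sum_{c=1}^{C} F1_c, \qquad
\mathcal{L} = \alpha\,\mathcal{L}_{\mathrm{BCE}} + (1 - \alpha)\,\mathcal{L}_{F1}
$$

The identity holds whenever TP_c > 0. With the ε terms added in the code, a class that has no positives and no predicted mass evaluates to F1_c = 0, so it still counts in the average.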
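```python
import tensorflow as tf

EPS = 1e-7  # Keras's default backend epsilon

def f1_loss(y_true, y_pred):
    # Probability-weighted counts per class, summed over the batch (no thresholding)
    y_true = tf.cast(y_true, y_pred.dtype)
    tp = tf.reduce_sum(y_true * y_pred, axis=0)
    fp = tf.reduce_sum((1 - y_true) * y_pred, axis=0)
    fn = tf.reduce_sum(y_true * (1 - y_pred), axis=0)
    p = tp / (tp + fp + EPS)
    r = tp / (tp + fn + EPS)
    f1 = 2 * p * r / (p + r + EPS)
    # Mask NaNs (e.g. from empty classes) so they don't poison the mean
    f1 = tf.where(tf.math.is_nan(f1), tf.zeros_like(f1), f1)
    return 1 - tf.reduce_mean(f1)  # 1 − soft macro-F1

def combo_loss(alpha=0.5):
    # BinaryCrossentropy defaults to from_logits=False: y_pred must be sigmoid outputs
    bce = tf.keras.losses.BinaryCrossentropy()
    return lambda y, p: alpha * bce(y, p) + (1 - alpha) * f1_loss(y, p)

# `model` is your Keras model, ending in a sigmoid layer with one unit per label
model.compile(optimizer='adam', loss=combo_loss(0.5))
```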
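To sanity-check the loss without a TensorFlow install, here is a minimal NumPy re-implementation. The names `soft_f1_loss`, `soft_f1_loss_direct`, and `_soft_counts` are hypothetical, chosen for this sketch, and inputs are assumed to have shape `[batch, n_classes]`. It checks two properties: on hard 0/1 predictions the loss equals 1 − ordinary macro-F1, and on real probabilities the precision/recall form matches the closed form `2·tp / (2·tp + fp + fn)`.

```python
import numpy as np

EPS = 1e-7  # assumed to mirror Keras's default backend epsilon

def _soft_counts(y_true, y_prob):
    """Probability-weighted TP/FP/FN per class, summed over the batch axis."""
    y_true = y_true.astype(float)
    tp = (y_true * y_prob).sum(axis=0)
    fp = ((1 - y_true) * y_prob).sum(axis=0)
    fn = (y_true * (1 - y_prob)).sum(axis=0)
    return tp, fp, fn

def soft_f1_loss(y_true, y_prob):
    """1 - soft macro-F1 via precision and recall, as in the Keras f1_loss."""
    tp, fp, fn = _soft_counts(y_true, y_prob)
    p = tp / (tp + fp + EPS)
    r = tp / (tp + fn + EPS)
    f1 = np.nan_to_num(2 * p * r / (p + r + EPS))  # NaN guard, like tf.where
    return 1 - f1.mean()

def soft_f1_loss_direct(y_true, y_prob):
    """The same loss via the closed form 2*tp / (2*tp + fp + fn)."""
    tp, fp, fn = _soft_counts(y_true, y_prob)
    return 1 - (2 * tp / (2 * tp + fp + fn + EPS)).mean()

y = np.array([[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]])

# Hard 0/1 predictions: class F1s are 1, 2/3 and 4/5, so loss = 1 - mean of those.
pred_hard = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]], dtype=float)
print(soft_f1_loss(y, pred_hard))

# Real probabilities: both forms agree up to the epsilon terms.
probs = np.random.default_rng(0).uniform(size=y.shape)
print(soft_f1_loss(y, probs), soft_f1_loss_direct(y, probs))
```

Because nothing is thresholded, every probability contributes to tp/fp/fn, which is what makes the loss differentiable. Note that the F1 here is a per-batch estimate, since the counts are summed over the batch axis only.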
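- Mask NaNs from empty classes with `tf.where(tf.math.is_nan(f1), tf.zeros_like(f1), f1)`.
- Set `alpha` to 0 to fully optimize F1 (pure soft-F1 loss); `alpha` = 1 reduces to plain BCE.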
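The claim that pure soft-F1 has weak gradients early can be checked by hand. Since tp + fp = Σŷ and tp + fn = Σy, one class's soft F1 is 2·Σyŷ / (Σŷ + Σy). The NumPy sketch below compares the gradients of both losses with respect to the logits. Its function names are hypothetical, and it assumes a single class, a mean-reduced BCE, and a model whose logits are still near −6. The soft-F1 gradient passes through the sigmoid derivative ŷ(1−ŷ), which is close to 0 for saturated predictions, while the BCE gradient (ŷ − y)/N does not shrink for a missed positive.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_f1_class_loss(y, z):
    """1 - soft F1 for one class over a batch, from logits z (shape [batch])."""
    p = sigmoid(z)
    return 1 - 2 * (y * p).sum() / (p.sum() + y.sum())  # 2tp+fp+fn = sum(p)+sum(y)

def soft_f1_grad_logits(y, z):
    """Analytic d(1 - soft F1)/dz via the quotient rule and the sigmoid derivative."""
    p = sigmoid(z)
    s = p.sum() + y.sum()
    tp = (y * p).sum()
    dloss_dp = -2 * (y * s - tp) / s**2
    return dloss_dp * p * (1 - p)

def bce_grad_logits(y, z):
    """d(mean BCE)/dz = (sigmoid(z) - y) / batch_size."""
    return (sigmoid(z) - y) / len(y)

# Early in training: one positive in a batch of 8, model predicts ~0 everywhere.
y = np.array([1, 0, 0, 0, 0, 0, 0, 0], dtype=float)
z = np.full(8, -6.0)  # sigmoid(-6) is about 0.0025

g_f1 = soft_f1_grad_logits(y, z)
g_bce = bce_grad_logits(y, z)
print(g_f1[0], g_bce[0])  # both negative (raise the positive's logit); BCE's is far larger
```

With these numbers the BCE gradient on the positive logit is roughly 25 times larger than the soft-F1 one, which is the motivation for keeping α > 0 in the co-loss rather than starting from pure soft-F1.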