skills/cv/two-stage-bce-then-lovasz/SKILL.md
Train segmentation with BCE loss first for stable convergence, then fine-tune with Lovasz-hinge on raw logits for IoU-optimal predictions
npx skillsauth add wenmin-wu/ds-skills cv-two-stage-bce-then-lovasz
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
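BCE loss is smooth and stable for early training but does not directly optimize IoU. The Lovász-hinge loss is a convex surrogate for the Jaccard (IoU) loss, so it targets IoU much more directly, but it can be unstable when training from scratch. The two-stage approach trains with BCE first to learn good features, then strips the final sigmoid activation and fine-tunes with Lovász-hinge on raw logits. Because the stage-2 model outputs logits rather than probabilities, inference thresholds must be converted to logit space via the inverse sigmoid, logit(p) = log(p / (1 - p)); a probability threshold of 0.5 becomes a logit threshold of 0.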
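The snippet in this skill refers to a `lovasz_loss` function without defining it. As a rough illustration of what such a loss computes, here is a minimal NumPy sketch of the binary Lovász hinge on a flattened image, following the commonly published formulation (sort the hinge errors, then weight them by the discrete gradient of the Jaccard loss). The names `lovasz_grad` and `lovasz_hinge_flat` are illustrative, and this is not the Keras loss used by the skill: a trainable version would need the same steps written with backend tensor ops.

```python
import numpy as np

def lovasz_grad(gt_sorted):
    """Discrete gradient of the Jaccard loss w.r.t. errors sorted in descending order."""
    gts = gt_sorted.sum()
    intersection = gts - np.cumsum(gt_sorted)
    union = gts + np.cumsum(1.0 - gt_sorted)
    jaccard = 1.0 - intersection / union
    # Turn cumulative Jaccard losses into per-position increments.
    return np.concatenate(([jaccard[0]], np.diff(jaccard)))

def lovasz_hinge_flat(logits, labels):
    """Binary Lovász hinge for 1-D float arrays: raw logits and {0, 1} labels."""
    signs = 2.0 * labels - 1.0          # map {0, 1} to {-1, +1}
    errors = 1.0 - logits * signs       # hinge margins; > 0 means a violation
    order = np.argsort(-errors)         # largest error first
    grad = lovasz_grad(labels[order])
    return float(np.dot(np.maximum(errors[order], 0.0), grad))
```

Note that the loss consumes raw logits, which is why stage 2 has to remove the sigmoid before fine-tuning.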
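import numpy as np
from keras.callbacks import ModelCheckpoint
from keras.models import Model, load_model
from keras.optimizers import Adam

# Assumes `model` is a segmentation network whose last layer is a standalone
# sigmoid Activation, and `lovasz_loss` is a Lovász-hinge loss defined elsewhere.

# Stage 1: BCE training
model.compile(loss='binary_crossentropy', optimizer=Adam(1e-3))
# save_best_only monitors val_loss by default, so validation data is required
# (the 0.1 hold-out fraction is an assumed value).
model.fit(X_train, y_train, epochs=50, validation_split=0.1,
          callbacks=[ModelCheckpoint('stage1.h5', save_best_only=True)])

# Stage 2: strip sigmoid, switch to Lovász
model = load_model('stage1.h5')
# Input of the final Activation layer = raw logits. If the sigmoid is fused
# into the last conv layer (activation='sigmoid'), this would instead return
# that layer's input features, not logits.
logit_output = model.layers[-1].input
model2 = Model(model.input, logit_output)
model2.compile(loss=lovasz_loss, optimizer=Adam(1e-4))
model2.fit(X_train, y_train, epochs=30)

# Inference: threshold in logit space
prob_threshold = 0.5
logit_threshold = np.log(prob_threshold / (1 - prob_threshold))  # 0.0 for p = 0.5
preds = (model2.predict(X_test) > logit_threshold).astype(np.uint8)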
Threshold conversion: logit = log(p / (1 - p)).
Threshold search: sweep np.linspace(0.3, 0.7, 31) in probability space and convert each value to a logit.
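The threshold search could be sketched as follows. This is a minimal NumPy example under stated assumptions: `logits` holds raw stage-2 outputs for a validation set, `masks` holds the matching binary ground truth, and plain IoU over the whole array stands in for whatever metric is actually being tuned. The helper names `logit`, `iou`, and `best_logit_threshold` are illustrative, not part of the skill.

```python
import numpy as np

def logit(p):
    """Inverse sigmoid: maps a probability threshold to logit space."""
    return np.log(p / (1.0 - p))

def iou(pred, target):
    """Plain IoU between two boolean arrays; defined as 1.0 when both are empty."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return 1.0 if union == 0 else float(inter / union)

def best_logit_threshold(logits, masks, prob_grid=None):
    """Sweep probability thresholds, apply each one in logit space, keep the best."""
    if prob_grid is None:
        prob_grid = np.linspace(0.3, 0.7, 31)
    target = masks.astype(bool)
    scores = [iou(logits > logit(p), target) for p in prob_grid]
    best = int(np.argmax(scores))  # first grid point with the highest score
    return float(prob_grid[best]), float(logit(prob_grid[best])), scores[best]
```

Sweeping in probability space keeps the grid easy to read, and since the sigmoid is monotonic, `logits > logit(p)` selects exactly the same pixels as `sigmoid(logits) > p`.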