skills/cv/per-patient-slice-maxpool-aggregation/SKILL.md
Aggregate per-slice CNN predictions into a single patient-level injury score by mean-pooling across TTA copies first, then max-pooling across slices. The worst slice wins, which matches the medical reality that one bad slice is enough to grade the patient.
npx skillsauth add wenmin-wu/ds-skills cv-per-patient-slice-maxpool-aggregation
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
For per-patient binary or grade tasks (injury yes/no, severity 0-2-3) where the label depends on whether the worst slice is bad, average-pooling across slices washes out the signal: a single high-confidence injury slice gets diluted by 50 healthy ones. The right reduction is max across slices, but only after averaging across TTA or multi-model copies; otherwise the max latches onto the noisiest copy of each slice instead of a denoised per-slice score. The order matters: mean(TTA) → max(slices) is the right composition; max(everything) overstates the probability and mean(everything) understates it.
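To make the ordering concrete, here is a small synthetic sketch comparing the three compositions. The slice count, probabilities, and noise level are invented for illustration and are not taken from any dataset or from the original skill.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tta, n_slices = 4, 50

# Hypothetical per-slice probabilities: 49 healthy slices and one injured slice.
clean = np.full(n_slices, 0.05)
clean[17] = 0.90

# Each TTA copy scores the same slices with independent noise (assumed noise level).
noisy = np.clip(clean + rng.normal(0.0, 0.03, size=(n_tta, n_slices)), 0.0, 1.0)

mean_then_max = noisy.mean(axis=0).max()  # mean(TTA) -> max(slices)
max_everything = noisy.max()              # max over TTA copies and slices
mean_everything = noisy.mean()            # mean over TTA copies and slices

print(f"mean(TTA) -> max(slices): {mean_then_max:.3f}")
print(f"max(everything):          {max_everything:.3f}")
print(f"mean(everything):         {mean_everything:.3f}")
```

For any input, max(everything) is at least as large as mean(TTA) → max(slices), which in turn is at least as large as mean(everything), so the recommended composition always sits between the two extremes.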
import numpy as np

def aggregate_patient(slice_preds, n_tta):
    """
    slice_preds: (n_tta * n_slices, n_outputs) raw model outputs,
        ordered TTA-major (all slices of copy 0, then copy 1, ...)
    Returns: (n_outputs,) patient-level score
    """
    n_slices = slice_preds.shape[0] // n_tta
    pred = slice_preds.reshape(n_tta, n_slices, -1)
    pred = pred.mean(axis=0)  # 1. average TTA copies per slice
    pred = pred.max(axis=0)   # 2. worst slice across the volume
    return pred

patient_scores = np.zeros((len(patient_ids), 11), dtype='float32')
for i, pid in enumerate(patient_ids):
    pdf = test_df.query('patient_id == @pid')
    raw = model.predict(build_dataset(pdf.image_path.tolist()))
    raw = np.concatenate(raw, axis=-1)  # multi-head → flat
    patient_scores[i] = aggregate_patient(raw, n_tta=4)
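
The reshape in `aggregate_patient` only works if the rows arrive TTA-major, that is, every slice of copy 0 first, then every slice of copy 1, and so on. Whether `build_dataset` emits rows in that order is not stated here, so it is worth checking. The sketch below uses tiny hand-made inputs to show what happens when the layout is slice-major instead; `aggregate_patient_slice_major` is a hypothetical variant for that case, not part of the original skill.

```python
import numpy as np

def aggregate_patient_tta_major(slice_preds, n_tta):
    """Rows ordered: copy 0 slices 0..S-1, then copy 1 slices 0..S-1, ..."""
    n_slices = slice_preds.shape[0] // n_tta
    pred = slice_preds.reshape(n_tta, n_slices, -1)
    return pred.mean(axis=0).max(axis=0)

def aggregate_patient_slice_major(slice_preds, n_tta):
    """Hypothetical variant. Rows ordered: slice 0 copies 0..T-1, then slice 1, ..."""
    n_slices = slice_preds.shape[0] // n_tta
    pred = slice_preds.reshape(n_slices, n_tta, -1)
    return pred.mean(axis=1).max(axis=0)

# Synthetic scores: 2 TTA copies, 3 slices, 1 output.
per_copy = np.array([[0.1, 0.8, 0.2],   # copy 0, slices 0..2
                     [0.3, 0.6, 0.2]])  # copy 1, slices 0..2
tta_major = per_copy.reshape(-1, 1)      # copy-major row order
slice_major = per_copy.T.reshape(-1, 1)  # slice-major row order

a = aggregate_patient_tta_major(tta_major, n_tta=2)
b = aggregate_patient_slice_major(slice_major, n_tta=2)
wrong = aggregate_patient_tta_major(slice_major, n_tta=2)  # layout mismatch

print(a, b, wrong)
```

The two matched calls agree, while the mismatched call averages scores from different slices together and returns a different value without raising any error.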
- Start from (N * S, C) predictions, with N TTA copies and S slices
- Reshape to (N, S, C) and average over the TTA axis to denoise per-slice scores
- Clip to [0.01, 0.99] if the metric penalizes log-loss extremes
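
The clipping step can be sketched as follows, assuming the metric is a plain binary log loss. The helper names `clip_scores` and `binary_log_loss` and the example scores are hypothetical and serve only to illustrate the effect.

```python
import numpy as np

def clip_scores(scores, lo=0.01, hi=0.99):
    """Keep patient-level probabilities away from 0 and 1."""
    return np.clip(scores, lo, hi)

def binary_log_loss(y_true, p):
    """Mean binary cross-entropy; assumes 0 < p < 1."""
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

y_true = np.array([1.0, 0.0, 1.0])
raw = np.array([0.999, 0.001, 0.0005])  # the last score is confidently wrong

clipped = clip_scores(raw)
loss_raw = binary_log_loss(y_true, raw)
loss_clipped = binary_log_loss(y_true, clipped)

print(f"raw: {loss_raw:.3f}  clipped: {loss_clipped:.3f}")
```

Clipping costs a little on confident correct predictions but caps the penalty for a confident miss, which matters more after a max reduction that tends to push scores toward the extremes.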