skills/nlp/quantized-probability-storage/SKILL.md
Stores model softmax probabilities as uint8 (0-255) to reduce RAM by 4x during multi-model ensemble inference.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add wenmin-wu/ds-skills nlp-quantized-probability-storage
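Ensembling several large transformer models usually means holding every model's per-token class probabilities in memory until they can be averaged. In float32 (4 bytes per value), 5 models x 10K documents x 4096 tokens x 15 classes is about 3.1 billion values, or ~12 GB. Storing each value as a uint8 (1 byte) cuts this to ~3 GB. Each probability is multiplied by 255, rounded to the nearest integer, and cast to uint8. When the ensemble is averaged, the stored values are divided by 255 to recover approximate probabilities. Rounding keeps the per-value error at or below 0.5/255 ≈ 0.002, which typically means negligible accuracy loss. A plain cast would instead truncate, biasing every value downward by up to 1/255.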
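A quick NumPy-only check of the memory arithmetic and of the quantization error follows. The `storage_bytes` helper and the random softmax outputs are illustrative assumptions and are not part of the skill. The sketch also compares truncating with rounding:

```python
import numpy as np

def storage_bytes(n_models, n_docs, seq_len, n_classes, bytes_per_value):
    """Bytes needed to hold every model's per-token class probabilities."""
    return n_models * n_docs * seq_len * n_classes * bytes_per_value

# Figures from the text: 5 models x 10K documents x 4096 tokens x 15 classes
fp32 = storage_bytes(5, 10_000, 4096, 15, 4)  # float32: 4 bytes per value
u8 = storage_bytes(5, 10_000, 4096, 15, 1)    # uint8: 1 byte per value
print(f"float32: {fp32 / 1e9:.1f} GB, uint8: {u8 / 1e9:.1f} GB")  # float32: 12.3 GB, uint8: 3.1 GB

# Round-trip error on synthetic softmax outputs (random data, for illustration only)
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 15))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# astype(np.uint8) truncates toward zero; np.rint rounds to the nearest level
truncated = (probs * 255).astype(np.uint8) / 255.0
rounded = np.clip(np.rint(probs * 255), 0, 255).astype(np.uint8) / 255.0

print("max error, truncate:", np.abs(truncated - probs).max())
print("max error, round:   ", np.abs(rounded - probs).max())
```

Rounding bounds the per-value error by 0.5/255. Truncation bounds it by 1/255 and shifts every value downward on average.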
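```python
import numpy as np
import torch

def quantize_probs(probs, max_length=4096):
    """Convert float probabilities to uint8 for storage.
    Args:
        probs: (batch, seq_len, n_classes) float tensor of softmax outputs
        max_length: length to pad or truncate the sequence axis to
    Returns:
        (batch, max_length, n_classes) uint8 numpy array
    """
    # Round instead of truncating, and clamp so values at or just above 1.0 cannot
    # overflow uint8; detach/float handle grad-tracking and fp16/bf16 inputs.
    scaled = (probs.detach().float() * 255).round().clamp(0, 255)
    quantized = scaled.to(torch.uint8).cpu().numpy()
    # Pad (with all-zero probabilities) or truncate to a fixed length so batches line up
    seq_len = quantized.shape[1]
    if seq_len > max_length:
        quantized = quantized[:, :max_length, :]
    elif seq_len < max_length:
        pad_width = ((0, 0), (0, max_length - seq_len), (0, 0))
        quantized = np.pad(quantized, pad_width, constant_values=0)
    return quantized

def dequantize_and_average(all_preds, n_models=None):
    """Average quantized (uint8) predictions from multiple models."""
    if n_models is None:
        n_models = len(all_preds)
    result = np.zeros_like(all_preds[0], dtype=np.float32)
    for preds in all_preds:
        result += preds  # uint8 values are upcast to float32 during the addition
    # Undo the x255 scaling once, after summing, and divide by the model count
    return result / (255.0 * n_models)
```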
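The next sketch is an assumption, not part of the skill. It preallocates a single uint8 array for all models and averages it a few documents at a time, so widening to float32 happens one chunk at a time rather than for a full model's predictions at once. The helper names `allocate_store`, `store_model_preds`, and `average_in_chunks`, the `chunk_docs` parameter, and the synthetic data are all hypothetical:

```python
import numpy as np

def allocate_store(n_models, n_docs, max_length, n_classes):
    """Hypothetical helper: one uint8 slab holding every model's predictions."""
    return np.zeros((n_models, n_docs, max_length, n_classes), dtype=np.uint8)

def store_model_preds(store, model_idx, probs):
    """Hypothetical helper: quantize (n_docs, max_length, n_classes) floats into one slot."""
    store[model_idx] = np.clip(np.rint(probs * 255), 0, 255).astype(np.uint8)

def average_in_chunks(store, chunk_docs=256):
    """Dequantize and average over the model axis, chunk_docs documents at a time.

    Only one chunk is widened to float32 at a time, so temporary memory scales
    with chunk_docs instead of with a whole model's predictions.
    """
    n_models, n_docs = store.shape[:2]
    out = np.empty(store.shape[1:], dtype=np.float32)
    for start in range(0, n_docs, chunk_docs):
        chunk = store[:, start:start + chunk_docs]
        # Sum the uint8 values across models in float32, then undo the x255 scaling
        out[start:start + chunk_docs] = chunk.sum(axis=0, dtype=np.float32) / (255.0 * n_models)
    return out

# Usage with synthetic data: 3 "models", 10 documents, 8 tokens, 4 classes
rng = np.random.default_rng(0)
n_models, n_docs, max_length, n_classes = 3, 10, 8, 4
store = allocate_store(n_models, n_docs, max_length, n_classes)
float_preds = []
for m in range(n_models):
    logits = rng.normal(size=(n_docs, max_length, n_classes))
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    float_preds.append(probs)
    store_model_preds(store, m, probs)

ensemble = average_in_chunks(store, chunk_docs=4)
reference = np.mean(float_preds, axis=0)
print("max abs difference vs float average:", np.abs(ensemble - reference).max())
```

Because every stored value is within 0.5/255 of its float original, the chunked average stays within the same bound of the full-precision ensemble average.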
data-ai
Scaled Pinball Loss (SPL) metric for evaluating quantile forecasts, normalized by mean absolute successive differences of training data
data-ai
Walk backward through a time series and multiplicatively rescale segments when jumps exceed a fraction of the running mean to correct data collection anomalies
testing
Transform forecasting target to next/current ratio minus one so that optimizing MAE or squared error implicitly minimizes SMAPE
tools
Convert point forecasts to prediction intervals by scaling with logit-transformed quantile ratios passed through a Normal CDF