skills/nlp/8bit-optimizer-embedding-override/SKILL.md
Uses bitsandbytes 8-bit AdamW to shrink optimizer-state memory (by up to ~75% for the quantized parameters), with a 32-bit override for embedding weights to prevent training instability.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add wenmin-wu/ds-skills nlp-8bit-optimizer-embedding-override
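Adam-style optimizers keep two states per parameter (momentum and variance), so with 32-bit states the optimizer consumes roughly 2x the memory of the fp32 model weights. The bitsandbytes 8-bit AdamW quantizes these states to 8 bits, cutting their memory by ~75%. However, embedding layers are sensitive to quantization noise, so override the embedding weights to use 32-bit optimizer states while keeping everything else at 8-bit. Parameters kept at 32-bit do not contribute to the saving.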
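The memory trade-off can be estimated with simple arithmetic. The sketch below is only a back-of-the-envelope calculation: the parameter counts are hypothetical, the helper name `adam_state_bytes` is made up for illustration, and the estimate ignores any extra bookkeeping the quantized states may need.

```python
def adam_state_bytes(n_params: int, optim_bits: int = 32) -> int:
    """Bytes needed for Adam's two per-parameter states at a given precision."""
    return 2 * n_params * optim_bits // 8


# Hypothetical parameter counts, chosen only for illustration.
n_embedding = 100_000_000
n_other = 85_000_000

full_32bit = adam_state_bytes(n_embedding + n_other, 32)
all_8bit = adam_state_bytes(n_embedding + n_other, 8)
# Embedding states stay at 32-bit, everything else is quantized to 8-bit.
mixed = adam_state_bytes(n_embedding, 32) + adam_state_bytes(n_other, 8)

for label, size in [("32-bit", full_32bit), ("8-bit", all_8bit), ("mixed", mixed)]:
    print(f"{label:>6}: {size / 1e9:.2f} GB ({1 - size / full_32bit:.0%} saved)")
```

When embeddings make up a large share of the parameters, the 32-bit override gives back much of the saving, so the realized reduction depends on the model's embedding-to-encoder ratio.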
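import bitsandbytes as bnb
from transformers import AutoModel


def set_embedding_parameters_bits(embeddings, optim_bits=32):
    """Keep optimizer states for embedding weights at `optim_bits` precision."""
    manager = bnb.optim.GlobalOptimManager.get_instance()
    for attr in ("word_embeddings", "position_embeddings", "token_type_embeddings"):
        # Skip attributes that are missing or set to None on this model.
        module = getattr(embeddings, attr, None)
        if module is not None:
            manager.register_module_override(module, "weight", {"optim_bits": optim_bits})


model = AutoModel.from_pretrained("microsoft/deberta-v3-base")
trainable_params = [p for p in model.parameters() if p.requires_grad]
# All parameters default to 8-bit optimizer states.
optimizer = bnb.optim.AdamW(trainable_params, lr=2e-5, optim_bits=8)
# Override the embedding weights to 32-bit states before training starts.
set_embedding_parameters_bits(model.embeddings)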
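The helper above hard-codes three attribute names, which may not match every architecture. One possible generalization, sketched here under stated assumptions, walks all `(name, module)` pairs and registers the override for anything that looks like an embedding. The function name `override_embedding_optim_bits`, the class-name check, and the stand-in classes are all hypothetical; the `register` callable is injected so the sketch runs without torch or bitsandbytes installed.

```python
from typing import Callable, Iterable, List, Tuple


def override_embedding_optim_bits(
    named_modules: Iterable[Tuple[str, object]],
    register: Callable[[object, str, dict], None],
    optim_bits: int = 32,
) -> List[str]:
    """Register an optimizer-state override for every embedding-like module.

    A module counts as an embedding here if its class is named "Embedding"
    (as torch.nn.Embedding is) and it has a non-None `weight` attribute.
    The name check is an assumption made to keep this sketch import-free.
    """
    overridden = []
    for name, module in named_modules:
        if type(module).__name__ != "Embedding":
            continue
        if getattr(module, "weight", None) is None:
            continue
        register(module, "weight", {"optim_bits": optim_bits})
        overridden.append(name)
    return overridden


# Stand-in classes so the sketch runs without torch or bitsandbytes.
class Embedding:
    def __init__(self):
        self.weight = object()


class Linear:
    def __init__(self):
        self.weight = object()


fake_modules = [
    ("embeddings.word_embeddings", Embedding()),
    ("encoder.layer.0.dense", Linear()),
    ("embeddings.position_embeddings", Embedding()),
]
calls = []  # records what would be passed to the real registration call
names = override_embedding_optim_bits(
    fake_modules, lambda module, attr, cfg: calls.append((attr, cfg))
)
print(names)
```

With a real model, the same function could presumably be called with `model.named_modules()` and `bnb.optim.GlobalOptimManager.get_instance().register_module_override` as the two arguments. Custom embedding subclasses with a different class name would be skipped by this check, so an `isinstance` test against `torch.nn.Embedding` is likely the better choice when torch is available.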
1. Install bitsandbytes (`pip install bitsandbytes`)
2. Replace `torch.optim.AdamW` with `bnb.optim.AdamW(optim_bits=8)`
3. Register a 32-bit override for the embedding weights via `GlobalOptimManager`