skills/nlp/transformer-layer-freezing/SKILL.md
Freezes transformer embedding and lower encoder layers to reduce memory, speed up training, and stabilize fine-tuning.
`npx skillsauth add wenmin-wu/ds-skills nlp-transformer-layer-freezing`
Lower transformer layers learn general language features that transfer well across tasks, while upper layers are more task-specific. Freezing the embeddings and the first N encoder layers removes their gradients and optimizer state. Because nothing below the frozen block needs gradients, backpropagation can stop at the first trainable layer. This reduces GPU memory, speeds up training, and lowers the risk of catastrophic forgetting. It is especially useful when fine-tuning large models on small datasets.
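The memory claim can be made concrete with a back-of-the-envelope estimate. This is a sketch with several assumptions. It assumes a BERT-style encoder layer: four attention projections with biases, a two-layer feed-forward block, and two LayerNorms. It also assumes fp32 training with `torch.optim.AdamW`, where each trainable parameter carries a 4-byte gradient plus two 4-byte moment buffers (12 bytes total). The helper names are hypothetical. Real architectures such as DeBERTa add or omit a few tensors, so treat the result as approximate.

```python
def encoder_layer_params(hidden: int, intermediate: int) -> int:
    """Approximate parameter count of one BERT-style encoder layer (assumed layout)."""
    attention = 4 * (hidden * hidden + hidden)  # Q, K, V and output projections, with biases
    ffn = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    layer_norms = 2 * (2 * hidden)              # two LayerNorms, weight + bias each
    return attention + ffn + layer_norms


def frozen_savings_bytes(vocab: int, hidden: int, intermediate: int,
                         n_frozen_layers: int, bytes_per_param: int = 12) -> int:
    """Bytes of gradient + AdamW state avoided by freezing the word embeddings
    and the first `n_frozen_layers` encoder layers.

    Default of 12 bytes/param assumes fp32: 4 B gradient + 8 B for the two Adam moments.
    Saved activations of the frozen layers are NOT included.
    """
    embedding = vocab * hidden + 2 * hidden     # word embeddings + embedding LayerNorm
    frozen = embedding + n_frozen_layers * encoder_layer_params(hidden, intermediate)
    return frozen * bytes_per_param
```

Consider an example with `vocab=128_100`, `hidden=1024`, `intermediate=4096`, and `n_frozen_layers=2`. These sizes are assumed to be close to a DeBERTa-v3-large-sized encoder, so read the real values from `model.config`. With these inputs, `frozen_savings_bytes` returns 1,876,426,752 bytes (about 1.9 GB). Roughly 84% of that comes from the large-vocabulary embedding matrix alone.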
```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("microsoft/deberta-v3-large")

def freeze(module: torch.nn.Module) -> None:
    """Disable gradient computation for every parameter in `module`."""
    for param in module.parameters():
        param.requires_grad = False

# Freeze embeddings + first 2 encoder layers
freeze(model.embeddings)
freeze(model.encoder.layer[:2])  # slicing an nn.ModuleList returns an nn.ModuleList

# Only pass trainable params to the optimizer
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=2e-5)
```
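After freezing, it is worth checking what actually got frozen before training starts. The sketch below summarizes `(name, numel, requires_grad)` triples. You would build these from `model.named_parameters()`, so the helper itself stays free of any PyTorch dependency. The function name `summarize_trainable` is hypothetical.

```python
from typing import Iterable, Tuple


def summarize_trainable(params: Iterable[Tuple[str, int, bool]]) -> dict:
    """Summarize (name, numel, requires_grad) triples to confirm which
    parameters a freeze actually affected."""
    total = trainable = 0
    frozen_names = []
    for name, numel, requires_grad in params:
        total += numel
        if requires_grad:
            trainable += numel
        else:
            frozen_names.append(name)
    return {
        "total": total,
        "trainable": trainable,
        "trainable_fraction": trainable / total if total else 0.0,
        "frozen_names": frozen_names,
    }
```

Typical usage is `summarize_trainable((n, p.numel(), p.requires_grad) for n, p in model.named_parameters())`. If `frozen_names` is empty, or it contains layers you meant to train, the freeze did not do what you intended.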
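- Freeze `model.embeddings` (word embeddings, plus absolute-position and token-type embeddings where the architecture has them)
- Freeze `model.encoder.layer[:N]`
- Pass only `requires_grad=True` parameters to the optimizer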
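The same rule can be expressed on parameter names. This is convenient when the encoder is wrapped in a task model such as `AutoModelForSequenceClassification`, where it sits under an attribute like `deberta.` or `bert.` instead of at the top level. This is a sketch that assumes Hugging Face-style names such as `encoder.layer.3.attention...`, so check `model.named_parameters()` for your architecture. The helper `should_freeze` is hypothetical.

```python
import re

# Assumed Hugging Face naming: "<prefix.>embeddings.*" and "<prefix.>encoder.layer.<k>.*".
# The leading (?:^|\.) keeps "encoder.rel_embeddings" from matching the embeddings rule.
_EMBED_RE = re.compile(r"(?:^|\.)embeddings\.")
_LAYER_RE = re.compile(r"(?:^|\.)encoder\.layer\.(\d+)\.")


def should_freeze(name: str, n_frozen_layers: int) -> bool:
    """True for embedding params and params of encoder layers 0..n_frozen_layers-1."""
    if _EMBED_RE.search(name):
        return True
    match = _LAYER_RE.search(name)
    # Compare the layer index numerically so "layer.10" is not mistaken for "layer.1".
    return match is not None and int(match.group(1)) < n_frozen_layers
```

To apply it, loop over the model: `for name, p in model.named_parameters(): p.requires_grad = not should_freeze(name, 2)`. The optimizer filter from the example above stays the same.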