skills/nlp/embedding-matrix-oov-gaussian-init/SKILL.md
Initialize out-of-vocabulary word embeddings with Gaussian noise matching the pretrained embedding distribution
npx skillsauth add wenmin-wu/ds-skills nlp-embedding-matrix-oov-gaussian-init
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When building an embedding matrix from pretrained vectors, words missing from the pretrained vocabulary are commonly left as zero rows. That places every OOV token at the origin of the embedding space, far from where real tokens lie, and gives them all the same representation, which can bias downstream models. Instead, sample each OOV row from a Gaussian with the same mean and standard deviation as the pretrained embeddings. OOV words then start at a scale comparable to the pretrained vectors, and the model can learn to tell them apart during training.
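To make the scale mismatch concrete, here is a minimal sketch that compares vector norms for zero-initialized and Gaussian-initialized OOV rows. The "pretrained" vectors are synthetic stand-ins generated for illustration (the mean, standard deviation, and sizes are arbitrary assumptions, not values from any real embedding file).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for pretrained vectors (illustration only, not real data).
pretrained = rng.normal(loc=-0.01, scale=0.4, size=(1000, 50))

# Scalar statistics over every component of every pretrained vector.
emb_mean, emb_std = pretrained.mean(), pretrained.std()

# Two ways to initialize 20 OOV rows.
zero_oov = np.zeros((20, 50))
gauss_oov = rng.normal(emb_mean, emb_std, size=(20, 50))

# Average L2 norm of each group of rows.
typical_norm = np.linalg.norm(pretrained, axis=1).mean()
zero_norm = np.linalg.norm(zero_oov, axis=1).mean()
gauss_norm = np.linalg.norm(gauss_oov, axis=1).mean()

print(f"pretrained: {typical_norm:.2f}")
print(f"zero OOV: {zero_norm:.2f}")
print(f"Gaussian OOV: {gauss_norm:.2f}")
```

The zero rows have norm 0, while the Gaussian rows land close to the typical pretrained norm, which is the property the technique relies on.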
import numpy as np


def build_embedding_matrix(word_index, embeddings_index, max_features, embed_size):
    # Compute one scalar mean/std over all components of the pretrained vectors
    all_embs = np.stack(list(embeddings_index.values()))
    emb_mean, emb_std = all_embs.mean(), all_embs.std()
    nb_words = min(max_features, len(word_index))
    # Pre-fill with in-distribution noise; OOV rows keep these values
    embedding_matrix = np.random.normal(emb_mean, emb_std,
                                        (nb_words, embed_size))
    for word, i in word_index.items():
        # Skip indices outside the matrix (avoids IndexError with a 1-based word_index)
        if i >= nb_words:
            continue
        vector = embeddings_index.get(word)
        if vector is not None:
            embedding_matrix[i] = vector
    return embedding_matrix
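For reproducible runs it can help to seed the noise and to record which rows were never overwritten. The following is a sketch of one possible variant, not part of the original skill: the name `build_embedding_matrix_seeded`, the `seed` parameter, and the `is_noise` mask are hypothetical, and the `+ 1` assumes a 1-based `word_index` (as produced by a Keras-style tokenizer) with row 0 reserved for padding.

```python
import numpy as np


def build_embedding_matrix_seeded(word_index, embeddings_index,
                                  max_features, embed_size, seed=0):
    """Hypothetical variant: seeded noise plus a mask of rows left as noise."""
    rng = np.random.default_rng(seed)
    all_embs = np.stack(list(embeddings_index.values()))
    emb_mean, emb_std = all_embs.mean(), all_embs.std()

    # Assumption: indices start at 1, so one extra row is needed for index 0.
    nb_words = min(max_features, len(word_index) + 1)
    matrix = rng.normal(emb_mean, emb_std, (nb_words, embed_size))
    is_noise = np.ones(nb_words, dtype=bool)

    for word, i in word_index.items():
        if i >= nb_words:
            continue
        vector = embeddings_index.get(word)
        if vector is not None:
            matrix[i] = vector      # known word: copy the pretrained vector
            is_noise[i] = False     # this row no longer holds sampled noise
    return matrix, is_noise


# Toy inputs for illustration only.
embeddings_index = {
    "cat": np.array([0.1, 0.2, 0.3]),
    "dog": np.array([0.2, 0.1, 0.4]),
}
word_index = {"cat": 1, "dog": 2, "zxqv": 3}

matrix, is_noise = build_embedding_matrix_seeded(
    word_index, embeddings_index, max_features=10, embed_size=3)
print(matrix.shape)
print(f"rows left as noise: {int(is_noise.sum())} of {len(is_noise)}")
```

The mask gives a quick coverage check: a high share of noise rows usually means the tokenization does not match the pretrained vocabulary (casing, punctuation), which is worth fixing before relying on random initialization.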
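Compute emb_mean and emb_std over the pretrained vectors, pre-fill the matrix with N(emb_mean, emb_std) noise, overwrite the rows of words that have a pretrained vector, then load the result into nn.Embedding or a Keras Embedding with weights=[matrix].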