skills/nlp/dual-embedding-concat/SKILL.md
Concatenate GloVe and FastText embedding matrices along the feature axis for richer 600d word representations
npx skillsauth add wenmin-wu/ds-skills nlp-dual-embedding-concat
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
A single pretrained embedding misses some word relationships: GloVe captures co-occurrence patterns, while FastText handles subword morphology and OOV words. Concatenating the two 300d matrices along the feature dimension gives a 600d embedding that combines both strengths. This is especially useful for noisy text (social media, comments) where misspellings are common.
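To make the idea concrete, here is a toy sketch with 2d vectors instead of 300d ones. The words, values, and the `to_matrix` helper are made up for illustration and are not part of the skill. It shows that a word missing from one source still gets a partly non-zero row after concatenation.

```python
import numpy as np

# Toy 2d stand-ins for the two pretrained tables (values are made up)
glove = {"good": [0.1, 0.2], "movie": [0.3, 0.4]}
fasttext = {"good": [0.5, 0.6], "goood": [0.7, 0.8]}
word_index = {"good": 1, "movie": 2, "goood": 3}  # index 0 left for padding


def to_matrix(table, dim=2):
    """Hypothetical helper: rows follow word_index, missing words stay zero."""
    m = np.zeros((len(word_index) + 1, dim), dtype=np.float32)
    for word, idx in word_index.items():
        if word in table:
            m[idx] = table[word]
    return m


combined = np.concatenate([to_matrix(glove), to_matrix(fasttext)], axis=-1)
print(combined.shape)  # (4, 4)

# A row is "covered" if at least one of the two sources had the word
covered = np.abs(combined).sum(axis=1) > 0
print(covered.tolist())  # [False, True, True, True]
```

Here "movie" is only in the first table and "goood" only in the second, yet both end up with a usable row. Only the padding row stays all-zero.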
import numpy as np
import torch
import torch.nn as nn


def build_embedding_matrix(word_index, embedding_path, embed_dim=300):
    """Load pretrained embeddings into a matrix aligned with tokenizer vocab."""
    # load_embeddings is assumed to be defined elsewhere; it returns a dict: word -> vector
    embeddings = load_embeddings(embedding_path)
    # +1 because word_index is assumed to be 1-based (Keras-style); row 0 stays zero
    matrix = np.zeros((len(word_index) + 1, embed_dim), dtype=np.float32)
    unknown = []
    for word, idx in word_index.items():
        if word in embeddings:
            matrix[idx] = embeddings[word]
        else:
            unknown.append(word)  # row stays all-zero for this source
    return matrix, unknown


# Build and concatenate
glove_matrix, unk_glove = build_embedding_matrix(
    tokenizer.word_index, 'glove.840B.300d.txt')
fasttext_matrix, unk_ft = build_embedding_matrix(
    tokenizer.word_index, 'crawl-300d-2M.vec')
embedding_matrix = np.concatenate([glove_matrix, fasttext_matrix], axis=-1)
# Shape: (len(word_index) + 1, 600)

# Use in model; freeze=False lets the embeddings be fine-tuned during training
embedding = nn.Embedding.from_pretrained(
    torch.tensor(embedding_matrix, dtype=torch.float32), freeze=False)
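The `load_embeddings` helper called in `build_embedding_matrix` is not defined in this skill. A minimal sketch follows, assuming both files use the plain-text layout of one `word v1 v2 ... vN` entry per line. The `parse_embedding_lines` name and the `embed_dim` parameter are assumptions introduced here, not part of the original.

```python
import numpy as np


def parse_embedding_lines(lines, embed_dim=300):
    """Parse 'word v1 ... vN' text lines into a dict: word -> float32 vector."""
    embeddings = {}
    for line in lines:
        # Split from the right so tokens that themselves contain spaces survive
        parts = line.rstrip().rsplit(' ', embed_dim)
        if len(parts) != embed_dim + 1:
            continue  # e.g. the "<count> <dim>" header line of a .vec file
        embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings


def load_embeddings(path, embed_dim=300):
    """Read a text-format embedding file from disk (assumed UTF-8)."""
    with open(path, encoding='utf-8') as f:
        return parse_embedding_lines(f, embed_dim)
```

A plain lookup like this only covers words that are listed in the file, so anything absent from both files still ends up in both `unknown` lists with an all-zero row.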