skills/nlp/gpu-tfidf-cosine-matching/SKILL.md
GPU-accelerated TF-IDF vectorization via RAPIDS cuML with chunked cosine similarity for large-scale text matching
npx skillsauth add wenmin-wu/ds-skills nlp-gpu-tfidf-cosine-matching
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
For large-scale text matching (>100K documents), CPU-based TF-IDF + cosine similarity is too slow. Use RAPIDS cuML's TfidfVectorizer on GPU, convert the sparse output to a dense array, then compute cosine similarity in chunks with CuPy. This approach achieves a 10-50x speedup over scikit-learn on text retrieval tasks.
import cudf
import cupy
from cuml.feature_extraction.text import TfidfVectorizer


def gpu_tfidf_match(texts, threshold=0.7, max_features=25000, chunk_size=4096):
    """Find text matches via GPU TF-IDF + cosine similarity.

    Args:
        texts: cudf Series or list of strings
        threshold: cosine similarity cutoff
        max_features: TF-IDF vocabulary size
        chunk_size: rows per similarity batch

    Returns:
        list of matched index arrays per document
    """
    # The cuML vectorizer works on a cudf Series, so wrap plain lists first.
    if not isinstance(texts, cudf.Series):
        texts = cudf.Series(texts)

    tfidf = TfidfVectorizer(stop_words='english', binary=True,
                            max_features=max_features)
    # Rows are L2-normalized by default, so dot products are cosine similarities.
    embeddings = tfidf.fit_transform(texts).toarray()  # dense on GPU

    N = len(texts)
    n_chunks = (N + chunk_size - 1) // chunk_size
    matches = [None] * N
    for j in range(n_chunks):
        a = j * chunk_size
        b = min(a + chunk_size, N)
        # Similarity of rows a..b-1 against all N rows: shape (b - a, N).
        sims = cupy.matmul(embeddings, embeddings[a:b].T).T
        for k in range(b - a):
            # A non-empty document also matches itself (similarity 1).
            idx = cupy.where(sims[k] > threshold)[0]
            matches[a + k] = cupy.asnumpy(idx)
    return matches


# Usage (df is a pandas DataFrame with a 'title' column)
df_cu = cudf.from_pandas(df)
text_matches = gpu_tfidf_match(df_cu['title'], threshold=0.7)
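To sanity-check the GPU output on a small sample, the same chunked thresholding can be sketched on CPU with NumPy. This is an illustrative reference rather than part of the skill: `cpu_chunked_match` and the toy `emb` matrix are hypothetical, and the rows are L2-normalized by hand here so that a plain dot product equals the cosine similarity.

```python
import numpy as np


def cpu_chunked_match(embeddings, threshold=0.7, chunk_size=2):
    """CPU reference: per-row indices whose cosine similarity exceeds threshold."""
    # L2-normalize rows; all-zero rows are left as zeros (divide by 1 instead of 0).
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.where(norms == 0, 1.0, norms)

    n = unit.shape[0]
    matches = [None] * n
    for a in range(0, n, chunk_size):
        b = min(a + chunk_size, n)
        sims = unit[a:b] @ unit.T  # shape (b - a, n)
        for k in range(b - a):
            matches[a + k] = np.where(sims[k] > threshold)[0]
    return matches


# Toy data: rows 0 and 1 are near-duplicates, row 2 is unrelated.
emb = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.1],
    [0.0, 0.0, 1.0],
])
ref_matches = cpu_chunked_match(emb, threshold=0.7)
```

Each row's match list includes the row itself, which mirrors the behavior of the thresholding loop in `gpu_tfidf_match`.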
.toarray() is needed for matmul: the vectorizer returns a sparse matrix, and cupy.matmul operates on dense arrays.
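Because the dense conversion materializes every zero entry, it can help to estimate the memory footprint before running. The following is a back-of-envelope sketch, assuming 4-byte float32 values (the actual dtype depends on how the vectorizer is configured); `dense_footprint_gib` is a hypothetical helper name, and its defaults mirror those of `gpu_tfidf_match`.

```python
def dense_footprint_gib(n_docs, max_features=25000, chunk_size=4096, itemsize=4):
    """Estimate GiB needed for the dense TF-IDF matrix and one similarity chunk."""
    gib = 1024 ** 3
    # Dense embeddings: one value per (document, vocabulary term) pair.
    embeddings = n_docs * max_features * itemsize / gib
    # One similarity block: chunk_size rows scored against all documents.
    sims_chunk = n_docs * chunk_size * itemsize / gib
    return embeddings, sims_chunk


emb_gib, sims_gib = dense_footprint_gib(100_000)
print(f"embeddings: {emb_gib:.2f} GiB, one similarity chunk: {sims_gib:.2f} GiB")
```

If the estimate exceeds the available GPU memory, lowering `max_features` shrinks the embedding matrix and lowering `chunk_size` shrinks each similarity block.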