skills/cv/embedding-knn-regression/SKILL.md
GPU-accelerated k-NN regression on CLIP image embeddings using cosine distance and inverse-distance-power weighting to predict target embedding vectors
npx skillsauth add wenmin-wu/ds-skills cv-embedding-knn-regression
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
When mapping images to target embeddings (e.g., prompt embeddings), k-NN regression in the CLIP embedding space is a strong non-parametric baseline. Compute cosine distances between the test image embeddings and a reference set of image embeddings with known targets, take the k nearest neighbors, and average their target embeddings with weights proportional to the inverse distance raised to a power; a larger power concentrates the weight on the closest neighbors. Processing the test queries in batches, with one GPU matrix multiply per batch, keeps this fast even for 100k+ reference points.
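Written out, with $\mathcal{N}_k(x)$ the indices of the $k$ reference embeddings $x_j$ closest to a query $x$, $y_j$ their target embeddings, $p$ the power, and $\varepsilon$ a small constant against division by zero (symbols introduced here for illustration), the prediction is:

$$
d_j = 1 - \frac{x^\top x_j}{\lVert x \rVert \, \lVert x_j \rVert}, \qquad
w_j = \frac{\left(d_j^{\,p} + \varepsilon\right)^{-1}}{\sum_{l \in \mathcal{N}_k(x)} \left(d_l^{\,p} + \varepsilon\right)^{-1}}, \qquad
\hat{y}(x) = \sum_{j \in \mathcal{N}_k(x)} w_j \, y_j
$$

With $p = 6$, a neighbor at distance 0.2 receives $(0.4 / 0.2)^6 = 64$ times the weight of a neighbor at distance 0.4, so in practice only the closest few neighbors contribute much.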
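```python
import numpy as np
import torch


@torch.no_grad()
def knn_predict(ref_x, ref_y, test_x, k=100, power=6, batch=1000, device="cuda"):
    """Inverse-distance-power weighted k-NN regression in cosine space.

    ref_x: (N, D) reference image embeddings, ref_y: (N, E) their targets,
    test_x: (M, D) query embeddings. Returns an (M, E) NumPy array.
    """
    # Cast to float32: in fp16, dist ** 6 underflows and 1e-8 rounds to 0 -> inf/NaN weights
    ref_x = torch.from_numpy(ref_x).float().to(device)
    ref_x = ref_x / ref_x.norm(dim=-1, keepdim=True)  # unit norm: dot product = cosine similarity
    ref_y = torch.from_numpy(ref_y).float().to(device)
    k = min(k, ref_x.shape[0])  # topk raises if k exceeds the reference set size
    preds = []
    for i in range(0, len(test_x), batch):
        batch_x = torch.from_numpy(test_x[i:i + batch]).float().to(device)
        batch_x = batch_x / batch_x.norm(dim=-1, keepdim=True)
        # Cosine distance; clamp tiny negatives from rounding so non-integer powers stay real
        dists = (1 - batch_x @ ref_x.T).clamp(min=0)
        topk_dists, topk_idx = dists.topk(k, dim=-1, largest=False)
        # Inverse-distance-power weights; eps keeps exact matches finite
        weights = 1.0 / (topk_dists ** power + 1e-8)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        # (B, k, 1) * (B, k, E) summed over neighbors -> (B, E)
        pred = (weights.unsqueeze(-1) * ref_y[topk_idx]).sum(dim=1)
        preds.append(pred.cpu().numpy())
    return np.concatenate(preds)
```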
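For a machine without a GPU, or to sanity-check the torch version on a small sample, the same computation can be sketched in NumPy. This is an illustrative CPU variant, not part of the original skill: the name `knn_predict_np` is hypothetical, and because it materializes the full (M, N) distance matrix in one go, it only suits small query sets.

```python
import numpy as np


def knn_predict_np(ref_x, ref_y, test_x, k=100, power=6, eps=1e-8):
    """Hypothetical CPU sketch: cosine k-NN regression with inverse-distance-power weights."""
    ref_x = np.asarray(ref_x, dtype=np.float64)
    ref_y = np.asarray(ref_y, dtype=np.float64)
    test_x = np.asarray(test_x, dtype=np.float64)
    # L2-normalize so the dot product is cosine similarity
    ref_n = ref_x / np.linalg.norm(ref_x, axis=1, keepdims=True)
    test_n = test_x / np.linalg.norm(test_x, axis=1, keepdims=True)
    dists = np.clip(1.0 - test_n @ ref_n.T, 0.0, None)  # (M, N) cosine distances
    k = min(k, ref_n.shape[0])
    # argpartition puts the k smallest distances in the first k columns (unsorted)
    idx = np.argpartition(dists, k - 1, axis=1)[:, :k]  # (M, k)
    d = np.take_along_axis(dists, idx, axis=1)  # (M, k)
    w = 1.0 / (d ** power + eps)
    w /= w.sum(axis=1, keepdims=True)
    # Weighted average of neighbor targets: (M, k, 1) * (M, k, E) -> (M, E)
    return (w[..., None] * ref_y[idx]).sum(axis=1)


# Usage on synthetic data: queries are slightly perturbed copies of the first 3 references
rng = np.random.default_rng(0)
ref_x = rng.normal(size=(500, 16))
ref_y = rng.normal(size=(500, 8))
test_x = ref_x[:3] + 1e-3 * rng.normal(size=(3, 16))
pred = knn_predict_np(ref_x, ref_y, test_x, k=10, power=6)
print(pred.shape)  # (3, 8)
```

Since the weighted sum does not depend on neighbor order, `np.argpartition` is enough here; a full sort per row would only add cost. With a near-duplicate in the reference set, the power-6 weighting makes the prediction collapse onto that neighbor's target, which is the intended behavior of the scheme.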
data-ai
Scaled Pinball Loss (SPL) metric for evaluating quantile forecasts, normalized by mean absolute successive differences of training data
data-ai
Walk backward through a time series and multiplicatively rescale segments when jumps exceed a fraction of the running mean to correct data collection anomalies
testing
Transform forecasting target to next/current ratio minus one so that optimizing MAE or squared error implicitly minimizes SMAPE
tools
Convert point forecasts to prediction intervals by scaling with logit-transformed quantile ratios passed through a Normal CDF