skills/nlp/map-at-k-metric/SKILL.md
Custom MAP@K evaluation metric — scores top-K classification predictions with reciprocal rank weighting for HuggingFace Trainer
npx skillsauth add wenmin-wu/ds-skills nlp-map-at-k-metric
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
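Mean Average Precision at K (MAP@K) scores multi-class predictions by the rank at which the true label appears among the top-K predicted classes. Each sample scores 1/rank if the true label is found (1.0 at rank 1, 0.5 at rank 2, and so on) and 0 if it is not in the top K. The final metric is the mean of these scores over all samples. With exactly one true label per sample, this is the same as mean reciprocal rank truncated at K. It is a common metric for classification-as-ranking tasks.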
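To make the scoring rule concrete, here is a minimal worked example in plain Python. The rankings, labels, and the helper name `reciprocal_rank_at_k` are hypothetical and chosen only for illustration; they are not part of the skill itself.

```python
# Hypothetical ranked predictions (best first) and true labels for four samples.
ranked_preds = [
    [2, 0, 1],  # true label 2 is at rank 1 -> 1.0
    [0, 3, 1],  # true label 3 is at rank 2 -> 0.5
    [1, 2, 4],  # true label 4 is at rank 3 -> 1/3
    [0, 1, 2],  # true label 5 is not in the top 3 -> 0.0
]
true_labels = [2, 3, 4, 5]


def reciprocal_rank_at_k(ranked, label, k=3):
    """Return 1/rank if label is within the first k entries, else 0."""
    top_k = ranked[:k]
    return 1.0 / (top_k.index(label) + 1) if label in top_k else 0.0


scores = [reciprocal_rank_at_k(r, y) for r, y in zip(ranked_preds, true_labels)]
map_at_3 = sum(scores) / len(scores)
print(round(map_at_3, 4))  # (1 + 0.5 + 1/3 + 0) / 4, prints 0.4583
```

A sample whose true label falls outside the top K contributes 0 but still counts in the denominator, so the metric penalizes misses as well as low ranks.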
```python
import numpy as np
import torch


def compute_map_at_k(eval_pred, k=3):
    """MAP@K metric compatible with HuggingFace Trainer.

    Args:
        eval_pred: (logits, labels) tuple from Trainer
        k: number of top predictions to consider
    """
    logits, labels = eval_pred
    # Convert logits to probabilities; softmax preserves the class ranking.
    probs = torch.nn.functional.softmax(
        torch.tensor(logits, dtype=torch.float32), dim=-1
    ).numpy()
    # Indices of the k highest-probability classes per sample, best first.
    top_k = np.argsort(-probs, axis=1)[:, :k]
    score = 0.0
    for i in range(len(labels)):
        # Position of the true label within the top-k list, if present.
        matches = np.where(top_k[i] == labels[i])[0]
        if len(matches) > 0:
            score += 1.0 / (matches[0] + 1)  # reciprocal rank
    return {f"map@{k}": score / len(labels)}
```
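Because softmax is monotonic, ranking by raw logits gives the same order as ranking by probabilities (ties aside), so the per-sample loop can be replaced by array operations. The following is a sketch of one possible vectorized variant, not the skill's own implementation; the name `map_at_k_vectorized` and the sample inputs are assumptions for illustration.

```python
import numpy as np


def map_at_k_vectorized(logits, labels, k=3):
    """Hypothetical NumPy-only variant: mean reciprocal rank within the top k."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels).reshape(-1, 1)
    # Top-k class indices per row, best first (same order softmax would give).
    top_k = np.argsort(-logits, axis=1)[:, :k]
    # Boolean (n, k) mask; at most one True per row since indices are distinct.
    hits = top_k == labels
    # Rank of each column position: 1, 2, ..., k.
    ranks = np.arange(1, top_k.shape[1] + 1)
    # Each row sums to 1/rank of the hit, or 0 when the label is absent.
    return float((hits / ranks).sum(axis=1).mean())


# Illustrative inputs: hits at rank 1 and rank 2, then a miss.
example_logits = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.9], [0.0, 0.1, 0.2]]
example_labels = [1, 2, 0]
result = map_at_k_vectorized(example_logits, example_labels, k=2)
print(result)  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

This version avoids the torch dependency and the Python loop, which may matter on large evaluation sets; to use it with Trainer it would still need a small wrapper that unpacks `eval_pred` and returns a dict.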
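```python
# Usage with HuggingFace Trainer
from transformers import Trainer

# `model` and `training_args` are assumed to be defined elsewhere.
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=lambda ep: compute_map_at_k(ep, k=3),
)
```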