skills/nlp/jaccard-fbeta-ner-metric/SKILL.md
Computes micro F-beta for NER where true positives are determined by Jaccard word-overlap >= threshold rather than exact string match.
npx skillsauth add wenmin-wu/ds-skills nlp-jaccard-fbeta-ner-metric
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
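Exact string match is too strict for NER evaluation: "National Health Interview Survey" vs "National Health Interview Survey (NHIS)" should count as a match (the two share 4 of 5 distinct words, a Jaccard similarity of 0.8). Jaccard F-beta treats a prediction as a true positive when its word-level Jaccard similarity with a ground-truth entity meets a threshold (e.g., 0.5), then computes micro F-beta over the pooled counts from all documents. Beta < 1 (e.g., 0.5) weights precision above recall, so false positives are penalized more than false negatives.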
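The two claims above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the skill's own code: the helper names `word_jaccard` and `fbeta_from_counts` and the counts passed to them are made up for illustration.

```python
def word_jaccard(a, b):
    # Hypothetical helper: Jaccard similarity over lowercased whitespace tokens.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def fbeta_from_counts(tp, fp, fn, beta=0.5):
    # Hypothetical helper: micro F-beta computed directly from pooled counts.
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# The example pair shares 4 of 5 distinct tokens, so it clears a 0.5 threshold.
sim = word_jaccard("National Health Interview Survey",
                   "National Health Interview Survey (NHIS)")

# Same number of errors (illustrative counts), but of different kinds.
extra_fp = fbeta_from_counts(tp=8, fp=2, fn=0)  # two spurious predictions
extra_fn = fbeta_from_counts(tp=8, fp=0, fn=2)  # two missed entities

print(sim, extra_fp, extra_fn)
```

With beta = 0.5, the run with two false positives scores lower than the run with two false negatives, which is the asymmetry described above.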
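import numpy as np


def jaccard_similarity(s1, s2):
    # Word-level Jaccard: |intersection| / |union| of lowercased whitespace tokens.
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / len(w1 | w2) if w1 | w2 else 0.0


def jaccard_fbeta(y_true, y_pred, beta=0.5, jaccard_thresh=0.5):
    # y_true and y_pred are parallel lists with one list of entity strings per document.
    tp = fp = fn = 0
    for gt_list, pred_list in zip(y_true, y_pred):
        pred_remaining = list(pred_list)
        for gt in gt_list:
            scores = [jaccard_similarity(gt, p) for p in pred_remaining]
            if scores:
                # Greedy matching: take the most similar unmatched prediction.
                best_idx = int(np.argmax(scores))
                if scores[best_idx] >= jaccard_thresh:
                    # A matched prediction is consumed so it cannot match twice.
                    pred_remaining.pop(best_idx)
                    tp += 1
                else:
                    fn += 1
            else:
                fn += 1
        # Predictions left unmatched in this document are false positives.
        fp += len(pred_remaining)
    b2 = beta ** 2
    return tp * (1 + b2) / (tp * (1 + b2) + fp + fn * b2) if tp > 0 else 0.0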
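A small usage check may help confirm how the counts accumulate across documents. This sketch restates the two functions compactly without NumPy so that it runs on its own (an assumption made for the example, not a change to the skill), and the sample documents are invented.

```python
def jaccard_similarity(s1, s2):
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / len(w1 | w2) if w1 | w2 else 0.0


def jaccard_fbeta(y_true, y_pred, beta=0.5, jaccard_thresh=0.5):
    # Same greedy matching as above, using max() in place of np.argmax.
    tp = fp = fn = 0
    for gt_list, pred_list in zip(y_true, y_pred):
        remaining = list(pred_list)
        for gt in gt_list:
            scores = [jaccard_similarity(gt, p) for p in remaining]
            best = max(range(len(scores)), key=scores.__getitem__) if scores else None
            if best is not None and scores[best] >= jaccard_thresh:
                remaining.pop(best)
                tp += 1
            else:
                fn += 1
        fp += len(remaining)
    b2 = beta ** 2
    return tp * (1 + b2) / (tp * (1 + b2) + fp + fn * b2) if tp > 0 else 0.0


# Invented sample: two documents, ground truth vs. predictions.
y_true = [
    ["National Health Interview Survey", "Census of Agriculture"],
    ["Baltimore Longitudinal Study of Aging"],
]
y_pred = [
    ["National Health Interview Survey (NHIS)", "World Ocean Database"],
    [],
]

# Document 1: one fuzzy match (tp), one missed entity (fn), one spurious prediction (fp).
# Document 2: no predictions, so the single entity is a miss (fn).
score = jaccard_fbeta(y_true, y_pred)
print(round(score, 4))
```

Here the pooled counts are tp = 1, fp = 1, fn = 2, so with beta = 0.5 the score is 1.25 / (1.25 + 1 + 0.5).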