skills/cv/vqa-yes-no-logit-scoring/SKILL.md
Extracts calibrated yes/no probabilities from a VQA model by masking all logits except yes/no token variants and renormalizing via softmax.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add wenmin-wu/ds-skills cv-vqa-yes-no-logit-scoring
Visual Question Answering (VQA) models can score image-text alignment by asking "Does this image match [description]? Answer yes or no." Taking the argmax, however, yields only a binary answer, not a confidence score. This technique masks all vocabulary logits except those of the yes/no tokens (including space-prefixed variants like " yes"), applies softmax over just those 4 tokens, then sums the yes variants to get a calibrated probability. This turns any VQA model that exposes next-token logits into a continuous image-text scorer without fine-tuning.
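import torch

class VQAScorer:
    def __init__(self, model, processor):
        self.model = model
        self.processor = processor
        tok = processor.tokenizer
        # Look up the plain and space-prefixed variants of each answer.
        self.yes_ids = self._token_ids(tok, ['yes', ' yes'])
        self.no_ids = self._token_ids(tok, ['no', ' no'])

    @staticmethod
    def _token_ids(tok, tokens):
        # Deduplicate and drop strings that are not in the vocabulary (they map to the
        # unknown token). The space-prefixed form is tokenizer-specific, e.g. 'Ġyes' or '▁yes'.
        ids = {tok.convert_tokens_to_ids(t) for t in tokens}
        return sorted(i for i in ids if i is not None and i != tok.unk_token_id)

    @torch.no_grad()
    def score(self, image, question):
        inputs = self.processor(images=image, text=question, return_tensors='pt').to(self.model.device)
        # Next-token logits at the final position (assumes a decoder-style model).
        logits = self.model(**inputs).logits[:, -1, :]
        # Keep only the yes/no logits; every other token is set to -inf (zero probability).
        mask = torch.full_like(logits, float('-inf'))
        for tid in self.yes_ids + self.no_ids:
            mask[:, tid] = logits[:, tid]
        probs = torch.softmax(mask, dim=-1)
        p_yes = sum(probs[0, tid].item() for tid in self.yes_ids)
        p_no = sum(probs[0, tid].item() for tid in self.no_ids)
        return p_yes / (p_yes + p_no)

# model, processor, image, and desc are assumed to be defined by the caller.
scorer = VQAScorer(model, processor)
score = scorer.score(image, f'Does this image show "{desc}"? Answer yes or no.')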
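The masking and renormalization step can be checked by hand without a model. The following is a minimal sketch in plain Python, using a made-up six-token vocabulary; the function name `yes_probability`, the toy logits, and the token ids are all assumptions for illustration, not part of the skill itself.

```python
import math

def yes_probability(logits, yes_ids, no_ids):
    """Softmax restricted to the yes/no token ids, then sum over the yes variants."""
    keep = list(yes_ids) + list(no_ids)
    # Subtract the largest kept logit for numerical stability.
    m = max(logits[i] for i in keep)
    exp = {i: math.exp(logits[i] - m) for i in keep}
    total = sum(exp.values())
    return sum(exp[i] for i in yes_ids) / total

# Hypothetical toy logits: ids 1 and 2 stand in for the yes variants,
# ids 3 and 4 for the no variants; ids 0 and 5 are unrelated tokens.
toy_logits = [5.0, 2.0, 1.0, 0.5, 0.0, 4.0]
p_yes = yes_probability(toy_logits, yes_ids=[1, 2], no_ids=[3, 4])
print(round(p_yes, 3))
```

Note that the unrelated tokens (ids 0 and 5) have the largest logits here, yet they do not affect the result: only the relative sizes of the four yes/no logits matter.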
Include both "yes" and " yes" (space-prefixed): tokenizers handle them differently.
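How the space-prefixed form is spelled depends on the tokenizer: byte-level BPE vocabularies typically mark a leading space with 'Ġ', while SentencePiece vocabularies use '▁'. One way to collect whichever variants exist is to probe the vocabulary directly. The sketch below uses a hypothetical helper `find_variant_ids` and a made-up toy vocabulary; neither comes from the original skill.

```python
def find_variant_ids(vocab, word):
    """Return the ids of `word` and of its space-prefixed forms that exist in `vocab`."""
    # Candidate spellings: bare word, literal space, 'Ġ' (byte-level BPE), '▁' (SentencePiece).
    candidates = [word, ' ' + word, 'Ġ' + word, '▁' + word]
    return sorted({vocab[t] for t in candidates if t in vocab})

# Hypothetical toy vocabulary (token string -> id), for illustration only.
toy_vocab = {'yes': 10, 'Ġyes': 11, 'no': 20, 'Ġno': 21, 'maybe': 30}
yes_ids = find_variant_ids(toy_vocab, 'yes')
no_ids = find_variant_ids(toy_vocab, 'no')
print(yes_ids, no_ids)
```

With a Hugging Face tokenizer, the token-to-id mapping returned by `tokenizer.get_vocab()` could be passed as `vocab`. It is worth printing the resulting ids once to confirm that at least one yes variant and one no variant were found.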