skills/cv/clip-interrogator-captioning/SKILL.md
Generate descriptive text prompts from images by combining BLIP captioning with CLIP cosine similarity against curated label banks for medium, movement, and flavor attributes
npx skillsauth add wenmin-wu/ds-skills cv-clip-interrogator-captioning
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
To reverse-engineer or describe an image as a text prompt (e.g., for image-to-prompt tasks), CLIP Interrogator combines two models: BLIP generates a base caption, then CLIP scores the image embedding by cosine similarity against precomputed text embeddings from curated label banks (mediums, movements, flavors). The best-matching medium, the best-matching movement, and the top three flavors are appended to the caption, producing a prompt that captures content, medium, and style.
import torch
from clip_interrogator import Config, Interrogator

# Load the captioning and CLIP models along with the label banks.
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
cos = torch.nn.CosineSimilarity(dim=1)

# Stack each label bank's precomputed text embeddings into one tensor on the model device.
mediums_features = torch.stack([torch.from_numpy(t) for t in ci.mediums.embeds]).to(ci.device)
movements_features = torch.stack([torch.from_numpy(t) for t in ci.movements.embeds]).to(ci.device)
flavors_features = torch.stack([torch.from_numpy(t) for t in ci.flavors.embeds]).to(ci.device)

def interrogate(image):
    caption = ci.generate_caption(image)  # base caption
    feat = ci.image_to_features(image)    # CLIP image embedding
    # Pick the single best medium and movement, and the three best flavors.
    medium = ci.mediums.labels[cos(feat, mediums_features).topk(1).indices[0]]
    movement = ci.movements.labels[cos(feat, movements_features).topk(1).indices[0]]
    flavors = ", ".join([ci.flavors.labels[i] for i in cos(feat, flavors_features).topk(3).indices])
    return f"{caption}, {medium}, {movement}, {flavors}"
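
The ranking step can be tried without downloading any models. The following is a minimal sketch in plain Python using toy three-dimensional vectors. The vectors, the labels, and the helper names (`cosine`, `top_labels`, `build_prompt`) are made up for illustration and are not part of the clip_interrogator API. Only the pattern (cosine similarity, top-k selection, comma-joined prompt) mirrors the code above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_labels(image_vec, bank, k=1):
    """Return the k labels in `bank` most similar to `image_vec`, best first."""
    ranked = sorted(bank, key=lambda label: cosine(image_vec, bank[label]), reverse=True)
    return ranked[:k]

def build_prompt(caption, image_vec, mediums, movements, flavors):
    """Append the top medium, top movement, and top three flavors to the caption."""
    medium = top_labels(image_vec, mediums, k=1)[0]
    movement = top_labels(image_vec, movements, k=1)[0]
    flavor_str = ", ".join(top_labels(image_vec, flavors, k=3))
    return f"{caption}, {medium}, {movement}, {flavor_str}"

# Hypothetical toy data: a real label bank holds CLIP text embeddings instead.
image_vec = [0.9, 0.1, 0.2]
mediums = {"oil painting": [1.0, 0.0, 0.1], "photograph": [0.0, 1.0, 0.0]}
movements = {"impressionism": [0.8, 0.2, 0.3], "cubism": [0.1, 0.1, 1.0]}
flavors = {
    "vivid colors": [0.9, 0.0, 0.3],
    "soft lighting": [0.7, 0.3, 0.1],
    "highly detailed": [0.5, 0.5, 0.5],
    "monochrome": [0.0, 0.2, 0.9],
}

prompt = build_prompt("a painting of a harbor", image_vec, mediums, movements, flavors)
print(prompt)
```

Because cosine similarity ignores vector length, the label embeddings do not need to be normalized for the ranking to hold. Sorting every label is fine at toy scale; for large banks, the tensor `topk` call in the code above avoids a full sort.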