skills/computer-vision-helper/SKILL.md
Assist with image analysis, object detection, and visual AI tasks
npx skillsauth add jmsktm/claude-settings Computer Vision HelperInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
The Computer Vision Helper skill guides you through implementing image analysis and visual AI tasks. From basic image classification to complex object detection and segmentation, this skill helps you leverage modern computer vision techniques effectively.
Computer vision has been transformed by deep learning and now by vision-language models. This skill covers both traditional approaches (CNNs, pre-trained models) and cutting-edge techniques (CLIP, GPT-4V, Segment Anything). It helps you choose the right approach based on your accuracy requirements, available data, and deployment constraints.
Whether you are building product recognition, document analysis, medical imaging, or any visual AI application, this skill ensures you understand the landscape and implement solutions that work.
# Data loading with augmentation
transform = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ColorJitter(brightness=0.2, contrast=0.2),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
])
dataset = ImageFolder(root='data/', transform=transform)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
# Transfer learning from pretrained model
model = models.resnet50(pretrained=True)
# Freeze early layers
for param in model.parameters():
param.requires_grad = False
# Replace classifier head
model.fc = nn.Linear(model.fc.in_features, num_classes)
class VisionPipeline:
def __init__(self, model_path):
self.model = load_optimized_model(model_path)
self.preprocessor = ImagePreprocessor()
def predict(self, image):
# Preprocess
tensor = self.preprocessor.process(image)
# Inference
with torch.no_grad():
output = self.model(tensor)
# Postprocess
return self.postprocess(output)
def predict_batch(self, images):
tensors = [self.preprocessor.process(img) for img in images]
batch = torch.stack(tensors)
with torch.no_grad():
outputs = self.model(batch)
return [self.postprocess(out) for out in outputs]
| Action | Command/Trigger | |--------|-----------------| | Choose approach | "What CV approach for [task]" | | Classify images | "Build image classifier" | | Detect objects | "Object detection for [use case]" | | Extract text | "OCR from images" | | Zero-shot vision | "Classify images without training data" | | Optimize model | "Speed up vision model" |
Start with Pre-trained: Don't train from scratch unless necessary
Data Quality Over Quantity: Clean, balanced data matters
Augment Thoughtfully: Augmentation should reflect real variation
Validate Correctly: Image data leaks easily
Optimize for Target Hardware: Inference matters
Handle Edge Cases: Real images are messy
Use CLIP for classification without training:
import clip
model, preprocess = clip.load("ViT-B/32")
def zero_shot_classify(image, labels):
# Prepare image
image_tensor = preprocess(image).unsqueeze(0)
# Prepare text prompts
text_prompts = [f"a photo of a {label}" for label in labels]
text_tokens = clip.tokenize(text_prompts)
# Get embeddings
with torch.no_grad():
image_features = model.encode_image(image_tensor)
text_features = model.encode_text(text_tokens)
# Compute similarities
similarities = (image_features @ text_features.T).softmax(dim=-1)
return {label: sim.item() for label, sim in zip(labels, similarities[0])}
Use multimodal LLMs for complex vision tasks:
def analyze_image(image_path, question):
import base64
from openai import OpenAI
# Encode image
with open(image_path, "rb") as f:
image_data = base64.b64encode(f.read()).decode()
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4-vision-preview",
messages=[{
"role": "user",
"content": [
{"type": "text", "text": question},
{"type": "image_url", "image_url": {
"url": f"data:image/jpeg;base64,{image_data}"
}}
]
}],
max_tokens=500
)
return response.choices[0].message.content
Fast, accurate object detection:
from ultralytics import YOLO
# Load pretrained model
model = YOLO("yolov8n.pt")
# Fine-tune on custom dataset
model.train(
data="custom_dataset.yaml",
epochs=100,
imgsz=640,
batch=16
)
# Inference
results = model.predict(source="image.jpg", conf=0.5)
for result in results:
boxes = result.boxes
for box in boxes:
xyxy = box.xyxy[0].tolist() # Bounding box
conf = box.conf[0].item() # Confidence
cls = box.cls[0].item() # Class ID
print(f"Detected {cls} at {xyxy} with confidence {conf}")
Universal segmentation:
from segment_anything import sam_model_registry, SamPredictor
# Load SAM
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)
# Set image
predictor.set_image(image)
# Segment with point prompt
masks, scores, logits = predictor.predict(
point_coords=np.array([[500, 375]]), # Click point
point_labels=np.array([1]), # 1 = foreground
multimask_output=True
)
# Segment with box prompt
masks, scores, logits = predictor.predict(
box=np.array([x1, y1, x2, y2])
)
Prepare models for production:
def optimize_for_deployment(model, sample_input):
# Step 1: Export to ONNX
torch.onnx.export(
model,
sample_input,
"model.onnx",
opset_version=13,
dynamic_axes={"input": {0: "batch"}}
)
# Step 2: Quantize (INT8)
from onnxruntime.quantization import quantize_dynamic
quantize_dynamic(
"model.onnx",
"model_quantized.onnx",
weight_type=QuantType.QInt8
)
# Step 3: Benchmark
import onnxruntime as ort
session = ort.InferenceSession("model_quantized.onnx")
benchmark_inference(session, sample_input)
return "model_quantized.onnx"
data-ai
Optimize YouTube videos for SEO, thumbnails, descriptions, and audience retention
testing
Design and facilitate effective workshops with agendas, activities, and outcomes
data-ai
Design and optimize AI-powered workflows for complex tasks
data-ai
Design and implement automated workflows to eliminate repetitive tasks and streamline processes