---
name: proposal-cluster-learning
description: 'Implement Proposal Cluster Learning (PCL) for Weakly Supervised Object Detection (WSOD). This skill implements the methodology from the IEEE TPAMI paper "PCL: Proposal Cluster Learning for Weakly Supervised Object Detection" by Tang et al. Enables training object detectors using only image-level labels without bounding box annotations.'
allowed-tools: [Read, Write, Edit, Bash]
license: MIT license
metadata:
  skill-author: Adapted from Tang et al
---
Proposal Cluster Learning (PCL) is an end-to-end deep network approach for Weakly Supervised Object Detection (WSOD). It allows training object detectors using only image-level labels (e.g., "this image contains a dog") without requiring expensive bounding box annotations.
Key Innovation: Instead of treating detection as classification (like standard MIL approaches), PCL generates "proposal clusters" - groups of spatially adjacent proposals associated with the same object - and uses these clusters to iteratively refine instance classifiers.
Benefits:
Use this skill when:
Traditional Multiple Instance Learning (MIL) for WSOD:
PCL network architecture:

                               Image
                                 │
                                 ▼
    ┌─────────────────────────────────────────────────────────┐
    │                  CNN Backbone (VGG16)                   │
    └─────────────────────────────────────────────────────────┘
                                 │
                                 ▼
    ┌─────────────────────────────────────────────────────────┐
    │              Region Proposal Network (RPN)              │
    │                   or Selective Search                   │
    └─────────────────────────────────────────────────────────┘
                                 │
                                 ▼
    ┌─────────────────────────────────────────────────────────┐
    │                    ROI Pooling Layer                    │
    └─────────────────────────────────────────────────────────┘
                                 │
                                 ├──────► Stream 1: MIL Network (Initial Classification)
                                 │
                                 ├──────► Stream 2: PCL Refinement 1
                                 │
                                 ├──────► Stream 3: PCL Refinement 2
                                 │
                                 └──────► Stream K: PCL Refinement K-1

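The stream layout in the diagram can be sketched as a small PyTorch module that runs one MIL stream and several refinement streams over shared ROI features. This is a simplified illustration, not the paper's implementation: the class name `PCLHeads`, the single-layer refinement heads, and the choice of column 0 for background are assumptions made here, and the backbone and ROI pooling are replaced by random features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PCLHeads(nn.Module):
    """Hypothetical sketch: MIL stream plus K refinement streams on shared ROI features."""

    def __init__(self, num_classes, num_refinement_streams=3, feat_dim=4096):
        super().__init__()
        # Stream 1 (MIL): two parallel branches
        self.fc_cls = nn.Linear(feat_dim, num_classes)
        self.fc_det = nn.Linear(feat_dim, num_classes)
        # Streams 2..K: each predicts num_classes + 1 scores (column 0 = background here)
        self.refinement_streams = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes + 1) for _ in range(num_refinement_streams)]
        )

    def mil_stream(self, roi_features):
        cls = F.softmax(self.fc_cls(roi_features), dim=1)  # softmax over classes
        det = F.softmax(self.fc_det(roi_features), dim=0)  # softmax over proposals
        return cls * det  # [R, num_classes]

    def refine(self, roi_features):
        # One [R, num_classes + 1] probability matrix per refinement stream
        return [F.softmax(head(roi_features), dim=1) for head in self.refinement_streams]


torch.manual_seed(0)
heads = PCLHeads(num_classes=20)
roi_features = torch.randn(8, 4096)  # stand-in for pooled features of 8 proposals
mil_scores = heads.mil_stream(roi_features)
refined = heads.refine(roi_features)
print(mil_scores.shape, len(refined), refined[0].shape)
```

All streams read the same ROI features, so adding a refinement stream costs only one extra head rather than another pass through the backbone.
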
Prepare Image-Level Labels:

    # Dataset format: image path + list of classes present
    dataset = {
        "image_001.jpg": ["dog", "person"],
        "image_002.jpg": ["car"],
        "image_003.jpg": ["dog", "cat"],
        # ...
    }

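An image-level loss needs these class lists as fixed-length vectors. A minimal sketch of that conversion follows; the helper name `encode_labels` and the alphabetical class ordering are assumptions made for illustration.

```python
import numpy as np

dataset = {
    "image_001.jpg": ["dog", "person"],
    "image_002.jpg": ["car"],
    "image_003.jpg": ["dog", "cat"],
}

# Fix a class order once so every image uses the same vector layout
classes = sorted({name for names in dataset.values() for name in names})
class_to_idx = {name: i for i, name in enumerate(classes)}


def encode_labels(label_names, class_to_idx):
    """Return a multi-hot float vector with 1.0 at each class present in the image."""
    vec = np.zeros(len(class_to_idx), dtype=np.float32)
    for name in label_names:
        vec[class_to_idx[name]] = 1.0
    return vec


targets = {path: encode_labels(names, class_to_idx) for path, names in dataset.items()}
print(classes)                   # ['car', 'cat', 'dog', 'person']
print(targets["image_001.jpg"])  # [0. 0. 1. 1.]
```
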
Generate Region Proposals:
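
    import cv2  # cv2.ximgproc ships with the opencv-contrib-python package

    def generate_proposals(image, method="selective_search", max_proposals=2000):
        """
        Generate region proposals for an image (BGR numpy array).
        Options:
        - Selective Search (traditional)
        - Edge Boxes
        - RPN (if using two-stage approach)
        Only Selective Search is implemented in this function.
        """
        if method != "selective_search":
            raise NotImplementedError(f"Proposal method not implemented: {method}")
        ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
        ss.setBaseImage(image)
        ss.switchToSelectiveSearchFast()
        # Typically use the top 2000 proposals
        proposals = ss.process()[:max_proposals].astype("float32")
        # OpenCV returns (x, y, w, h); ROI pooling, IoU, and NMS expect (x1, y1, x2, y2)
        proposals[:, 2:] += proposals[:, :2]
        return proposals
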
Feature Extraction:
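
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision.ops import RoIPool

    class MILNetwork(nn.Module):
        def __init__(self, num_classes, backbone='vgg16'):
            super().__init__()
            # load_pretrained_backbone is a placeholder for your own loader
            self.backbone = load_pretrained_backbone(backbone)
            # spatial_scale=1/16 assumes VGG16 conv5 features (stride 16)
            self.roi_pool = RoIPool(output_size=(7, 7), spatial_scale=1.0 / 16)
            # Map pooled 512x7x7 features (VGG16 assumption) to 4096-d vectors
            self.fc = nn.Sequential(
                nn.Flatten(), nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),
                nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            )
            # Two parallel branches
            self.fc_cls = nn.Linear(4096, num_classes)  # Classification
            self.fc_det = nn.Linear(4096, num_classes)  # Detection

        def forward(self, image, proposals):
            # proposals: list with one [R, 4] float tensor of (x1, y1, x2, y2) boxes
            features = self.backbone(image)
            # ROI pooling for each proposal, then the shared fully connected layers
            roi_features = self.fc(self.roi_pool(features, proposals))
            # Classification branch: softmax over classes for each proposal
            cls_scores = F.softmax(self.fc_cls(roi_features), dim=1)
            # Detection branch: softmax over proposals for each class
            det_scores = F.softmax(self.fc_det(roi_features), dim=0)
            # Combine: proposal score = cls * det, shape [R, num_classes]
            proposal_scores = cls_scores * det_scores
            return proposal_scores
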
MIL Loss:

    import torch.nn.functional as F

    def mil_loss(proposal_scores, image_labels):
        """
        Image-level classification loss:
        Aggregate proposal scores to image-level prediction
        """
        # Sum over proposals for each class. Each sum already lies in [0, 1]
        # because the scores are products of softmax outputs, so no sigmoid is applied.
        image_scores = proposal_scores.sum(dim=0).clamp(1e-6, 1 - 1e-6)
        # Binary cross-entropy with multi-hot image labels
        loss = F.binary_cross_entropy(image_scores, image_labels.float())
        return loss

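To make the aggregation concrete, here is a small NumPy walk-through of the two-branch scoring and the image-level loss on random numbers. It is a sketch under the usual MIL formulation (softmax over classes in one branch, softmax over proposals in the other); the array sizes and labels are made up for illustration. Under that formulation the per-class sum over proposals is a value between 0 and 1, so this sketch uses it as a probability directly.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
cls_logits = rng.normal(size=(5, 3))  # 5 proposals, 3 classes
det_logits = rng.normal(size=(5, 3))

cls = softmax(cls_logits, axis=1)  # each proposal: distribution over classes
det = softmax(det_logits, axis=0)  # each class: distribution over proposals
proposal_scores = cls * det        # [5, 3]

# Image-level score per class: weighted average of cls with weights det
image_scores = proposal_scores.sum(axis=0)

labels = np.array([1.0, 0.0, 1.0])  # hypothetical multi-hot image label
p = np.clip(image_scores, 1e-6, 1 - 1e-6)
loss = -(labels * np.log(p) + (1 - labels) * np.log(1 - p)).mean()
print(image_scores, loss)
```
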
Generate Proposal Clusters:
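
    import numpy as np

    def generate_proposal_clusters(proposals, proposal_scores, iou_threshold=0.5,
                                   score_threshold=0.1):
        """
        Group proposals into per-class clusters:
        1. Keep proposals whose class score exceeds score_threshold
        2. Cluster the survivors by spatial overlap (IoU)

        proposals: [R, 4] array of (x1, y1, x2, y2) boxes.
        proposal_scores: [R, num_classes] array of foreground class scores
        (detach tensors and move them to the CPU before passing them in).
        cluster_by_iou and compute_cluster_center are helpers you must supply.
        """
        proposals = np.asarray(proposals)
        proposal_scores = np.asarray(proposal_scores)
        clusters = []
        # For each class (in practice, restrict to classes present in the image label)
        for c in range(proposal_scores.shape[1]):
            class_scores = proposal_scores[:, c]
            # Find high-scoring proposals
            high_scoring = proposals[class_scores > score_threshold]
            if len(high_scoring) == 0:
                continue
            # Cluster by spatial overlap
            cluster_assignments = cluster_by_iou(
                high_scoring,
                iou_threshold=iou_threshold
            )
            for cluster_id in np.unique(cluster_assignments):
                cluster_proposals = high_scoring[cluster_assignments == cluster_id]
                clusters.append({
                    'class': c,
                    'proposals': cluster_proposals,
                    'center': compute_cluster_center(cluster_proposals)
                })
        return clusters
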
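The clustering step above calls `cluster_by_iou` and `compute_cluster_center`, which this skill does not define. One possible NumPy sketch is shown below. It treats boxes as nodes of a graph, links two boxes when their IoU exceeds the threshold, and gives each connected component one cluster id; the mean-box "center" is likewise an assumption chosen for simplicity and is not claimed to match the paper's procedure.

```python
import numpy as np


def pairwise_iou(boxes):
    """IoU matrix for an [N, 4] array of (x1, y1, x2, y2) boxes."""
    boxes = np.asarray(boxes, dtype=np.float64)
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area[:, None] + area[None, :] - inter
    return inter / np.maximum(union, 1e-12)


def cluster_by_iou(boxes, iou_threshold=0.5):
    """Give each box a cluster id; boxes connected by IoU > threshold share an id."""
    n = len(boxes)
    assignments = np.full(n, -1, dtype=np.int64)
    if n == 0:
        return assignments
    linked = pairwise_iou(boxes) > iou_threshold
    next_id = 0
    for start in range(n):
        if assignments[start] != -1:
            continue
        # Flood-fill the connected component that contains `start`
        assignments[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(linked[i] & (assignments == -1)):
                assignments[j] = next_id
                stack.append(int(j))
        next_id += 1
    return assignments


def compute_cluster_center(cluster_boxes):
    """Summarize a cluster by the coordinate-wise mean of its boxes."""
    return np.asarray(cluster_boxes, dtype=np.float64).mean(axis=0)


boxes = np.array([
    [0, 0, 10, 10],
    [1, 1, 11, 11],    # IoU with the first box is 81 / 119, about 0.68
    [50, 50, 60, 60],  # no overlap with the others
])
assignments = cluster_by_iou(boxes, iou_threshold=0.5)
print(assignments)                        # [0 0 1]
print(compute_cluster_center(boxes[:2]))  # [ 0.5  0.5 10.5 10.5]
```

Connected components can chain together boxes that do not overlap each other directly, so a lower threshold produces larger clusters, consistent with the parameter table's note on the IoU threshold.
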
Assign Labels from Clusters:
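
    import numpy as np

    def assign_cluster_labels(proposals, clusters):
        """
        Assign pseudo-labels to proposals based on clusters:
        - Proposals in object cluster → object label (class index + 1)
        - Other proposals → background (label 0)
        find_proposal_index is a helper you must supply.
        """
        # Integer labels, as required by cross-entropy. Default: background
        labels = np.zeros(len(proposals), dtype=np.int64)
        for cluster in clusters:
            for proposal in cluster['proposals']:
                idx = find_proposal_index(proposal, proposals)
                # Offset by 1 so that class 0 is not confused with background
                labels[idx] = cluster['class'] + 1
        return labels
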
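The label assignment relies on a `find_proposal_index` helper that is not defined in this skill. A minimal sketch, assuming proposals are rows of a NumPy array and that cluster members are exact copies of those rows, could look like this. The example offsets the class index by one so that label 0 stays reserved for background, matching the "Default: background" convention above.

```python
import numpy as np


def find_proposal_index(proposal, proposals):
    """Return the row index of `proposal` inside `proposals` (exact match)."""
    proposals = np.asarray(proposals)
    matches = np.flatnonzero(np.all(proposals == np.asarray(proposal), axis=1))
    if matches.size == 0:
        raise ValueError("proposal not found in proposals")
    return int(matches[0])


proposals = np.array([
    [0, 0, 10, 10],
    [1, 1, 11, 11],
    [50, 50, 60, 60],
])
# One hypothetical cluster for class index 2 containing the first two proposals
clusters = [{"class": 2, "proposals": proposals[:2]}]

labels = np.zeros(len(proposals), dtype=np.int64)  # 0 = background
for cluster in clusters:
    for proposal in cluster["proposals"]:
        labels[find_proposal_index(proposal, proposals)] = cluster["class"] + 1
print(labels)  # [3 3 0]
```

Exact row matching is linear in the number of proposals per lookup; storing proposal indices in each cluster instead of box copies would avoid the search entirely.
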
Refinement Network:
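
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PCLRefinementStream(nn.Module):
        def __init__(self, num_classes):
            super().__init__()
            self.fc1 = nn.Linear(4096, 4096)
            self.fc2 = nn.Linear(4096, num_classes + 1)  # +1 for background

        def forward(self, roi_features, cluster_labels):
            x = F.relu(self.fc1(roi_features))
            scores = self.fc2(x)  # raw logits, shape [R, num_classes + 1]
            # cross_entropy needs integer class indices on the same device as the scores
            targets = torch.as_tensor(cluster_labels, dtype=torch.long, device=scores.device)
            # Supervised by cluster-generated pseudo-labels
            loss = F.cross_entropy(scores, targets)
            return scores, loss
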
Iterative Refinement:

    import torch
    import torch.nn.functional as F

    def train_pcl(dataloader, num_classes, num_epochs, num_refinement_streams=3, lr=0.001):
        """
        Train PCL with multiple refinement streams.
        PCLNetwork is assumed to expose extract_features, mil_stream, and
        refinement_streams; it is not defined in this skill.
        """
        model = PCLNetwork(num_classes, num_refinement_streams)
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        for epoch in range(num_epochs):
            for image, label in dataloader:
                optimizer.zero_grad()
                # Generate proposals
                proposals = generate_proposals(image)
                # Extract ROI features
                roi_features = model.extract_features(image, proposals)
                # Stream 1: MIL
                mil_scores = model.mil_stream(roi_features)
                total_loss = mil_loss(mil_scores, label)
                # Generate clusters from MIL output (detached: pseudo-labels carry no gradient)
                clusters = generate_proposal_clusters(
                    proposals, mil_scores.detach().cpu().numpy()
                )
                # Refinement streams
                for stream in model.refinement_streams:
                    # Assign labels from clusters
                    pseudo_labels = torch.as_tensor(
                        assign_cluster_labels(proposals, clusters), dtype=torch.long
                    )
                    # Refine
                    refined_scores, refine_loss = stream(roi_features, pseudo_labels)
                    total_loss = total_loss + refine_loss
                    # Update clusters for next stream; column 0 is background, so drop it
                    class_probs = F.softmax(refined_scores.detach(), dim=1)[:, 1:]
                    clusters = generate_proposal_clusters(proposals, class_probs.cpu().numpy())
                # Backprop
                total_loss.backward()
                optimizer.step()
        return model

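
    import torch
    import torch.nn.functional as F
    from torchvision.ops import nms

    def detect_objects(model, image, score_threshold=0.5, nms_threshold=0.3):
        """
        Run inference to detect objects.
        Assumes model.final_stream returns logits of shape [R, num_classes + 1]
        with background in column 0.
        """
        proposals = generate_proposals(image)
        with torch.no_grad():
            roi_features = model.extract_features(image, proposals)
            # Use final refinement stream for detection
            final_scores = F.softmax(model.final_stream(roi_features), dim=1)
        # nms expects float (x1, y1, x2, y2) boxes
        all_boxes = torch.as_tensor(
            proposals, dtype=torch.float32, device=final_scores.device
        )
        # Apply NMS per class, skipping the background column
        detections = []
        for c in range(1, final_scores.shape[1]):
            class_scores = final_scores[:, c]
            high_scoring = class_scores > score_threshold
            if high_scoring.any():
                boxes = all_boxes[high_scoring]
                scores = class_scores[high_scoring]
                # Non-maximum suppression
                keep = nms(boxes, scores, nms_threshold)
                for idx in keep:
                    detections.append({
                        'class': c - 1,
                        'box': boxes[idx].tolist(),
                        'score': scores[idx].item()
                    })
        return detections
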
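Non-maximum suppression is the step that turns many overlapping high-scoring proposals into one detection per object. For readers who want to see what it does, here is a plain NumPy sketch of greedy NMS. The function name `nms_numpy` is made up for this example; it is an illustration of the technique, not the torchvision implementation.

```python
import numpy as np


def nms_numpy(boxes, scores, iou_threshold):
    """Greedy NMS: keep the best box, drop boxes overlapping it too much, repeat."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        union = areas[i] + areas[rest] - inter
        iou = inter / np.maximum(union, 1e-12)
        # Only boxes that overlap box i by at most the threshold survive
        order = rest[iou <= iou_threshold]
    return keep


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]])
scores = np.array([0.9, 0.8, 0.7])
print(nms_numpy(boxes, scores, iou_threshold=0.3))  # [0, 2]
```

In the example, the second box overlaps the first with IoU of about 0.68, above the 0.3 threshold, so it is suppressed; the third box does not overlap and is kept.
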
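The key advantage of PCL is that it detects complete objects rather than only their most discriminative parts, because each refinement stream is supervised by whole clusters of overlapping proposals instead of a single top-scoring proposal.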

| Parameter | Typical Value | Notes |
|-----------|---------------|-------|
| Proposals per image | 2000 | Top-K from proposal method |
| Refinement streams | 3 | More streams = better but slower |
| IoU threshold (clustering) | 0.4-0.5 | Lower = larger clusters |
| Learning rate | 0.001 | With decay |
| Batch size | 2 | Limited by GPU memory |


    # Typical training schedule
    lr_schedule = {
        0: 0.001,       # Initial LR
        40000: 0.0001,  # Decay at 40k iterations
        70000: 0.00001  # Final decay
    }
    total_iterations = 80000

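A step schedule stored as a dictionary still needs a lookup that returns the rate in effect at a given iteration. A small sketch using the standard library follows; the function name `lr_at` is an assumption made for this example.

```python
from bisect import bisect_right

lr_schedule = {
    0: 0.001,        # Initial LR
    40000: 0.0001,   # Decay at 40k iterations
    70000: 0.00001,  # Final decay
}
total_iterations = 80000


def lr_at(iteration, schedule):
    """Return the learning rate of the latest milestone at or before `iteration`."""
    milestones = sorted(schedule)
    idx = bisect_right(milestones, iteration) - 1
    if idx < 0:
        raise ValueError(f"no milestone at or before iteration {iteration}")
    return schedule[milestones[idx]]


for it in (0, 39999, 40000, 70000, total_iterations - 1):
    print(it, lr_at(it, lr_schedule))
```

Inside a training loop, the returned value can be written into each optimizer parameter group whenever it changes.
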
Based on original paper (PASCAL VOC 2007):

| Method | mAP |
|--------|-----|
| Standard MIL | 39.3% |
| PCL (3 streams) | 48.8% |
| PCL + Regression | 52.2% |


    # Deep learning
    pip install torch torchvision
    # Image processing and Selective Search proposals
    # (opencv-contrib-python already includes the core cv2 modules,
    # so do not install opencv-python alongside it)
    pip install opencv-contrib-python pillow
    # Evaluation
    pip install pycocotools
