skills/cv/cnn-encoder-spatial-feature-map/SKILL.md
Strips the global pool and FC head from a pretrained CNN to expose spatial feature maps (H x W x C) for attention-based decoding.
npx skillsauth add wenmin-wu/ds-skills cv-cnn-encoder-spatial-feature-map
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
For image-to-sequence tasks (captioning, OCR, molecular translation), the CNN must output spatial feature maps rather than a single vector. Replace the global pooling and FC head with nn.Identity(), then permute/reshape the output to (batch, H*W, C). Each spatial position becomes an "input token" the decoder can attend to.
import timm
import torch.nn as nn


class SpatialEncoder(nn.Module):
    def __init__(self, model_name="resnet34", pretrained=True):
        super().__init__()
        self.cnn = timm.create_model(model_name, pretrained=pretrained)
        # Assumes a ResNet-style timm model whose classifier head is named `fc`.
        self.n_features = self.cnn.fc.in_features
        # Disable pooling and classification so the spatial map passes through.
        self.cnn.global_pool = nn.Identity()
        self.cnn.fc = nn.Identity()

    def forward(self, x):
        features = self.cnn(x)                    # (B, C, H, W)
        features = features.permute(0, 2, 3, 1)   # (B, H, W, C)
        B, H, W, C = features.shape
        features = features.view(B, H * W, C)     # (B, num_pixels, C)
        return features
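To make the token layout concrete, here is a minimal sketch that uses a random tensor as a stand-in for the backbone output (no pretrained weights are loaded). It checks that the flattening is row-major, then runs one cross-attention step. The attention layer, the number of queries, and the variable names are illustrative assumptions, not part of the skill itself.

```python
import torch
import torch.nn as nn

B, C, H, W = 2, 512, 7, 7        # resnet34 on a 224x224 input yields a 512 x 7 x 7 map
fmap = torch.randn(B, C, H, W)   # random stand-in for the backbone output

# Same permute/flatten as SpatialEncoder.forward
tokens = fmap.permute(0, 2, 3, 1).reshape(B, H * W, C)   # (B, 49, 512)

# Row-major flattening: token k holds the feature vector at (k // W, k % W).
k = 10
row, col = divmod(k, W)
assert torch.equal(tokens[:, k], fmap[:, :, row, col])

# Hypothetical decoder step: 5 query vectors attend over the 49 image tokens.
attn = nn.MultiheadAttention(embed_dim=C, num_heads=8, batch_first=True)
queries = torch.randn(B, 5, C)
context, weights = attn(queries, tokens, tokens)
print(context.shape, weights.shape)   # (B, 5, C) and (B, 5, H*W)
```

Each row of `weights` is a distribution over the H*W spatial positions, so it can be reshaped to (H, W) to visualize where a decoding step is looking.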
Replace global_pool and fc with nn.Identity()
Apply nn.AdaptiveAvgPool2d((H, W)) before flatten for fixed spatial dims
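The fixed-grid option can be sketched as follows. This is one possible arrangement, assuming the pooling is applied to the (B, C, h, w) feature map right before flattening; the class name `FixedGridTokens` and the 7 x 7 grid are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn


class FixedGridTokens(nn.Module):
    """Hypothetical helper: pool any feature map to a fixed grid, then flatten to tokens."""

    def __init__(self, grid=(7, 7)):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(grid)

    def forward(self, fmap):                      # fmap: (B, C, h, w), any h and w
        fmap = self.pool(fmap)                    # (B, C, H, W) with (H, W) == grid
        return fmap.flatten(2).transpose(1, 2)    # (B, H*W, C), row-major order


to_tokens = FixedGridTokens((7, 7))
a = to_tokens(torch.randn(2, 512, 7, 7))      # map from a square input
b = to_tokens(torch.randn(2, 512, 10, 16))    # map from a larger, non-square input
print(a.shape, b.shape)                       # both have 49 tokens of width 512
```

With a fixed grid the decoder always sees the same number of tokens regardless of input resolution, which simplifies batching and learned positional encodings, at the cost of averaging away some spatial detail on large inputs.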