skills/cv/bigru-slice-feature-aggregator/SKILL.md
Two-stage CT classifier: a 2D CNN dumps per-slice features once, then a bidirectional GRU runs over the slice sequence to produce both per-slice predictions (TimeDistributed head) and an exam-level prediction (avg+max pooled head). This replaces expensive 3D CNN training with cheap sequence modeling.
npx skillsauth add wenmin-wu/ds-skills cv-bigru-slice-feature-aggregator

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Volumetric CT classification with a true 3D CNN is expensive: every epoch re-encodes the same unchanged slices. The cheaper, often better alternative is two-stage. Stage 1 trains a 2D CNN on slices, then freezes it and dumps a fixed-size feature vector per slice once. Stage 2 trains a tiny bidirectional GRU over the per-slice feature sequence, with two heads: a TimeDistributed Linear that produces per-slice predictions, and a Linear over cat(avg_pool, max_pool) that produces the exam-level prediction. Adding the inter-slice Z-gap as an extra input feature gives the GRU spatial context. Stage 2 is so cheap that you can sweep dozens of hyperparameter settings in the time stage 1 takes for one epoch.
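The stage 1 dump can be sketched roughly as follows. The encoder here is only a tiny stand-in `nn.Sequential`, and `dump_exam_features`, the batch size, and the output file name are hypothetical names chosen for illustration; in practice the trained 2D CNN with its classification head removed would presumably take the encoder's place.

```python
import os
import tempfile

import numpy as np
import torch
import torch.nn as nn


@torch.no_grad()
def dump_exam_features(encoder, slices, out_path, batch_size=32):
    """Encode one exam's slices (T, C, H, W) and save a (T, n_feats) array."""
    encoder.eval()  # fix BatchNorm/Dropout behavior while dumping
    chunks = []
    for start in range(0, slices.shape[0], batch_size):
        batch = slices[start:start + batch_size]
        chunks.append(encoder(batch).cpu().numpy())
    feats = np.concatenate(chunks, axis=0)
    np.save(out_path, feats)  # one .npy file for the whole exam
    return feats


# Stand-in encoder (assumption): any 2D CNN mapping a slice to a flat vector.
encoder = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),  # -> (batch, 8)
)

exam = torch.randn(40, 1, 64, 64)  # 40 slices of a fake exam
out_path = os.path.join(tempfile.mkdtemp(), "exam_0001.npy")
feats = dump_exam_features(encoder, exam, out_path)
print(feats.shape)  # (40, 8)
```

Because the features are written once, stage 2 can load them from disk on every run without touching the CNN again.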
import torch
import torch.nn as nn


class TimeDistributed(nn.Module):
    def __init__(self, layer):
        super().__init__()
        self.layer = layer

    def forward(self, x):  # (B, T, F) -> (B, T, F_out)
        B, T, F = x.shape
        return self.layer(x.reshape(B * T, F)).reshape(B, T, -1)


class SliceGRU(nn.Module):
    def __init__(self, n_feats, hidden=64, n_exam_targets=9):
        super().__init__()
        self.gru = nn.GRU(
            n_feats + 1,  # +1 for inter-slice z-gap
            hidden,
            num_layers=2,
            bidirectional=True,
            batch_first=True,
        )
        self.image_head = TimeDistributed(nn.Linear(hidden * 2, 1))
        self.exam_head = nn.Linear(hidden * 2 * 2, n_exam_targets)

    def forward(self, slice_feats, z_gaps):
        x = torch.cat([slice_feats, z_gaps.unsqueeze(-1)], dim=2)
        h, _ = self.gru(x)  # (B, T, 2H)
        per_slice = self.image_head(h)
        avg = h.mean(dim=1)
        mx, _ = h.max(dim=1)
        per_exam = self.exam_head(torch.cat([avg, mx], dim=1))
        return per_slice, per_exam
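The source does not say how the two outputs are trained together. One plausible sketch, assuming binary per-slice and per-exam labels, is a weighted sum of two BCE-with-logits terms (both heads end in a plain `nn.Linear`, so their outputs are logits). The function name `joint_loss` and the two weights are hypothetical, and the tensors below are random stand-ins with the shapes that `forward` returns.

```python
import torch
import torch.nn as nn


def joint_loss(per_slice, per_exam, slice_labels, exam_labels,
               slice_weight=1.0, exam_weight=1.0):
    """Weighted sum of per-slice and exam-level BCE-with-logits losses.

    per_slice: (B, T, 1) logits, slice_labels: (B, T) in {0, 1}
    per_exam:  (B, K) logits,    exam_labels:  (B, K) in {0, 1}
    """
    bce = nn.BCEWithLogitsLoss()
    slice_loss = bce(per_slice.squeeze(-1), slice_labels)
    exam_loss = bce(per_exam, exam_labels)
    return slice_weight * slice_loss + exam_weight * exam_loss


# Random stand-ins with the shapes the model's forward returns.
B, T, K = 4, 30, 9
per_slice = torch.randn(B, T, 1, requires_grad=True)
per_exam = torch.randn(B, K, requires_grad=True)
slice_labels = torch.randint(0, 2, (B, T)).float()
exam_labels = torch.randint(0, 2, (B, K)).float()

loss = joint_loss(per_slice, per_exam, slice_labels, exam_labels)
loss.backward()  # gradients reach both heads' outputs
```

If exams are padded to a common length, the padded positions would also need to be masked out of the per-slice term; that is omitted here for brevity.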
- Save each exam's features, shape (num_slices, n_feats), to disk as a single .npy
- Compute z-gaps as ImagePositionPatient[2] deltas; first slice gets gap = 0
- Use cat(mean_pool, max_pool) for the exam-level head; single-pool is consistently worse
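The z-gap input can be computed with a single `np.diff` call. This is a minimal sketch that assumes the slices are already sorted along the scan axis and that the z coordinates have been read from `ImagePositionPatient[2]`; the helper name `z_gaps_from_positions` and the sample coordinates are hypothetical.

```python
import numpy as np


def z_gaps_from_positions(z_positions):
    """Return each slice's gap to the previous slice; the first gap is 0."""
    z = np.asarray(z_positions, dtype=np.float32)
    # Prepending z[0] makes the first difference z[0] - z[0] = 0.
    return np.diff(z, prepend=z[0])


# Hypothetical z coordinates (mm), already sorted along the scan axis.
z = [-120.0, -118.75, -117.5, -115.0]
gaps = z_gaps_from_positions(z)
print(gaps.tolist())  # [0.0, 1.25, 1.25, 2.5]
```

The result has one value per slice, so it lines up with the (num_slices, n_feats) feature array and can be passed to the model as `z_gaps` after adding a batch dimension.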