
wenmin-wu/cv-bigru-slice-feature-aggregator

Name: cv-bigru-slice-feature-aggregator
Author: wenmin-wu

skills/cv/bigru-slice-feature-aggregator/SKILL.md

npx skillsauth add wenmin-wu/ds-skills cv-bigru-slice-feature-aggregator


Overview

Volumetric CT classification with a true 3D CNN is expensive: every epoch re-encodes the same unchanged slices. The cheaper, often better alternative is two-stage. Stage 1 trains a 2D CNN on slices, freezes it, and dumps a fixed-size feature vector per slice once. Stage 2 trains a tiny bidirectional GRU over the per-slice feature sequence, with two heads: a TimeDistributed Linear that produces per-slice predictions and a Linear over cat(avg_pool, max_pool) that produces the exam-level prediction. Adding the inter-slice Z-gap as an extra input feature gives the GRU spatial context. Stage 2 is so cheap that you can sweep dozens of hyperparameter settings in the time stage 1 takes for one epoch.

Quick Start

import torch
import torch.nn as nn

class TimeDistributed(nn.Module):
    """Apply a layer independently to every time step of a (B, T, F) tensor."""

    def __init__(self, layer):
        super().__init__()
        self.layer = layer

    def forward(self, x):  # (B, T, F) -> (B, T, F_out)
        B, T, F = x.shape
        return self.layer(x.reshape(B * T, F)).reshape(B, T, -1)

class SliceGRU(nn.Module):
    """Bidirectional GRU over cached per-slice features, with two heads."""

    def __init__(self, n_feats, hidden=64, n_exam_targets=9):
        super().__init__()
        self.gru = nn.GRU(
            n_feats + 1,         # +1 for inter-slice z-gap
            hidden,
            num_layers=2,
            bidirectional=True,
            batch_first=True,
        )
        # Per-slice head: one output per slice from the 2H-dim GRU state
        self.image_head = TimeDistributed(nn.Linear(hidden * 2, 1))
        # Exam head: input is cat(avg, max) over slices, hence 2 * 2H
        self.exam_head  = nn.Linear(hidden * 2 * 2, n_exam_targets)

    def forward(self, slice_feats, z_gaps):
        # slice_feats: (B, T, n_feats), z_gaps: (B, T)
        x = torch.cat([slice_feats, z_gaps.unsqueeze(-1)], dim=2)
        h, _ = self.gru(x)                      # (B, T, 2H)
        per_slice = self.image_head(h)          # (B, T, 1)
        avg = h.mean(dim=1)                     # (B, 2H)
        mx, _ = h.max(dim=1)                    # (B, 2H)
        per_exam = self.exam_head(torch.cat([avg, mx], dim=1))  # (B, n_exam_targets)
        return per_slice, per_exam
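
The Quick Start pools over every time step. If series are padded to a common length within a batch, the mean and max would also see the padded positions. The following is a minimal sketch of a masked variant, not part of the original skill: the helper name masked_avg_max_pool and the boolean mask argument (True for real slices) are assumptions made for illustration.

```python
import torch

def masked_avg_max_pool(h, mask):
    """Pool GRU outputs over real slices only.

    h:    (B, T, C) float tensor of GRU outputs
    mask: (B, T) bool tensor, True where the slice is real (not padding)
    Returns a (B, 2C) tensor: cat(masked mean, masked max).
    """
    m = mask.unsqueeze(-1)                                    # (B, T, 1)
    # Mean over real slices; clamp avoids division by zero for empty series
    avg = (h * m).sum(dim=1) / m.sum(dim=1).clamp(min=1)
    # Max over real slices; padded positions are set to -inf first
    mx = h.masked_fill(~m, float("-inf")).max(dim=1).values
    return torch.cat([avg, mx], dim=1)
```

A series with no real slices would yield -inf in the max half, so such series are assumed to be filtered out beforehand.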

Workflow

1. Train a 2D CNN end-to-end on per-slice classification (or load a pretrained backbone).
2. For every CT series, run the 2D CNN once and dump the (num_slices, n_feats) array to disk as a single .npy file.
3. Compute the inter-slice Z gap from ImagePositionPatient[2] deltas; the first slice gets gap = 0.
4. Train the GRU with both losses summed: per-slice BCE (with masking for padding) + per-exam BCE.
5. Use cat(mean_pool, max_pool) for the exam-level head; single-pool is consistently worse.
6. Keep the GRU tiny (hidden=64, 2 layers): it is a sequence aggregator, not a feature extractor.
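
The Z-gap and padding steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the author's code: the function names z_gaps_from_positions and pad_series, the max_len parameter, and the choice to truncate over-long series to their first max_len slices are all hypothetical. The z values are assumed to have been read from ImagePositionPatient[2] already.

```python
import numpy as np

def z_gaps_from_positions(z_positions):
    """Return (order, gaps) for one series.

    order: indices that sort the slices by z, to be applied to the features too
    gaps:  z-distance to the previous slice in sorted order; first slice gets 0
    """
    z = np.asarray(z_positions, dtype=np.float32)
    order = np.argsort(z)
    gaps = np.zeros(len(z), dtype=np.float32)
    gaps[1:] = np.diff(z[order])
    return order, gaps

def pad_series(feats, gaps, max_len):
    """Pad (T, F) features and (T,) gaps to max_len; mask marks real slices."""
    t = min(len(feats), max_len)          # assumption: truncate long series
    out_feats = np.zeros((max_len, feats.shape[1]), dtype=np.float32)
    out_gaps = np.zeros(max_len, dtype=np.float32)
    mask = np.zeros(max_len, dtype=bool)
    out_feats[:t] = feats[:t]
    out_gaps[:t] = gaps[:t]
    mask[:t] = True
    return out_feats, out_gaps, mask
```

The cached feature array for a series would be reordered with the returned order before padding, so that features and gaps stay aligned.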

Key Decisions

Freeze stage 1 before dumping: any backbone update invalidates the feature cache; freeze + dump + train stage 2 is the right order.
Bidirectional, not unidirectional: PE / lesion / nodule context is symmetric along the slice sequence; the slices after a given slice matter as much as the slices before it.
avg+max concat for exam head: max captures "worst slice", avg captures "overall burden"; they're complementary.
z-gap as input feature: lets the GRU compensate for variable slice spacing across studies.
Train both heads jointly: the per-slice loss provides dense supervision that the exam-level loss alone does not.
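
The joint objective described above (masked per-slice BCE plus per-exam BCE, summed) might look like the sketch below. It assumes both heads emit raw logits, since the Quick Start applies no sigmoid, and that a boolean mask marks real slices; the function name joint_loss and the equal weighting of the two terms are illustrative assumptions rather than details confirmed by the source.

```python
import torch
import torch.nn.functional as F

def joint_loss(per_slice_logits, per_exam_logits, slice_targets, exam_targets, mask):
    """Sum of masked per-slice BCE and per-exam BCE.

    per_slice_logits: (B, T, 1)   slice_targets: (B, T) floats in {0, 1}
    per_exam_logits:  (B, K)      exam_targets:  (B, K) floats in {0, 1}
    mask:             (B, T) bool, True for real slices, False for padding
    """
    slice_bce = F.binary_cross_entropy_with_logits(
        per_slice_logits.squeeze(-1), slice_targets, reduction="none"
    )                                                   # (B, T)
    m = mask.float()
    # Average only over real slices so padding contributes nothing
    slice_loss = (slice_bce * m).sum() / m.sum().clamp(min=1.0)
    exam_loss = F.binary_cross_entropy_with_logits(per_exam_logits, exam_targets)
    return slice_loss + exam_loss
```

Because the per-slice term is averaged over real slices only, predictions at padded positions do not affect the loss or its gradient.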

References

CNN-GRU Baseline - Stage 2 Train+Inference


Two-stage CT classifier where a 2D CNN dumps per-slice features once, then a bidirectional GRU runs over the slice sequence to produce both per-slice predictions (TimeDistributed head) and an exam-level prediction (avg+max pooled head) — turns expensive 3D CNN training into cheap sequence modeling

24 stars

data-ai

Updated Apr 17, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add wenmin-wu/ds-skills cv-bigru-slice-feature-aggregator

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 24, 2026, 9:03 PM; 1.9s; 1 file scanned


Related Skills

wenmin-wu/timeseries-scaled-pinball-loss
data-ai | Verified, Trusted, Community | 31 | SKILL.md | Updated Apr 23, 2026
Scaled Pinball Loss (SPL) metric for evaluating quantile forecasts, normalized by mean absolute successive differences of training data

wenmin-wu/timeseries-retroactive-outlier-rescaling
data-ai | Verified, Trusted, Community | 31 | SKILL.md | Updated Apr 23, 2026
Walk backward through a time series and multiplicatively rescale segments when jumps exceed a fraction of the running mean to correct data collection anomalies

wenmin-wu/timeseries-ratio-target-for-smape
testing | Verified, Trusted, Community | 31 | SKILL.md | Updated Apr 23, 2026
Transform forecasting target to next/current ratio minus one so that optimizing MAE or squared error implicitly minimizes SMAPE

wenmin-wu/timeseries-quantile-ratio-scaling
tools | Verified, Trusted, Community | 31 | SKILL.md | Updated Apr 23, 2026
Convert point forecasts to prediction intervals by scaling with logit-transformed quantile ratios passed through a Normal CDF


Install manually

# Clone the repo
git clone https://github.com/wenmin-wu/ds-skills.git

# Copy into Claude Code skills folder (global); create the folder first if it does not exist
mkdir -p ~/.claude/skills
cp -r ds-skills/skills/cv/bigru-slice-feature-aggregator ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

wenmin-wu/ds-skills

24 stars

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT