wenmin-wu/cv-per-organ-multihead-sigmoid-softmax

Name: cv-per-organ-multihead-sigmoid-softmax
Author: wenmin-wu

skills/cv/per-organ-multihead-sigmoid-softmax/SKILL.md

npx skillsauth add wenmin-wu/ds-skills cv-per-organ-multihead-sigmoid-softmax

Overview

Multi-organ trauma classification has a heterogeneous label structure: some organs are binary (injured / not), others have ordered severity grades (healthy / low / high). The naive answer, one big sigmoid head with all classes flattened, discards the mutual exclusivity inside each grade group: every class is scored as an independent yes/no decision, so several grades of the same organ can be predicted at once. The right structure is one shared backbone, a tiny per-organ "neck" Dense layer, and a head whose activation matches the label semantics: sigmoid for binary organs, softmax for severity-graded ones. Keras compile(loss={...}) accepts a dict mapping output (head) names to losses, so each head gets its correct loss without a hand-rolled combined loss.
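To make the mutual-exclusivity point concrete, here is a minimal sketch in plain Python (no Keras needed). The logits are made-up values for one organ's three grades, not numbers from the original notebook. Independent sigmoids can rate two grades as likely at the same time, while a softmax over the same logits always yields a distribution that sums to 1.

```python
import math

def sigmoid(z):
    """Independent per-class probability."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Mutually exclusive probabilities over one group of classes."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one organ's grades: healthy, low, high.
liver_logits = [2.0, 1.5, -1.0]

independent = [sigmoid(z) for z in liver_logits]
exclusive = softmax(liver_logits)

# With sigmoids, "healthy" and "low" both exceed 0.5 and the total is > 1.
print([round(p, 3) for p in independent], round(sum(independent), 3))
# With softmax, the three grades share a total probability of exactly 1.
print([round(p, 3) for p in exclusive], round(sum(exclusive), 3))
```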

Quick Start

from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.losses import BinaryCrossentropy, CategoricalCrossentropy

# `backbone` is assumed to be an existing Keras CNN without a classification
# top (4-D feature-map output); it is not defined in this snippet.
x = GlobalAveragePooling2D()(backbone.output)

# One small, independent neck per organ on top of the shared pooled features.
necks = {n: Dense(32, activation='silu', name=f'{n}_neck')(x)
         for n in ['bowel', 'extra', 'liver', 'kidney', 'spleen']}

# Sigmoid heads for binary organs, 3-way softmax heads for graded organs.
outs = [
    Dense(1, activation='sigmoid', name='bowel')(necks['bowel']),
    Dense(1, activation='sigmoid', name='extra')(necks['extra']),
    Dense(3, activation='softmax', name='liver')(necks['liver']),
    Dense(3, activation='softmax', name='kidney')(necks['kidney']),
    Dense(3, activation='softmax', name='spleen')(necks['spleen']),
]

model = Model(backbone.inputs, outs)
# The loss dict keys must match the output layer names defined above.
model.compile(
    optimizer='adam',
    loss={
        'bowel':  BinaryCrossentropy(label_smoothing=0.05),
        'extra':  BinaryCrossentropy(label_smoothing=0.05),
        'liver':  CategoricalCrossentropy(label_smoothing=0.05),
        'kidney': CategoricalCrossentropy(label_smoothing=0.05),
        'spleen': CategoricalCrossentropy(label_smoothing=0.05),
    },
)
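The compiled model needs one target per head, shaped to match that head: a single 0/1 value for the sigmoid heads and a one-hot vector of length 3 for the softmax heads (CategoricalCrossentropy expects one-hot targets). The sketch below is one way to build such a target dict in plain Python. The raw label encoding (0/1 for binary organs, integer grade 0..2 for graded organs) and the helper names `one_hot` and `build_targets` are assumptions for illustration, not part of the original skill.

```python
BINARY_ORGANS = ["bowel", "extra"]
GRADED_ORGANS = ["liver", "kidney", "spleen"]
NUM_GRADES = 3  # healthy / low / high

def one_hot(grade, num_classes=NUM_GRADES):
    """Encode an integer grade as a one-hot list of floats."""
    if not 0 <= grade < num_classes:
        raise ValueError(f"grade {grade} outside 0..{num_classes - 1}")
    return [1.0 if i == grade else 0.0 for i in range(num_classes)]

def build_targets(rows):
    """Turn per-sample label dicts into one target list per head name."""
    targets = {name: [] for name in BINARY_ORGANS + GRADED_ORGANS}
    for row in rows:
        for name in BINARY_ORGANS:
            targets[name].append([float(row[name])])  # shape (1,)
        for name in GRADED_ORGANS:
            targets[name].append(one_hot(row[name]))  # shape (3,)
    return targets

# Hypothetical labels for two samples.
rows = [
    {"bowel": 0, "extra": 1, "liver": 0, "kidney": 2, "spleen": 1},
    {"bowel": 1, "extra": 0, "liver": 1, "kidney": 0, "spleen": 0},
]
targets = build_targets(rows)
print(targets["extra"])   # one value per sample
print(targets["kidney"])  # one one-hot vector per sample
```

Converting each list to an array and passing the dict as the targets of model.fit should line up with the head names used in Quick Start, since the dict keys are the same names; this has not been checked against the referenced notebook.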

Workflow

1. Pick the smallest "neck" width that still trains (32 is usually plenty); a smaller neck forces the backbone to do most of the work.
2. Use sigmoid heads for binary organs and softmax heads for ordered/multi-class organs in the same model.
3. Pass a dict to compile(loss=...) whose keys match the head names; Keras routes each loss to the output with the same name.
4. Apply a uniform label_smoothing=0.05 across all heads to keep any one organ from collapsing onto saturated 0/1 predictions.
5. At inference, concatenate the head outputs in the order the submission expects (use cv-multihead-softmax-to-flat-submission patterns).
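The last step, flattening the per-head outputs into one row, can be sketched as follows. The column order in `SUBMISSION_ORDER` and the choice to expand each sigmoid probability p into a [1 - p, p] pair are assumptions for illustration only; the real layout depends on the submission format and on the cv-multihead-softmax-to-flat-submission skill, which is not reproduced here.

```python
# Hypothetical column order; replace with the order the submission requires.
SUBMISSION_ORDER = ["bowel", "extra", "kidney", "liver", "spleen"]

def flatten_prediction(pred):
    """Concatenate one sample's head outputs into a single flat row.

    `pred` maps head name -> list of probabilities for that head
    (length 1 for sigmoid heads, length 3 for softmax heads).
    """
    row = []
    for name in SUBMISSION_ORDER:
        probs = list(pred[name])
        if len(probs) == 1:
            # Assumed layout: expand a sigmoid output into [healthy, injured].
            p = probs[0]
            row.extend([1.0 - p, p])
        else:
            row.extend(probs)
    return row

# Made-up predictions for a single sample.
pred = {
    "bowel": [0.1],
    "extra": [0.7],
    "liver": [0.8, 0.15, 0.05],
    "kidney": [0.6, 0.3, 0.1],
    "spleen": [0.2, 0.5, 0.3],
}
row = flatten_prediction(pred)
print(len(row), row)
```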

Key Decisions

Per-organ neck, not shared head: a shared head forces every organ to use the same projection of backbone features and underperforms on rare classes.
Sigmoid + softmax in the same model: forcing everything through sigmoids drops the mutual exclusivity that softmax guarantees among an organ's severity grades; forcing everything through softmax breaks the binary semantics of the injured / not-injured organs.
silu over relu in the neck: smoother gradients on a tiny 32-unit layer; the difference is small but consistently positive.
Label smoothing = 0.05, not 0.1: medical labels are clean enough that aggressive smoothing hurts; 0.05 is the sweet spot for log-loss metrics.
Don't share weights between necks: the whole point is per-organ specialization on a shared visual representation.
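To see what the 0.05 versus 0.1 choice does to the training targets, the sketch below reproduces the smoothing arithmetic in plain Python. It assumes the convention described in the Keras loss documentation (binary targets are squeezed toward 0.5, categorical targets are mixed with a uniform distribution over the classes); the function names are illustrative.

```python
def smooth_binary(y, eps):
    """Binary label smoothing: pull a 0/1 target toward 0.5."""
    return y * (1.0 - eps) + 0.5 * eps

def smooth_categorical(one_hot, eps):
    """Categorical label smoothing: mix the one-hot target with uniform."""
    k = len(one_hot)
    return [v * (1.0 - eps) + eps / k for v in one_hot]

for eps in (0.05, 0.1):
    positive = smooth_binary(1.0, eps)
    high_grade = smooth_categorical([0.0, 0.0, 1.0], eps)
    print(eps, round(positive, 4), [round(v, 4) for v in high_grade])
```

With 0.05 a positive binary target becomes 0.975 and the true grade keeps about 0.967 of the mass; with 0.1 those drop to 0.95 and about 0.933, which is the more aggressive smoothing the note above argues against.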

References

RSNA-ATD: CNN [TPU][Train]

Single CNN backbone with one shallow Dense neck per organ and mixed sigmoid (binary) + softmax (multi-class severity) heads, trained with a dict of losses so each organ is calibrated independently while sharing visual features

24 stars

data-ai

Updated Apr 18, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add wenmin-wu/ds-skills cv-per-organ-multihead-sigmoid-softmax

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 18, 2026, 2:53 AM (65.9s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/wenmin-wu/ds-skills.git

# Copy into Claude Code skills folder (global)
cp -r ds-skills/skills/cv/per-organ-multihead-sigmoid-softmax ~/.claude/skills/

Repository: wenmin-wu/ds-skills (24 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT