skills/nlp/joint-span-answer-type-head/SKILL.md
Dual-head transformer model that jointly predicts start/end span logits from sequence output and answer type from pooled CLS output.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add wenmin-wu/ds-skills nlp-joint-span-answer-type-head
Open-domain QA requires both extracting an answer span and classifying the answer type (short answer, long answer, yes/no, or unanswerable). A joint model shares one transformer backbone and adds two heads: (1) a token-level linear layer with two outputs, split into start and end logits for span extraction, and (2) a linear layer with N outputs on the CLS token representation for answer type classification. Because both heads backpropagate into the same backbone, joint training can improve both tasks through shared representations.
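One common way to train the two heads jointly is to add the span cross-entropy losses to the answer-type cross-entropy loss. The following is a minimal sketch under that assumption; the function name `joint_qa_loss` and the `type_weight` parameter are illustrative and not part of the original skill. It works on raw logit tensors of the shapes the two heads produce.

```python
import torch
import torch.nn.functional as F


def joint_qa_loss(start_logits, end_logits, type_logits,
                  start_positions, end_positions, answer_types,
                  type_weight=1.0):
    """Combine span and answer-type cross-entropy losses.

    start_logits, end_logits: (B, seq_len) float tensors
    type_logits:              (B, n_types) float tensor
    start_positions, end_positions, answer_types: (B,) long tensors
    type_weight: hypothetical knob for balancing the two tasks
    """
    start_loss = F.cross_entropy(start_logits, start_positions)
    end_loss = F.cross_entropy(end_logits, end_positions)
    type_loss = F.cross_entropy(type_logits, answer_types)
    # Average the two span terms so the span task counts once overall.
    span_loss = 0.5 * (start_loss + end_loss)
    return span_loss + type_weight * type_loss


# Toy batch: 2 examples, 8 tokens, 5 answer types.
loss = joint_qa_loss(
    torch.randn(2, 8), torch.randn(2, 8), torch.randn(2, 5),
    torch.tensor([1, 3]), torch.tensor([2, 5]), torch.tensor([0, 4]),
)
```

Averaging the start and end terms is one possible design choice. Whether the type loss needs a weight other than 1.0 depends on the dataset and would have to be tuned.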
```python
import torch
import torch.nn as nn
from transformers import AutoModel


class JointQAModel(nn.Module):
    def __init__(self, model_name, n_answer_types=5):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(model_name)
        hidden = self.backbone.config.hidden_size
        self.span_head = nn.Linear(hidden, 2)  # start + end logits
        self.type_head = nn.Linear(hidden, n_answer_types)

    def forward(self, input_ids, attention_mask):
        outputs = self.backbone(input_ids, attention_mask=attention_mask)
        sequence_output = outputs.last_hidden_state  # (B, seq_len, H)
        pooled_output = outputs.last_hidden_state[:, 0]  # CLS token
        span_logits = self.span_head(sequence_output)  # (B, seq_len, 2)
        start_logits, end_logits = span_logits.split(1, dim=-1)
        start_logits = start_logits.squeeze(-1)  # (B, seq_len)
        end_logits = end_logits.squeeze(-1)
        type_logits = self.type_head(pooled_output)  # (B, n_types)
        return start_logits, end_logits, type_logits
```
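At inference time, the start and end logits still have to be turned into a single answer span. A common approach, sketched here as an assumption and not something the skill itself specifies, is to pick the pair (i, j) that maximizes `start_logits[i] + end_logits[j]` subject to `i <= j` and a maximum span length. The helper name `best_span` and the `max_answer_len` parameter are hypothetical.

```python
import torch


def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) for the highest-scoring valid span.

    start_logits, end_logits: 1-D tensors of length seq_len for one example.
    A span (i, j) is valid when 0 <= j - i < max_answer_len.
    """
    seq_len = start_logits.size(0)
    # scores[i, j] = start_logits[i] + end_logits[j]
    scores = start_logits.unsqueeze(1) + end_logits.unsqueeze(0)
    idx = torch.arange(seq_len, device=start_logits.device)
    offset = idx.unsqueeze(0) - idx.unsqueeze(1)  # offset[i, j] = j - i
    valid = (offset >= 0) & (offset < max_answer_len)
    # Invalid pairs can never win the argmax.
    scores = scores.masked_fill(~valid, float("-inf"))
    flat_idx = torch.argmax(scores).item()
    start, end = divmod(flat_idx, seq_len)
    return start, end, scores[start, end].item()


# Toy logits for a 4-token sequence.
toy_start = torch.tensor([0.0, 5.0, 0.0, 0.0])
toy_end = torch.tensor([4.0, 1.0, 0.0, 3.0])
span = best_span(toy_start, toy_end)
```

The predicted answer type can be read off separately with `type_logits.argmax(dim=-1)`. How the type prediction gates the span (for example, suppressing the span when the type is unanswerable) is a task-specific decision that this sketch leaves open.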