skills/nlp/multi-retriever-union-ensemble/SKILL.md
Run multiple independent retrieve-rerank pipelines and union-merge their predicted IDs per query via explode-groupby-unique
npx skillsauth add wenmin-wu/ds-skills nlp-multi-retriever-union-ensemble
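Retrievers that differ in model architecture, embedding size, or training data tend to recall different relevant items. Instead of averaging scores, run each retrieve-rerank pipeline end-to-end independently, each with its own threshold, and then take the union of all predicted IDs per query. The union's recall is at least that of the best single pipeline, and because no scores are ever compared across models, no cross-model score calibration is needed. The trade-off is precision: every pipeline's false positives are kept as well.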
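A quick sketch of that recall/precision trade-off, scoring two pipelines and their union on invented toy data. The pipeline names, query IDs, item IDs, and the `recall`/`precision` helpers are all hypothetical. Because each query's union contains every pipeline's predictions, its recall cannot fall below any single pipeline's, while precision usually drops.

```python
# Hypothetical toy data: gold relevant IDs per query and each pipeline's predictions
gold = {'q1': {'a', 'b', 'c', 'd'}, 'q2': {'e', 'f'}}
preds = {
    'roberta': {'q1': {'a', 'b', 'x'}, 'q2': {'e'}},
    'minilm':  {'q1': {'b', 'c'},      'q2': {'f', 'y'}},
}

def recall(pred, gold):
    """Mean per-query recall: fraction of gold IDs that were predicted."""
    return sum(len(pred.get(q, set()) & g) / len(g) for q, g in gold.items()) / len(gold)

def precision(pred, gold):
    """Mean per-query precision (0 for a query with no predictions)."""
    scores = [len(p & gold[q]) / len(p) if p else 0.0 for q, p in pred.items()]
    return sum(scores) / len(scores)

# Union ensemble: per query, keep every ID that any pipeline predicted
union = {q: set().union(*(p.get(q, set()) for p in preds.values())) for q in gold}

for name, p in list(preds.items()) + [('union', union)]:
    print(f'{name:8s} recall={recall(p, gold):.3f} precision={precision(p, gold):.3f}')
```

On this data each pipeline reaches recall 0.5, the union reaches 0.875, and the union's precision is lower than either pipeline's.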
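```python
import pandas as pd

# cfg_* objects and the helpers encode, knn_retrieve, build_pairs, rerank and
# threshold_and_group are placeholders for your own pipeline components.
configs = [cfg_roberta, cfg_minilm, cfg_mpnet]
submissions = []

for i, cfg in enumerate(configs):
    # Stage 1: bi-encoder retrieval with this config's embedding model
    emb_q, emb_d = encode(queries, docs, cfg.model)
    candidates = knn_retrieve(emb_q, emb_d, top_n=cfg.top_n)
    # Stage 2: cross-encoder reranking of (query, candidate) pairs
    pairs = build_pairs(queries, docs, candidates)
    scores = rerank(pairs, cfg.cross_encoder)
    # One row per topic_id with space-separated content_ids above this threshold
    sub = threshold_and_group(scores, cfg.threshold)
    sub.to_csv(f'sub_{i}.csv', index=False)  # per-pipeline checkpoint
    submissions.append(sub)

# Union ensemble: explode every pipeline's IDs, then deduplicate per topic
merged = pd.concat(submissions, ignore_index=True)
# split() with no argument maps '' to [] instead of [''], so empty rows add no IDs
merged['content_ids'] = merged['content_ids'].fillna('').str.split()
merged = (merged.explode('content_ids')
          .groupby('topic_id')['content_ids']
          .unique()
          .reset_index())
# explode turns [] into NaN; skip it so topics with no predictions become ''
merged['content_ids'] = merged['content_ids'].apply(
    lambda ids: ' '.join(x for x in ids if isinstance(x, str)))
```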
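The loop above leaves `threshold_and_group` undefined, so here is a self-contained toy run of the last two stages. It assumes each pipeline yields a long table of (`topic_id`, `content_id`, `score`) rows. The helper body, the `content_id` and `score` column names, the `union_merge` wrapper, the thresholds, and all data are hypothetical. The union step itself is the same explode, groupby, unique pattern.

```python
import pandas as pd

def threshold_and_group(scores: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Hypothetical stand-in: keep pairs whose reranker score clears the
    threshold, then emit one row per topic with space-separated content_ids.
    Topics with no surviving pair get ''."""
    kept = scores[scores['score'] >= threshold]
    grouped = kept.groupby('topic_id')['content_id'].apply(' '.join)
    all_topics = pd.Index(scores['topic_id'].unique(), name='topic_id')
    return grouped.reindex(all_topics, fill_value='').rename('content_ids').reset_index()

def union_merge(submissions):
    """Union of predicted IDs per topic via explode -> groupby -> unique."""
    merged = pd.concat(submissions, ignore_index=True)
    merged['content_ids'] = merged['content_ids'].fillna('').str.split()
    merged = (merged.explode('content_ids')
              .groupby('topic_id')['content_ids']
              .unique()
              .reset_index())
    # Drop the NaN that explode produces for empty lists
    merged['content_ids'] = merged['content_ids'].apply(
        lambda ids: ' '.join(x for x in ids if isinstance(x, str)))
    return merged

# Invented reranker scores from two pipelines
scores_a = pd.DataFrame({
    'topic_id':   ['t1', 't1', 't1', 't2', 't3'],
    'content_id': ['c1', 'c2', 'c3', 'c4', 'c5'],
    'score':      [0.9, 0.7, 0.2, 0.8, 0.1],
})
scores_b = pd.DataFrame({
    'topic_id':   ['t1', 't1', 't2', 't2', 't3'],
    'content_id': ['c2', 'c6', 'c4', 'c7', 'c5'],
    'score':      [0.95, 0.6, 0.3, 0.85, 0.05],
})

# Each pipeline applies its own (assumed) threshold before the union
submissions = [threshold_and_group(scores_a, 0.5),
               threshold_and_group(scores_b, 0.55)]
print(union_merge(submissions))
```

Here `t1` ends up as `c1 c2 c6`: `c2` was predicted by both pipelines but appears once, and IDs keep first-appearance order because `unique()` returns values in order of occurrence. `t3` gets an empty string rather than disappearing, so every topic still has a row.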
Explode the per-pipeline ID lists, then use `groupby(query_id).unique()` to deduplicate.
data-ai
Scaled Pinball Loss (SPL) metric for evaluating quantile forecasts, normalized by mean absolute successive differences of training data
data-ai
Walk backward through a time series and multiplicatively rescale segments when jumps exceed a fraction of the running mean to correct data collection anomalies
testing
Transform forecasting target to next/current ratio minus one so that optimizing MAE or squared error implicitly minimizes SMAPE
tools
Convert point forecasts to prediction intervals by scaling with logit-transformed quantile ratios passed through a Normal CDF