skills/llm/two-pass-retrieval-refinement/SKILL.md
Refines retrieval in two passes: an initial embedding retrieval collects candidates, then LLM-generated text is concatenated with the query for a second retrieval pass.
npx skillsauth add wenmin-wu/ds-skills llm-two-pass-retrieval-refinement
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Single-pass dense retrieval can miss relevant targets when the query and the target use different vocabulary. Two passes address this: (1) retrieve the top-K candidates with an initial embedding, (2) feed those candidates to an LLM to generate a refined description, and (3) concatenate the LLM output with the original query and re-embed it for a second retrieval pass. The LLM output bridges the vocabulary gap between query and target.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Assumed to exist already: df (DataFrame of queries), documents (list of target strings),
# llm (a text-generation client) and build_prompts (a prompt-building helper).

# Pass 1: initial retrieval from the short query fields only
# .tolist() hands encode() a plain list of strings rather than a pandas Series
queries = (df["ConstructName"] + ". " + df["SubjectName"]).tolist()
q_embeds = model.encode(queries)
doc_embeds = model.encode(documents)
top100 = util.semantic_search(q_embeds, doc_embeds, top_k=100)

# Feed top candidates to LLM for refined description
llm_output = llm.generate(build_prompts(df, top100))  # LLM describes best match

# Pass 2: re-retrieve with enriched query (LLM text first, original query after it)
enriched = [f"{llm_out}\n\n{orig}" for llm_out, orig in zip(llm_output, df["full_text"])]
q_embeds_v2 = model.encode(enriched)
top25 = util.semantic_search(q_embeds_v2, doc_embeds, top_k=25)
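The pipeline above needs a sentence-transformers model and an LLM client to run. As a rough, dependency-free sketch of the same control flow, the example below swaps in a toy bag-of-words embedding and a hard-coded stub in place of the LLM. The names `embed`, `cosine`, `retrieve`, `two_pass_retrieve`, and `fake_llm`, as well as the sample documents and query, are all hypothetical and only illustrate how enriching the query can change the ranking; a real setup would use a dense encoder and an actual model call.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a dense encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, top_k):
    """Return indices of the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, embed(docs[i])), reverse=True)
    return ranked[:top_k]


def two_pass_retrieve(query, docs, describe, k1=100, k2=25):
    """Pass 1 collects candidates, `describe` (the LLM step) writes a refined
    description, and pass 2 retrieves again with description + original query."""
    candidates = retrieve(query, docs, k1)
    llm_out = describe(query, [docs[i] for i in candidates])
    enriched = f"{llm_out}\n\n{query}"
    return retrieve(enriched, docs, k2)


docs = [
    "adds numerators and denominators separately when summing fractions",
    "confuses perimeter with area of a rectangle",
    "rounds down instead of rounding to the nearest whole number",
]
query = "Fraction addition. Number"


def fake_llm(query, candidates):
    # Hypothetical stub: a real LLM would read the candidates and write this text.
    return "The student adds numerators and denominators when summing fractions."


first_pass = retrieve(query, docs, top_k=3)
second_pass = two_pass_retrieve(query, docs, fake_llm, k1=3, k2=2)
print(first_pass, second_pass)
```

In this toy setup the bare query shares only the word "number" with the corpus, so the first pass ranks the rounding document highest. Once the stubbed description is prepended, the second pass shares most of its vocabulary with the fraction document and ranks it first.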