skills/rag-architect/hybrid-retrieval/SKILL.md
Implement hybrid search combining vector and keyword retrieval for RAG systems. Use this skill when building RAG retrieval, combining semantic search with BM25, implementing reciprocal rank fusion (RRF), or optimizing retrieval accuracy. Activate when: vector search, keyword search, BM25, semantic search, hybrid RAG, retrieval optimization, search relevance, reranking.
Combine vector similarity with keyword matching for superior retrieval accuracy.
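Vector search alone misses exact matches such as IDs, codes, and names.
Keyword search alone misses conceptual or semantic matches where the wording differs from the query.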
Hybrid search combines both for 15-25% better recall.
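Recall gains depend on the corpus, so it helps to measure them on your own data. The sketch below computes recall@k for a single query, assuming you have a small hand-labeled set of relevant document IDs. The function name `recall_at_k` and the sample IDs are illustrative and do not come from any library or benchmark.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


# Hypothetical labels and results for one query (made-up IDs)
relevant = {"d1", "d2", "d3", "d4"}
vector_hits = ["d1", "d7", "d2", "d9"]
hybrid_hits = ["d1", "d2", "d4", "d7"]

vector_recall = recall_at_k(vector_hits, relevant, k=4)
hybrid_recall = recall_at_k(hybrid_hits, relevant, k=4)
```

Averaging this value over a set of representative queries gives a before/after comparison for vector-only versus hybrid retrieval.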
The standard for combining ranked results from multiple retrievers:
def reciprocal_rank_fusion(
    results_lists: list[list[dict]],
    k: int = 60
) -> list[dict]:
    """
    Combine multiple ranked result lists using RRF.

    Args:
        results_lists: List of ranked results from different retrievers
        k: Ranking constant (default 60; higher values flatten the gap
           between top-ranked and lower-ranked documents)

    Returns:
        Fused and re-ranked results
    """
    fused_scores = {}
    for results in results_lists:
        for rank, doc in enumerate(results):
            doc_id = doc["id"]
            if doc_id not in fused_scores:
                fused_scores[doc_id] = {"doc": doc, "score": 0.0}
            # RRF formula: 1 / (k + rank), with rank starting at 1.
            # enumerate() is 0-based, hence the + 1.
            fused_scores[doc_id]["score"] += 1 / (k + rank + 1)

    # Sort by fused score, highest first
    sorted_results = sorted(
        fused_scores.values(),
        key=lambda x: x["score"],
        reverse=True
    )
    return [item["doc"] for item in sorted_results]
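To make the arithmetic concrete, here is a small worked example of the same formula on two toy rankings. The document IDs are made up. The point it illustrates is that a document ranked well by both retrievers can overtake one that is first in only a single list.

```python
from collections import defaultdict

k = 60
vector_ranking = ["a", "b", "c"]   # best match first
keyword_ranking = ["b", "d", "a"]

scores: dict[str, float] = defaultdict(float)
for ranking in (vector_ranking, keyword_ranking):
    for rank, doc_id in enumerate(ranking, start=1):  # 1-based rank
        scores[doc_id] += 1 / (k + rank)

# "b" scores 1/62 + 1/61, "a" scores 1/61 + 1/63, so "b" edges ahead
fused = sorted(scores, key=scores.get, reverse=True)
print(fused)
```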

With LangChain, an `EnsembleRetriever` combines a BM25 retriever and a vector retriever:

from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma

# `documents` (a list of Document objects) and `embeddings` (an embedding
# model) are assumed to be defined elsewhere.

# Create vector retriever
vectorstore = Chroma.from_documents(documents, embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Create BM25 retriever
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 10

# Combine with weights (same order as `retrievers`: BM25 first, then vector)
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6]  # Tune based on your data
)

# Use in RAG chain
results = ensemble_retriever.invoke("your query here")
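The weights scale how much each retriever contributes to the fused ranking. The sketch below shows one way weights can enter rank fusion, by multiplying each retriever's RRF term by its weight. It illustrates the idea only and is not taken from LangChain's source. The name `weighted_rrf` and the sample IDs are hypothetical.

```python
def weighted_rrf(
    rankings: list[list[str]],
    weights: list[float],
    k: int = 60,
) -> list[str]:
    """Fuse ranked ID lists, scaling each list's RRF contribution by its weight."""
    if len(rankings) != len(weights):
        raise ValueError("rankings and weights must have the same length")
    scores: dict[str, float] = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):  # 1-based rank
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up rankings: BM25 favors the exact SKU hit, the vector retriever the guide
bm25_ranking = ["sku-123", "guide", "faq"]
vector_ranking = ["guide", "faq", "sku-123"]

keyword_heavy = weighted_rrf([bm25_ranking, vector_ranking], weights=[0.7, 0.3])
vector_heavy = weighted_rrf([bm25_ranking, vector_ranking], weights=[0.3, 0.7])
```

With these inputs the keyword-heavy weighting puts `sku-123` first, while the vector-heavy weighting pushes it to last place.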

With LlamaIndex, a `QueryFusionRetriever` fuses the two retrievers and can also expand the query:

from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever

# `documents` is assumed to be a list of loaded LlamaIndex documents.

# Build index
index = VectorStoreIndex.from_documents(documents)

# Create retrievers
vector_retriever = index.as_retriever(similarity_top_k=10)
bm25_retriever = BM25Retriever.from_defaults(
    nodes=list(index.docstore.docs.values()),
    similarity_top_k=10
)

# Fusion retriever with query expansion
retriever = QueryFusionRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    similarity_top_k=10,
    num_queries=4,  # Generate 4 query variations
    mode="reciprocal_rerank",
    use_async=True,
)

With Qdrant, dense and sparse searches can be fused server-side in a single query:

from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Hybrid search with both dense and sparse vectors.
# `dense_embedding` is assumed to be the query embedding as a list of floats.
results = client.query_points(
    collection_name="documents",
    prefetch=[
        # Dense vector search
        models.Prefetch(
            query=dense_embedding,  # [0.1, 0.2, ...]
            using="dense",
            limit=20
        ),
        # Sparse vector search (BM25-style)
        models.Prefetch(
            query=models.SparseVector(
                indices=[1, 42, 123],   # Token IDs
                values=[0.5, 0.8, 0.3]  # Token weights
            ),
            using="sparse",
            limit=20
        ),
    ],
    # Fuse the two prefetch result sets with reciprocal rank fusion
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10
)
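The `indices` and `values` of a sparse vector pair each token ID with a weight. The toy sketch below shows how such a pair of lists could be derived from text, assuming a fixed vocabulary and normalized term frequency as the weight. This is an assumption for illustration: it is not BM25, and real systems typically rely on a sparse encoder or the database's own tooling. The function `build_sparse_vector` and the vocabulary are hypothetical.

```python
import re
from collections import Counter


def build_sparse_vector(
    text: str, vocabulary: dict[str, int]
) -> tuple[list[int], list[float]]:
    """Map in-vocabulary tokens to (indices, values) using normalized term frequency."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(t for t in tokens if t in vocabulary)
    total = sum(counts.values())
    if total == 0:
        return [], []
    # Sort by token ID so the two output lists stay aligned and deterministic
    pairs = sorted((vocabulary[t], c / total) for t, c in counts.items())
    indices = [token_id for token_id, _ in pairs]
    values = [weight for _, weight in pairs]
    return indices, values


# Hypothetical three-word vocabulary mapping tokens to IDs
vocabulary = {"hybrid": 1, "search": 42, "rrf": 123}
indices, values = build_sparse_vector(
    "Hybrid search: hybrid retrieval with RRF", vocabulary
)
```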
After hybrid retrieval, rerank for final ordering:
from sentence_transformers import CrossEncoder

# Load reranker model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank_results(query: str, documents: list[str], top_k: int = 5):
    """Rerank documents using cross-encoder."""
    # Create query-document pairs
    pairs = [[query, doc] for doc in documents]

    # Score all pairs
    scores = reranker.predict(pairs)

    # Sort by score, highest first
    scored_docs = list(zip(documents, scores))
    scored_docs.sort(key=lambda x: x[1], reverse=True)

    return [doc for doc, score in scored_docs[:top_k]]

Alternatively, use Cohere's hosted rerank endpoint:

import cohere

co = cohere.Client("your-api-key")

def cohere_rerank(query: str, documents: list[str], top_k: int = 5):
    """Rerank using Cohere's rerank endpoint."""
    response = co.rerank(
        model="rerank-english-v3.0",
        query=query,
        documents=documents,
        top_n=top_k,
        return_documents=True
    )
    return [result.document.text for result in response.results]
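Both rerankers follow the same shape: score each (query, document) pair, sort, and keep the top results. The sketch below captures that shape with a pluggable scoring function so the reranking step can be exercised without downloading a model or calling an API. `token_overlap` is a toy stand-in for a real cross-encoder, and both function names are hypothetical.

```python
from typing import Callable


def token_overlap(query: str, document: str) -> float:
    """Toy relevance score: share of query tokens that appear in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(document.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)


def retrieve_then_rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float] = token_overlap,
    top_k: int = 5,
) -> list[str]:
    """Score every candidate against the query and keep the top_k highest."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in scored[:top_k]]


# Candidates as they might come out of hybrid retrieval (made-up texts)
candidates = [
    "Vector databases store embeddings",
    "Reciprocal rank fusion merges ranked lists",
    "Rank fusion for hybrid search",
]
top = retrieve_then_rerank("hybrid rank fusion", candidates, top_k=2)
```

Swapping `score_fn` for a function that wraps a cross-encoder's scoring call keeps the rest of the pipeline unchanged.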
| Data Type | Vector Weight | Keyword Weight |
|-----------|---------------|----------------|
| Technical docs | 0.5 | 0.5 |
| Legal/compliance | 0.4 | 0.6 |
| Creative content | 0.7 | 0.3 |
| Product catalogs | 0.3 | 0.7 |
| Code repositories | 0.4 | 0.6 |
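These starting weights can be kept in a small lookup so retriever configuration stays in one place. In the sketch below, the dictionary keys and the helper `ensemble_weights` are hypothetical names. The helper returns the weights keyword-first, which assumes the retrievers are listed as BM25 first and vector second.

```python
# (vector_weight, keyword_weight) starting points from the table above
STARTING_WEIGHTS: dict[str, tuple[float, float]] = {
    "technical_docs": (0.5, 0.5),
    "legal_compliance": (0.4, 0.6),
    "creative_content": (0.7, 0.3),
    "product_catalogs": (0.3, 0.7),
    "code_repositories": (0.4, 0.6),
}


def ensemble_weights(
    data_type: str, default: tuple[float, float] = (0.5, 0.5)
) -> list[float]:
    """Return [keyword_weight, vector_weight] for a [bm25, vector] retriever order."""
    vector_weight, keyword_weight = STARTING_WEIGHTS.get(data_type, default)
    return [keyword_weight, vector_weight]


catalog_weights = ensemble_weights("product_catalogs")
fallback_weights = ensemble_weights("unknown_type")
```

Treat the values as starting points and tune them against a labeled query set.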
Is the query an exact match (ID, code, name)?
├─ Yes → Keyword-heavy (0.3 vector / 0.7 keyword)
└─ No → Is it conceptual/semantic?
    ├─ Yes → Vector-heavy (0.7 vector / 0.3 keyword)
    └─ Mixed → Balanced (0.5 / 0.5) + reranking
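The decision tree can be approximated in code with simple heuristics. The sketch below assumes that ID-like tokens (such as `ERR-4021` or `get_user()`) signal an exact-match query and that a leading question word signals a conceptual one. Both heuristics, the regular expression, and the name `choose_weights` are illustrative assumptions and would need tuning for a real query log.

```python
import re

# Hypothetical patterns: ticket-style IDs, snake_case identifiers, function calls
ID_LIKE = re.compile(r"\b[A-Z]{2,}-\d+\b|\b\w+_\w+\b|\b\w+\(\)")
CONCEPTUAL_START = ("how", "why", "what", "explain", "compare")


def choose_weights(query: str) -> dict:
    """Pick vector/keyword weights for a query, following the decision tree."""
    has_id = bool(ID_LIKE.search(query))
    words = query.split()
    is_conceptual = bool(words) and words[0].lower() in CONCEPTUAL_START

    if has_id and not is_conceptual:
        # Exact match: keyword-heavy
        return {"vector": 0.3, "keyword": 0.7, "rerank": False}
    if is_conceptual and not has_id:
        # Conceptual/semantic: vector-heavy
        return {"vector": 0.7, "keyword": 0.3, "rerank": False}
    # Mixed or unclear: balanced weights plus reranking
    return {"vector": 0.5, "keyword": 0.5, "rerank": True}


exact = choose_weights("ERR-4021")
conceptual = choose_weights("how does caching reduce latency")
mixed = choose_weights("why does get_user() fail")
```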