Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

latestaiagents/hybrid-retrieval

Name: hybrid-retrieval
Author: latestaiagents

skills/rag-architect/hybrid-retrieval/SKILL.md

npx skillsauth add latestaiagents/agent-skills hybrid-retrieval

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Hybrid Retrieval for RAG

Combine vector similarity with keyword matching for superior retrieval accuracy.

Why Hybrid is Mandatory in 2026

Vector search alone misses:

Exact matches (product codes, IDs, names)
Rare terms not well-represented in embeddings
Keyword-specific queries ("error code E-5012")

Keyword search alone misses:

Semantic similarity ("car" vs "automobile")
Context and meaning
Paraphrased content

Hybrid search combines both for 15-25% better recall.

Core Patterns

Pattern 1: Reciprocal Rank Fusion (RRF)

The standard for combining ranked results from multiple retrievers:

def reciprocal_rank_fusion(
    results_lists: list[list[dict]],
    k: int = 60
) -> list[dict]:
    """
    Combine multiple ranked result lists using RRF.

    Args:
        results_lists: List of ranked results from different retrievers
        k: Ranking constant (default 60, higher = more weight to lower ranks)

    Returns:
        Fused and re-ranked results
    """
    fused_scores = {}

    for results in results_lists:
        for rank, doc in enumerate(results):
            doc_id = doc["id"]
            if doc_id not in fused_scores:
                fused_scores[doc_id] = {"doc": doc, "score": 0}
            # RRF formula: 1 / (k + rank)
            fused_scores[doc_id]["score"] += 1 / (k + rank + 1)

    # Sort by fused score
    sorted_results = sorted(
        fused_scores.values(),
        key=lambda x: x["score"],
        reverse=True
    )
    return [item["doc"] for item in sorted_results]

Pattern 2: LangChain Ensemble Retriever

from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma

# Create vector retriever
vectorstore = Chroma.from_documents(documents, embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Create BM25 retriever
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 10

# Combine with weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6]  # Tune based on your data
)

# Use in RAG chain
results = ensemble_retriever.invoke("your query here")

Pattern 3: LlamaIndex Hybrid Search

from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever

# Build index
index = VectorStoreIndex.from_documents(documents)

# Create retrievers
vector_retriever = index.as_retriever(similarity_top_k=10)
bm25_retriever = BM25Retriever.from_defaults(
    nodes=index.docstore.docs.values(),
    similarity_top_k=10
)

# Fusion retriever with query expansion
retriever = QueryFusionRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    similarity_top_k=10,
    num_queries=4,  # Generate 4 query variations
    mode="reciprocal_rerank",
    use_async=True,
)

Pattern 4: Direct Implementation with Qdrant

from qdrant_client import QdrantClient
from qdrant_client.models import SparseVector, NamedSparseVector, NamedVector

client = QdrantClient(url="http://localhost:6333")

# Hybrid search with both dense and sparse vectors
results = client.query_points(
    collection_name="documents",
    prefetch=[
        # Dense vector search
        models.Prefetch(
            query=dense_embedding,  # [0.1, 0.2, ...]
            using="dense",
            limit=20
        ),
        # Sparse vector search (BM25-style)
        models.Prefetch(
            query=SparseVector(
                indices=[1, 42, 123],  # Token IDs
                values=[0.5, 0.8, 0.3]  # Token weights
            ),
            using="sparse",
            limit=20
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10
)

Reranking for Better Precision

After hybrid retrieval, rerank for final ordering:

Cross-Encoder Reranking

from sentence_transformers import CrossEncoder

# Load reranker model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank_results(query: str, documents: list[str], top_k: int = 5):
    """Rerank documents using cross-encoder."""
    # Create query-document pairs
    pairs = [[query, doc] for doc in documents]

    # Score all pairs
    scores = reranker.predict(pairs)

    # Sort by score
    scored_docs = list(zip(documents, scores))
    scored_docs.sort(key=lambda x: x[1], reverse=True)

    return [doc for doc, score in scored_docs[:top_k]]

Cohere Rerank API

import cohere

co = cohere.Client("your-api-key")

def cohere_rerank(query: str, documents: list[str], top_k: int = 5):
    """Rerank using Cohere's rerank endpoint."""
    response = co.rerank(
        model="rerank-english-v3.0",
        query=query,
        documents=documents,
        top_n=top_k,
        return_documents=True
    )
    return [result.document.text for result in response.results]

Weight Tuning Guidelines

| Data Type | Vector Weight | Keyword Weight | |-----------|---------------|----------------| | Technical docs | 0.5 | 0.5 | | Legal/compliance | 0.4 | 0.6 | | Creative content | 0.7 | 0.3 | | Product catalogs | 0.3 | 0.7 | | Code repositories | 0.4 | 0.6 |

Best Practices

Always benchmark - Test vector-only, keyword-only, and hybrid on your data
Tune weights empirically - Start at 0.5/0.5, adjust based on evaluation
Use reranking - Hybrid retrieval + reranking = best results
Consider query type - Route exact-match queries to keyword, semantic to vector
Monitor latency - Hybrid adds overhead; cache where possible

Quick Decision Tree

Is the query an exact match (ID, code, name)?
├─ Yes → Keyword-heavy (0.3 vector / 0.7 keyword)
└─ No → Is it conceptual/semantic?
         ├─ Yes → Vector-heavy (0.7 vector / 0.3 keyword)
         └─ Mixed → Balanced (0.5 / 0.5) + reranking

latestaiagents/hybrid-retrieval

skills/rag-architect/hybrid-retrieval/SKILL.md

Implement hybrid search combining vector and keyword retrieval for RAG systems. Use this skill when building RAG retrieval, combining semantic search with BM25, implementing reciprocal rank fusion (RRF), or optimizing retrieval accuracy. Activate when: vector search, keyword search, BM25, semantic search, hybrid RAG, retrieval optimization, search relevance, reranking.

2 stars

development

Updated Apr 23, 2026

$ install --global

skillsauth

npx skillsauth add latestaiagents/agent-skills hybrid-retrieval

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 24, 2026, 2:56 AM9.1s1 file scanned

SKILL.md

name:: hybrid-retrieval
description:: |
Activate when:: vector search, keyword search, BM25, semantic search, hybrid RAG,

Hybrid Retrieval for RAG

Combine vector similarity with keyword matching for superior retrieval accuracy.

Why Hybrid is Mandatory in 2026

Vector search alone misses:

Exact matches (product codes, IDs, names)
Rare terms not well-represented in embeddings
Keyword-specific queries ("error code E-5012")

Keyword search alone misses:

Semantic similarity ("car" vs "automobile")
Context and meaning
Paraphrased content

Hybrid search combines both for 15-25% better recall.

Core Patterns

Pattern 1: Reciprocal Rank Fusion (RRF)

The standard for combining ranked results from multiple retrievers:

def reciprocal_rank_fusion(
    results_lists: list[list[dict]],
    k: int = 60
) -> list[dict]:
    """
    Combine multiple ranked result lists using RRF.

    Args:
        results_lists: List of ranked results from different retrievers
        k: Ranking constant (default 60, higher = more weight to lower ranks)

    Returns:
        Fused and re-ranked results
    """
    fused_scores = {}

    for results in results_lists:
        for rank, doc in enumerate(results):
            doc_id = doc["id"]
            if doc_id not in fused_scores:
                fused_scores[doc_id] = {"doc": doc, "score": 0}
            # RRF formula: 1 / (k + rank)
            fused_scores[doc_id]["score"] += 1 / (k + rank + 1)

    # Sort by fused score
    sorted_results = sorted(
        fused_scores.values(),
        key=lambda x: x["score"],
        reverse=True
    )
    return [item["doc"] for item in sorted_results]

Pattern 2: LangChain Ensemble Retriever

from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma

# Create vector retriever
vectorstore = Chroma.from_documents(documents, embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Create BM25 retriever
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 10

# Combine with weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6]  # Tune based on your data
)

# Use in RAG chain
results = ensemble_retriever.invoke("your query here")

Pattern 3: LlamaIndex Hybrid Search

from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever

# Build index
index = VectorStoreIndex.from_documents(documents)

# Create retrievers
vector_retriever = index.as_retriever(similarity_top_k=10)
bm25_retriever = BM25Retriever.from_defaults(
    nodes=index.docstore.docs.values(),
    similarity_top_k=10
)

# Fusion retriever with query expansion
retriever = QueryFusionRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    similarity_top_k=10,
    num_queries=4,  # Generate 4 query variations
    mode="reciprocal_rerank",
    use_async=True,
)

Pattern 4: Direct Implementation with Qdrant

from qdrant_client import QdrantClient
from qdrant_client.models import SparseVector, NamedSparseVector, NamedVector

client = QdrantClient(url="http://localhost:6333")

# Hybrid search with both dense and sparse vectors
results = client.query_points(
    collection_name="documents",
    prefetch=[
        # Dense vector search
        models.Prefetch(
            query=dense_embedding,  # [0.1, 0.2, ...]
            using="dense",
            limit=20
        ),
        # Sparse vector search (BM25-style)
        models.Prefetch(
            query=SparseVector(
                indices=[1, 42, 123],  # Token IDs
                values=[0.5, 0.8, 0.3]  # Token weights
            ),
            using="sparse",
            limit=20
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10
)

Reranking for Better Precision

After hybrid retrieval, rerank for final ordering:

Cross-Encoder Reranking

from sentence_transformers import CrossEncoder

# Load reranker model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank_results(query: str, documents: list[str], top_k: int = 5):
    """Rerank documents using cross-encoder."""
    # Create query-document pairs
    pairs = [[query, doc] for doc in documents]

    # Score all pairs
    scores = reranker.predict(pairs)

    # Sort by score
    scored_docs = list(zip(documents, scores))
    scored_docs.sort(key=lambda x: x[1], reverse=True)

    return [doc for doc, score in scored_docs[:top_k]]

Cohere Rerank API

import cohere

co = cohere.Client("your-api-key")

def cohere_rerank(query: str, documents: list[str], top_k: int = 5):
    """Rerank using Cohere's rerank endpoint."""
    response = co.rerank(
        model="rerank-english-v3.0",
        query=query,
        documents=documents,
        top_n=top_k,
        return_documents=True
    )
    return [result.document.text for result in response.results]

Weight Tuning Guidelines

Best Practices

Always benchmark - Test vector-only, keyword-only, and hybrid on your data
Tune weights empirically - Start at 0.5/0.5, adjust based on evaluation
Use reranking - Hybrid retrieval + reranking = best results
Consider query type - Route exact-match queries to keyword, semantic to vector
Monitor latency - Hybrid adds overhead; cache where possible

Quick Decision Tree

Is the query an exact match (ID, code, name)?
├─ Yes → Keyword-heavy (0.3 vector / 0.7 keyword)
└─ No → Is it conceptual/semantic?
         ├─ Yes → Vector-heavy (0.7 vector / 0.3 keyword)
         └─ Mixed → Balanced (0.5 / 0.5) + reranking

Related Skills

latestaiagents/skill-testing

development

VerifiedTrustedCommunity

Test skills for correct activation, content quality, and regression — both automated checks (frontmatter validity, lint) and manual verification (query-suite activation testing). Covers CI integration and how to catch skill regressions before users do. Use this skill when adding skills to a repo, setting up CI for a skill library, or debugging "the skill exists but doesn't work". Activate when: test skills, validate skills, skill CI, skill linting, skill activation test, skill regression.

2SKILL.mdUpdated Apr 23, 2026

latestaiagents/skill-testing

latestaiagents/skill-frontmatter

documentation

VerifiedTrustedCommunity

Write the YAML frontmatter for a SKILL.md file so it activates reliably — name, description, and activation keywords that the model matches against. Covers length, tone, and the most common frontmatter mistakes. Use this skill when authoring a new skill, fixing a skill that isn't auto-activating, or reviewing skills for publication. Activate when: SKILL.md frontmatter, skill description, skill activation, skill YAML, write a skill, author a skill.

2SKILL.mdUpdated Apr 23, 2026

latestaiagents/skill-frontmatter

latestaiagents/skill-activation-patterns

development

VerifiedTrustedCommunity

Design skills that fire at the right moment — neither over-eager (noise) nor under-eager (silent). Covers activation specificity, trigger phrases, disambiguation between overlapping skills, and debugging activation. Use this skill when multiple skills could fire on the same query, a skill never fires, or a skill fires too often. Activate when: skill won't activate, skill over-activates, overlapping skills, skill triggers, skill selection, skill disambiguation.

2SKILL.mdUpdated Apr 23, 2026

latestaiagents/skill-activation-patterns

latestaiagents/progressive-disclosure

development

VerifiedTrustedCommunity

Structure SKILL.md content so the model reads just enough — concise summary up front, progressively deeper detail, examples on demand. Covers section ordering, length budgets, when to split into multiple skills. Use this skill when writing or refactoring a skill body, one skill has grown too long, or a skill is wordy but not useful. Activate when: SKILL.md structure, skill content, skill too long, split skill, progressive disclosure, skill body.

2SKILL.mdUpdated Apr 23, 2026

latestaiagents/progressive-disclosure

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/latestaiagents/agent-skills.git

# Copy into Claude Code skills folder (global)
cp -r agent-skills/skills/rag-architect/hybrid-retrieval ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

latestaiagents/agent-skills

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT