skills/dspy-qdrant/SKILL.md
Use Qdrant as a vector database with DSPy, or connect any vector DB (Pinecone, ChromaDB, Weaviate) with custom retrievers. Use when you want to set up Qdrant, QdrantRM, dspy-qdrant, vector database for DSPy, vector search, hybrid search, or build custom retrievers for Pinecone, ChromaDB, or Weaviate. Also used for qdrant, dspy-qdrant, QdrantRM, vector database, vector search, pinecone DSPy, chromadb DSPy, weaviate DSPy, vector DB for DSPy, pip install dspy-qdrant, qdrant docker, qdrant cloud, hybrid search DSPy, sparse dense vectors, custom dspy.Retrieve, which vector DB for DSPy, DSPy 3.0 retriever removed.
npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills dspy-qdrant
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Guide the user through setting up Qdrant with DSPy using the official dspy-qdrant package, plus custom retriever patterns for Pinecone, ChromaDB, and Weaviate.
Qdrant is an open-source vector search engine written in Rust. It's the only vector database with an official DSPy integration package (dspy-qdrant). Features: hybrid search (dense + sparse), payload filtering, multi-tenancy, and horizontal scaling.
DSPy 3.0 removed all community-contributed retriever modules (ChromadbRM, PineconeRM, WeaviateRM, QdrantRM) from the main repo. The dspy-qdrant package is the official replacement for QdrantRM, maintained separately with full DSPy compatibility.
For other vector databases, you write a short custom dspy.Retrieve subclass (~15 lines). This skill covers that pattern too.
pip install dspy-qdrant
This installs both the Qdrant client and the DSPy retriever module.
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
export QDRANT_URL="https://your-cluster.aws.cloud.qdrant.io"
export QDRANT_API_KEY="your-api-key"
from qdrant_client import QdrantClient
client = QdrantClient(":memory:") # no server needed
import dspy
from qdrant_client import QdrantClient
from dspy_qdrant import QdrantRM
client = QdrantClient("http://localhost:6333")
retriever = QdrantRM(
    qdrant_collection_name="my_docs",
    qdrant_client=client,
    k=5,
    document_field="document",  # payload field containing document text (default)
)
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"), rm=retriever) # or "anthropic/claude-sonnet-4-5-20250929", etc.
# Now dspy.Retrieve() uses Qdrant
search = dspy.Retrieve(k=5)
result = search("How do refunds work?")
print(result.passages)
QdrantRM(
    qdrant_collection_name: str,       # required: collection name in Qdrant
    qdrant_client: QdrantClient,       # required: initialized client instance
    k: int = 3,                        # top passages to retrieve
    document_field: str = "document",  # payload field with document text
    vectorizer=None,                   # BaseSentenceVectorizer (default: FastEmbedVectorizer)
    vector_name: str = None,           # named vector to search (default: first available)
)
By default, QdrantRM uses FastEmbed (BAAI/bge-small-en-v1.5) for query vectorization. To use a different embedder, pass a custom vectorizer.
import os
from qdrant_client import QdrantClient
from dspy_qdrant import QdrantRM
client = QdrantClient(
    url=os.environ["QDRANT_URL"],
    api_key=os.environ["QDRANT_API_KEY"],
)
retriever = QdrantRM(
    qdrant_collection_name="my_docs",
    qdrant_client=client,
    k=5,
)
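The local and cloud setups differ only in the arguments passed to QdrantClient. The following is a minimal sketch of one way to choose between them from the environment. The helper name `qdrant_client_kwargs` and the fallback to a local instance are assumptions made for illustration; they are not part of dspy-qdrant or qdrant-client.

```python
import os


def qdrant_client_kwargs(env=None):
    """Build keyword arguments for QdrantClient from environment variables.

    Hypothetical helper: falls back to a local Docker instance when
    QDRANT_URL is not set, and only includes api_key when one is provided.
    """
    env = os.environ if env is None else env
    kwargs = {"url": env.get("QDRANT_URL", "http://localhost:6333")}
    api_key = env.get("QDRANT_API_KEY")
    if api_key:
        kwargs["api_key"] = api_key
    return kwargs


# Explicit dictionaries stand in for the real environment here
local = qdrant_client_kwargs({})
cloud = qdrant_client_kwargs(
    {
        "QDRANT_URL": "https://your-cluster.aws.cloud.qdrant.io",
        "QDRANT_API_KEY": "your-api-key",
    }
)
print(local)
print(cloud)
```

The result could then be unpacked as `QdrantClient(**qdrant_client_kwargs())`, so the same code runs against Docker locally and Qdrant Cloud in deployment.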
Before you can search, you need to populate your Qdrant collection:
from qdrant_client import QdrantClient, models
import dspy
client = QdrantClient("http://localhost:6333")
embedder = dspy.Embedder("openai/text-embedding-3-small", dimensions=512)
# Your documents
docs = [
    {"id": 1, "document": "Refunds are processed within 5-7 business days.", "category": "billing"},
    {"id": 2, "document": "Reset your password at Settings > Security.", "category": "account"},
    {"id": 3, "document": "Enterprise plans include SSO and dedicated support.", "category": "plans"},
]
# Create collection (size must match the embedder's output dimensions)
client.create_collection(
    collection_name="my_docs",
    vectors_config=models.VectorParams(size=512, distance=models.Distance.COSINE),
)
# Upsert with embeddings
vectors = embedder([d["document"] for d in docs])  # one row per document
client.upsert(
    collection_name="my_docs",
    points=[
        models.PointStruct(
            id=d["id"],
            vector=v.tolist(),  # convert the NumPy row to a plain list of floats
            payload={"document": d["document"], "category": d["category"]},
        )
        for d, v in zip(docs, vectors)
    ],
)
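Two mistakes are easy to make at indexing time: storing the text under a payload field other than the one QdrantRM reads, and upserting vectors whose size differs from the collection's. The following is a minimal sketch of a pre-upsert check. The function name `validate_for_upsert` is hypothetical, and the vectors are placeholders rather than real embeddings.

```python
def validate_for_upsert(docs, vectors, expected_dim, document_field="document"):
    """Return a list of problems found before upserting; an empty list means OK."""
    problems = []
    if len(docs) != len(vectors):
        problems.append(f"{len(docs)} docs but {len(vectors)} vectors")
    for i, (doc, vec) in enumerate(zip(docs, vectors)):
        # QdrantRM reads passage text from this payload field
        if not doc.get(document_field):
            problems.append(f"doc {i}: missing or empty payload field {document_field!r}")
        # The vector size must match the size given to create_collection
        if len(vec) != expected_dim:
            problems.append(f"doc {i}: vector has {len(vec)} dims, expected {expected_dim}")
    return problems


docs = [
    {"id": 1, "document": "Refunds are processed within 5-7 business days."},
    {"id": 2, "text": "Reset your password at Settings > Security."},  # wrong field name
]
vectors = [[0.0] * 512, [0.0] * 384]  # placeholder vectors; the second has the wrong size
problems = validate_for_upsert(docs, vectors, expected_dim=512)
for problem in problems:
    print(problem)
```

Running a check like this before `client.upsert` surfaces the mismatch at indexing time instead of as empty passages at query time.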
import dspy
from qdrant_client import QdrantClient
from dspy_qdrant import QdrantRM
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini")) # or "anthropic/claude-sonnet-4-5-20250929", etc.
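# With no vectorizer argument, QdrantRM embeds queries with FastEmbed
# (BAAI/bge-small-en-v1.5). The collection must have been indexed with the
# same model; otherwise pass a matching vectorizer.
retriever = QdrantRM(
    qdrant_collection_name="my_docs",
    qdrant_client=QdrantClient("http://localhost:6333"),
    k=5,
)


class RAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = retriever
        self.answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # Retrieve passages, then answer using them as context
        context = self.retrieve(question).passages
        return self.answer(context=context, question=question)

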
rag = RAG()
result = rag(question="How do refunds work?")
print(result.answer)
Qdrant supports hybrid search combining dense (semantic) and sparse (keyword) vectors in the same collection. This improves recall for queries that need both semantic understanding and exact keyword matching.
from qdrant_client import QdrantClient, models
client = QdrantClient("http://localhost:6333")
# Create collection with both dense and sparse vectors
client.create_collection(
    collection_name="hybrid_docs",
    vectors_config=models.VectorParams(size=512, distance=models.Distance.COSINE),
    sparse_vectors_config={
        "keywords": models.SparseVectorParams(
            modifier=models.Modifier.IDF,
        ),
    },
)
Then query with both:
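# dense_vector: query embedding as a list of floats (same model as the indexed vectors)
# sparse_vector: models.SparseVector(indices=[...], values=[...]) built from the query
results = client.query_points(
    collection_name="hybrid_docs",
    prefetch=[
        models.Prefetch(query=dense_vector, using="", limit=20),
        models.Prefetch(query=sparse_vector, using="keywords", limit=20),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),  # reciprocal rank fusion
    limit=5,
)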
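To make the fusion step concrete, here is a minimal pure-Python sketch of reciprocal rank fusion over two ranked lists of document IDs. It uses the common formulation, score = sum of 1 / (k + rank), with k = 60 as an assumed constant; Qdrant performs the fusion server-side and its exact constant may differ. The function name and the sample IDs are illustrative only.

```python
def reciprocal_rank_fusion(rankings, k=60, limit=5):
    """Fuse several ranked lists of IDs into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks higher
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a deterministic order
    ordered = sorted(scores, key=lambda d: (-scores[d], str(d)))
    return ordered[:limit]


dense_hits = [3, 1, 2]   # hypothetical IDs ranked by the dense (semantic) search
sparse_hits = [1, 4, 3]  # hypothetical IDs ranked by the sparse (keyword) search
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # [1, 3, 4, 2]
```

Documents 1 and 3 appear in both lists, so they outrank documents that only one search returned. This is why hybrid search tends to favor results that match both semantically and by keyword.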
Since DSPy 3.0 removed built-in community retrievers, use a custom dspy.Retrieve subclass for any vector database. The pattern is always the same:
import dspy


class MyVectorDBRetriever(dspy.Retrieve):
    def __init__(self, client, collection, k=3):
        super().__init__(k=k)
        self.client = client
        self.collection = collection

    def forward(self, query, k=None):
        k = k or self.k
        # Placeholder call: replace with your vector DB client's search method
        results = self.client.search(self.collection, query, top_k=k)
        # DSPy expects a Prediction with a list of passage strings
        return dspy.Prediction(passages=[r["text"] for r in results])
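The pattern above only requires that the client expose a `search(collection, query, top_k)` method returning items with a `"text"` key. The following is a minimal sketch of an in-memory stand-in with that interface, which could be used to exercise a retriever in tests without a running database. The class `InMemoryClient` and its word-overlap scoring are hypothetical and do not represent any real vector database client.

```python
class InMemoryClient:
    """Hypothetical stand-in exposing search(collection, query, top_k)."""

    def __init__(self):
        self.collections = {}

    def add(self, collection, texts):
        self.collections.setdefault(collection, []).extend(texts)

    def search(self, collection, query, top_k=3):
        # Score by shared lowercase words; a real client would compare vectors
        query_words = set(query.lower().split())
        scored = []
        for text in self.collections.get(collection, []):
            overlap = len(query_words & set(text.lower().split()))
            if overlap:
                scored.append((overlap, text))
        # Stable sort keeps insertion order among equal scores
        scored.sort(key=lambda pair: -pair[0])
        return [{"text": text, "score": score} for score, text in scored[:top_k]]


client = InMemoryClient()
client.add(
    "my_docs",
    [
        "Refunds are processed within 5-7 business days.",
        "Reset your password at Settings > Security.",
    ],
)
results = client.search("my_docs", "how are refunds processed", top_k=1)
passages = [r["text"] for r in results]
print(passages)
```

Passing an object like this as `client` lets the retriever's `forward` logic be tested in isolation before swapping in the real database client.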
from pinecone import Pinecone
import dspy
class PineconeRetriever(dspy.Retrieve):
    def __init__(self, index_name, embedder, k=3):
        super().__init__(k=k)
        pc = Pinecone()  # reads PINECONE_API_KEY from env
        self.index = pc.Index(index_name)
        self.embedder = embedder

    def forward(self, query, k=None):
        k = k or self.k
        # dspy.Embedder returns a NumPy array; Pinecone expects a list of floats
        vector = self.embedder(query).tolist()
        results = self.index.query(vector=vector, top_k=k, include_metadata=True)
        # Assumes each record stores its text under metadata["text"]
        passages = [m["metadata"]["text"] for m in results["matches"]]
        return dspy.Prediction(passages=passages)
# Usage
embedder = dspy.Embedder("openai/text-embedding-3-small", dimensions=512)
retriever = PineconeRetriever("my-index", embedder, k=5)
import chromadb
import dspy
class ChromaRetriever(dspy.Retrieve):
    def __init__(self, collection_name, k=3):
        super().__init__(k=k)
        client = chromadb.PersistentClient(path="./chroma_db")
        self.collection = client.get_or_create_collection(collection_name)

    def forward(self, query, k=None):
        k = k or self.k
        # ChromaDB embeds the query text with the collection's embedding function
        results = self.collection.query(query_texts=[query], n_results=k)
        return dspy.Prediction(passages=results["documents"][0])
# Usage
retriever = ChromaRetriever("my_docs", k=5)
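The `[0]` index in the ChromaDB retriever is easy to overlook. The sketch below assumes the query response holds one inner list per query text, which is why a single-query retriever takes the first one. The dictionary is a made-up response that mimics that shape, and the helper `passages_for_query` is hypothetical.

```python
# Made-up response for two query texts with n_results=2, mimicking the nested shape
results = {
    "ids": [["a1", "a2"], ["b1", "b2"]],
    "documents": [
        ["Refunds are processed within 5-7 business days.", "Enterprise plans include SSO."],
        ["Reset your password at Settings > Security.", "Enterprise plans include SSO."],
    ],
}


def passages_for_query(results, query_index=0):
    """Return the documents for one query, or an empty list if there are none."""
    documents = results.get("documents") or []
    if query_index >= len(documents):
        return []
    # Skip entries without stored text
    return [doc for doc in documents[query_index] if doc is not None]


first = passages_for_query(results)
print(first)
```

Guarding against an empty or missing `documents` list avoids an IndexError when the collection has no matches.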
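from urllib.parse import urlparse

import dspy
import weaviate


class WeaviateRetriever(dspy.Retrieve):
    def __init__(self, class_name, url="http://localhost:8080", k=3):
        super().__init__(k=k)
        # connect_to_local takes host and port separately, so split the URL
        parsed = urlparse(url)
        self.client = weaviate.connect_to_local(
            host=parsed.hostname or "localhost",
            port=parsed.port or 8080,
        )
        self.collection = self.client.collections.get(class_name)

    def forward(self, query, k=None):
        k = k or self.k
        # near_text requires a vectorizer module configured on the collection
        results = self.collection.query.near_text(query=query, limit=k)
        # Assumes each object stores its text under the "text" property
        passages = [obj.properties["text"] for obj in results.objects]
        return dspy.Prediction(passages=passages)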
# Usage
retriever = WeaviateRetriever("MyDocs", k=5)
| Feature | Qdrant | Pinecone | ChromaDB | Weaviate |
|---------|--------|----------|----------|----------|
| DSPy package | dspy-qdrant (official) | None (custom retriever) | None (custom retriever) | None (custom retriever) |
| Self-hosted | Yes (Docker, binary) | No (cloud only) | Yes (pip, Docker) | Yes (Docker) |
| Cloud option | Yes (free tier) | Yes (free tier) | No | Yes (free tier) |
| Hybrid search | Yes (dense + sparse) | Yes (sparse + dense) | No | Yes (BM25 + vector) |
| Best for | Production + DSPy | Cloud-native, serverless | Local prototyping | Multi-modal, GraphQL |
| Language | Rust | Managed service | Python | Go |
Starting a new DSPy project?
→ Qdrant (official DSPy package, easiest setup)
Prototyping locally, smallest footprint?
→ ChromaDB (pip install, in-memory or persistent, no server)
Already using Pinecone/Weaviate in production?
→ Write a custom retriever (15 lines, shown above)
Need hybrid search (keyword + semantic)?
→ Qdrant or Weaviate
- QdrantRM has no parameters named qdrant_client_url, qdrant_client_api_key, embedding_model, or embedding_dimensions. It takes a qdrant_client (an initialized QdrantClient instance) and a vectorizer (a BaseSentenceVectorizer). Always construct the QdrantClient separately, then pass it.
- It is easy to assume document_field="text", but the default is "document". When indexing, store content in a payload field named document (the default), or explicitly set document_field="text" if your payload uses text. Mismatched field names silently return empty passages.
- The import from dspy.retrieve.chromadb_rm import ChromadbRM no longer works. Use dspy-qdrant or write a custom dspy.Retrieve subclass.
- The default vectorizer is FastEmbedVectorizer using BAAI/bge-small-en-v1.5. Your indexed vectors must match this model. If you indexed with OpenAI embeddings, pass a custom vectorizer that uses the same model.
- dspy.Embeddings is simpler for in-memory retrieval. If you just need to search a small corpus (under ~100k passages) without a vector DB, use dspy.Embeddings(corpus=docs, embedder=embedder) instead. It handles indexing and search in one class. Use Qdrant when you need persistence, filtering, hybrid search, or scale.

Install any skill:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>

Related skills: /dspy-retrieval, /ai-searching-docs, /dspy-ragas, /ai-stopping-hallucinations

/ai-do (if you do not have it) routes any AI problem to the right skill and is the fastest way to work. Install it with:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do