skills/ai-searching-docs/SKILL.md
Build AI that searches your documents and answers questions. Use when building a knowledge base, help center Q&A, chatting with documents, answering questions from a database, search-and-answer over internal docs, customer support bot, or FAQ system. Also use when embedding search loses critical context, retrieval returns irrelevant results, the right document is buried deep in search results, RAG pipeline tutorial, semantic search over documents, or vector database search quality.
npx skillsauth add lebsral/dspy-programming-not-prompting-lms-skills ai-searching-docs
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Guide the user through building an AI that searches documents and answers questions accurately. Uses DSPy's RAG (retrieval-augmented generation) pattern — retrieve relevant passages, then generate an answer grounded in them.
If you have documents in files, databases, or SaaS tools, use LangChain's document loaders to get them into a standard format before building your search pipeline.
from langchain_community.document_loaders import (
    PyPDFLoader,
    TextLoader,
    CSVLoader,
    WebBaseLoader,
    DirectoryLoader,
    NotionDBLoader,
    JSONLoader,
)
# PDF files
docs = PyPDFLoader("report.pdf").load()
# All text files in a directory
docs = DirectoryLoader("./docs/", glob="**/*.txt", loader_cls=TextLoader).load()
# Web pages
docs = WebBaseLoader("https://example.com/help").load()
# CSV
docs = CSVLoader("data.csv", source_column="url").load()
# JSON
docs = JSONLoader("data.json", jq_schema=".records[]", content_key="text").load()
Split loaded documents into chunks sized for embedding and retrieval:
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)
# Each chunk has .page_content (text) and .metadata (source info)
| Splitter | Best for |
|----------|----------|
| RecursiveCharacterTextSplitter | General-purpose (recommended default) |
| MarkdownHeaderTextSplitter | Markdown docs — splits by heading |
| TokenTextSplitter | When you need strict token budgets |
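To make the chunk_size and chunk_overlap settings concrete, here is a minimal plain-Python sketch of a sliding character window. It is only an illustration of what the two numbers mean, not how RecursiveCharacterTextSplitter works internally (that splitter also tries to break on separators such as paragraphs and sentences). The function name split_with_overlap is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Slide a window of chunk_size characters, stepping by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# 2500 characters of sample text
sample = "".join(str(i % 10) for i in range(2500))
pieces = split_with_overlap(sample)
print([len(p) for p in pieces])
```

With the defaults, each chunk repeats the last 200 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.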
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small") # or HuggingFaceEmbeddings, etc.
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="./chroma_db")
# Use as a retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
results = retriever.invoke("How do refunds work?")
Other stores follow the same pattern, for example FAISS.from_documents(...) or PineconeVectorStore.from_documents(...) from langchain_pinecone.
Once your data is loaded and chunked, wire it into a DSPy retriever (Step 3 below) or use ChromaDB directly. For the full LangChain/LangGraph API, see the LangChain docs.
Ask the user:
import dspy
lm = dspy.LM("openai/gpt-4o-mini") # or "anthropic/claude-sonnet-4-5-20250929", etc.
dspy.configure(lm=lm)
class AnswerFromDocs(dspy.Signature):
    """Answer the question based on the given context."""

    context: list[str] = dspy.InputField(desc="Relevant passages from the knowledge base")
    question: str = dspy.InputField(desc="User's question")
    answer: str = dspy.OutputField(desc="Answer grounded in the context")

class DocSearch(dspy.Module):
    def __init__(self, num_passages=3):
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.answer = dspy.ChainOfThought(AnswerFromDocs)

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.answer(context=context, question=question)
DSPy supports multiple search backends. Set up via dspy.configure:
# ColBERTv2 (hosted)
colbert = dspy.ColBERTv2(url="http://your-server:port/endpoint")
dspy.configure(lm=lm, rm=colbert)
# Or wrap your own search (Elasticsearch, Pinecone, pgvector, etc.)
class MySearchBackend(dspy.Retrieve):
    def forward(self, query, k=None):
        k = k or self.k
        # Your search logic here
        results = your_search_function(query, top_k=k)
        return dspy.Prediction(passages=[r["text"] for r in results])
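The wrapper above assumes a your_search_function(query, top_k) that returns a list of dicts with a "text" key. As a rough stand-in for trying the wrapper before a real backend exists, the sketch below ranks an in-memory corpus by word overlap. The corpus contents and the tokenize helper are made up for illustration; a real backend would call Elasticsearch, Pinecone, pgvector, or similar.

```python
import re

# Hypothetical in-memory corpus; replace with your real search index
CORPUS = [
    {"id": "refunds", "text": "Refunds are issued within 5 business days of approval."},
    {"id": "shipping", "text": "Standard shipping takes 3 to 7 days."},
    {"id": "returns", "text": "Returns are accepted within 30 days of delivery."},
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def your_search_function(query: str, top_k: int = 3) -> list[dict]:
    """Rank documents by how many words they share with the query."""
    query_terms = tokenize(query)
    scored = []
    for doc in CORPUS:
        overlap = len(query_terms & tokenize(doc["text"]))
        if overlap:  # drop documents with no shared words
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


hits = your_search_function("How do refunds work?", top_k=2)
print([h["text"] for h in hits])
```

Because each result carries a "text" key, the list comprehension in MySearchBackend.forward can consume it unchanged.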
If you do not have a search backend yet, set one up. For prototyping, use dspy.Embeddings (built-in, no external DB needed) or ChromaDB:
embedder = dspy.Embedder("openai/text-embedding-3-small") # or any supported model
retriever = dspy.Embeddings(corpus=corpus_texts, embedder=embedder, k=5)
# Uses FAISS for large corpora (>20K docs), brute-force for smaller ones
# Use retriever("query") to search — returns dspy.Prediction(passages=..., indices=...)
dspy.configure(lm=lm, rm=retriever)
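For intuition about the brute-force path mentioned in the comment above, the following sketch scores every passage against the query with cosine similarity and keeps the top k. It does not show dspy.Embeddings internals: toy_embed is a hypothetical bag-of-words stand-in for a real embedding model, so it only matches exact words.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def brute_force_search(query: str, corpus: list[str], k: int = 5) -> list[tuple[int, float]]:
    """Score every passage against the query and return the top-k (index, score) pairs."""
    query_vec = toy_embed(query)
    scores = [(i, cosine(query_vec, toy_embed(p))) for i, p in enumerate(corpus)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


corpus_texts = [
    "Refunds are issued within 5 business days.",
    "Our office is closed on public holidays.",
    "To request a refund, contact support with your order number.",
]
top = brute_force_search("refund request", corpus_texts, k=2)
print(top)
```

Note that the toy vectors miss the first passage because "refunds" and "refund" are different words; a learned embedding model is what closes that gap. The linear scan is also why an index such as FAISS becomes worthwhile once the corpus grows large.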
import chromadb
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("my_docs")
Split documents into passages before adding them to the vector store. Sentence-based chunking works well for most use cases:
import re
def chunk_text(text, max_sentences=5):
    """Split text into chunks of N sentences."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks = []
    for i in range(0, len(sentences), max_sentences):
        chunk = " ".join(sentences[i:i + max_sentences])
        if chunk:
            chunks.append(chunk)
    return chunks

# Load and chunk your documents
for doc in documents:
    chunks = chunk_text(doc["text"])
    collection.add(
        documents=chunks,
        ids=[f"{doc['id']}_chunk_{i}" for i in range(len(chunks))],
        metadatas=[{"source": doc["source"]}] * len(chunks),
    )
ChromaDB uses its default embedding function, but you can swap in others:
# SentenceTransformers (local, free)
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction
ef = SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
# OpenAI embeddings (API, paid)
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
ef = OpenAIEmbeddingFunction(api_key="...", model_name="text-embedding-3-small")
collection = client.get_or_create_collection("my_docs", embedding_function=ef)
| Strategy | How it works | Best for |
|----------|-------------|----------|
| Sentence-based | Split on sentence boundaries | Articles, docs, help pages |
| Fixed-size | Split every N characters with overlap | Long unstructured text |
| Paragraph | Split on double newlines | Well-structured documents |
| Overlap | Fixed-size with N-character overlap between chunks | When context at chunk boundaries matters |
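The paragraph strategy in the table can be sketched in plain Python as below. This is one possible reading of "split on double newlines", with an added assumption that short neighboring paragraphs are merged up to a max_chars budget; the function name and the budget are hypothetical, and an oversized paragraph is kept whole rather than cut.

```python
import re


def chunk_by_paragraph(text: str, max_chars: int = 500) -> list[str]:
    """Split on blank lines, then merge neighboring paragraphs up to max_chars."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the budget: close the current chunk
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = "Refunds take 5 days.\n\nContact support to start one.\n\n" + "Shipping details. " * 40
chunks = chunk_by_paragraph(doc, max_chars=100)
print(len(chunks), [len(c) for c in chunks])
```

Here the two short paragraphs end up in one chunk and the long shipping paragraph forms its own, so related sentences stay together instead of being cut at an arbitrary character offset.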
class ChromaRetriever(dspy.Retrieve):
    def __init__(self, collection, k=3):
        super().__init__(k=k)
        self.collection = collection

    def forward(self, query, k=None):
        k = k or self.k
        results = self.collection.query(query_texts=[query], n_results=k)
        return dspy.Prediction(passages=results["documents"][0])

# Use it
retriever = ChromaRetriever(collection)
dspy.configure(lm=lm, rm=retriever)
If you already have a vector store, wire it up as a DSPy retriever. Each follows the same pattern — subclass dspy.Retrieve and implement forward():
| Store | Type | Best for | Setup |
|-------|------|----------|-------|
| ChromaDB | Embedded | Prototyping, small datasets | pip install chromadb |
| Pinecone | Cloud | Managed, serverless, scales to billions | pip install pinecone |
| Qdrant | Self-hosted or cloud | Open-source, filtering, hybrid search | pip install qdrant-client |
| Weaviate | Self-hosted or cloud | Multi-modal, GraphQL, hybrid search | pip install weaviate-client |
| pgvector | Postgres extension | Teams already using Postgres | pip install pgvector sqlalchemy |
from pinecone import Pinecone
class PineconeRetriever(dspy.Retrieve):
    def __init__(self, index_name, api_key, embed_fn, k=3):
        super().__init__(k=k)
        pc = Pinecone(api_key=api_key)
        self.index = pc.Index(index_name)
        self.embed_fn = embed_fn  # function: str -> list[float]

    def forward(self, query, k=None):
        k = k or self.k
        vector = self.embed_fn(query)
        results = self.index.query(vector=vector, top_k=k, include_metadata=True)
        passages = [m["metadata"]["text"] for m in results["matches"]]
        return dspy.Prediction(passages=passages)
from qdrant_client import QdrantClient
class QdrantRetriever(dspy.Retrieve):
    def __init__(self, collection_name, url, embed_fn, k=3):
        super().__init__(k=k)
        self.client = QdrantClient(url=url)
        self.collection = collection_name
        self.embed_fn = embed_fn

    def forward(self, query, k=None):
        k = k or self.k
        vector = self.embed_fn(query)
        results = self.client.query_points(
            collection_name=self.collection, query=vector, limit=k,
        )
        passages = [p.payload["text"] for p in results.points]
        return dspy.Prediction(passages=passages)
import weaviate
class WeaviateRetriever(dspy.Retrieve):
    def __init__(self, collection_name, host="localhost", port=8080, k=3):
        super().__init__(k=k)
        # connect_to_local takes host/port rather than a URL;
        # use connect_to_weaviate_cloud for hosted clusters
        self.client = weaviate.connect_to_local(host=host, port=port)
        self.collection = self.client.collections.get(collection_name)

    def forward(self, query, k=None):
        k = k or self.k
        # near_text requires a vectorizer configured on the collection
        results = self.collection.query.near_text(query=query, limit=k)
        passages = [o.properties["text"] for o in results.objects]
        return dspy.Prediction(passages=passages)
from sqlalchemy import create_engine, text
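class PgvectorRetriever(dspy.Retrieve):
    def __init__(self, engine, table, embed_fn, k=3):
        super().__init__(k=k)
        self.engine = engine
        self.table = table  # interpolated into SQL below: must come from trusted code, never user input
        self.embed_fn = embed_fn

    def forward(self, query, k=None):
        k = k or self.k
        vector = self.embed_fn(query)
        # Format the embedding as a pgvector literal, e.g. "[0.1,0.2,0.3]"
        vec_literal = "[" + ",".join(str(float(x)) for x in vector) + "]"
        # <=> is pgvector's cosine-distance operator; the explicit cast avoids
        # driver-dependent type inference for the bound parameter
        sql = text(f"""
            SELECT content FROM {self.table}
            ORDER BY embedding <=> CAST(:vec AS vector) LIMIT :k
        """)
        with self.engine.connect() as conn:
            rows = conn.execute(sql, {"vec": vec_literal, "k": k}).fetchall()
        passages = [row[0] for row in rows]
        return dspy.Prediction(passages=passages)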
All of these work as drop-in replacements for dspy.Retrieve:
retriever = PineconeRetriever("my-index", api_key="...", embed_fn=embed)
dspy.configure(lm=lm, rm=retriever)
# Or pass directly to your module
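The Pinecone, Qdrant, and pgvector retrievers above all take an embed_fn that maps a string to a list of floats, and the usage line passes one named embed without defining it. In production that would wrap a real embedding model with the same dimension as the index. For offline tests of the wiring, a deterministic toy such as the hashing sketch below may be enough; make_hashing_embed_fn and its dim parameter are hypothetical and the vectors carry no semantic meaning.

```python
import hashlib
import math
import re


def make_hashing_embed_fn(dim: int = 64):
    """Return a toy embed_fn (str -> list[float]) for offline tests."""

    def embed(text: str) -> list[float]:
        vector = [0.0] * dim
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            # Hash each word into one of `dim` buckets (stable across runs)
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % dim
            vector[bucket] += 1.0
        # L2-normalize so cosine and dot-product rankings agree
        norm = math.sqrt(sum(v * v for v in vector))
        return [v / norm for v in vector] if norm else vector

    return embed


embed = make_hashing_embed_fn(dim=64)
vec = embed("How do refunds work?")
print(len(vec))
```

Whatever function is used, the same one has to embed both the stored passages and the incoming queries, otherwise the distances returned by the vector store are meaningless.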
When questions need info from multiple places:
class GenerateSearchQuery(dspy.Signature):
    """Generate a search query to find missing information."""

    context: list[str] = dspy.InputField(desc="Information gathered so far")
    question: str = dspy.InputField(desc="The question to answer")
    query: str = dspy.OutputField(desc="Search query to find missing information")

class MultiStepSearch(dspy.Module):
    def __init__(self, num_passages=3, num_searches=2):
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(num_searches)]
        self.answer = dspy.ChainOfThought(AnswerFromDocs)

    def forward(self, question):
        context = []
        for hop in self.generate_query:
            query = hop(context=context, question=question).query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)
        return self.answer(context=context, question=question)

def deduplicate(passages):
    seen = set()
    result = []
    for p in passages:
        if p not in seen:
            seen.add(p)
            result.append(p)
    return result
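To see how context accumulates across hops without calling an LM or a retriever, the loop in MultiStepSearch.forward can be traced with stubs. Everything named fake_* below, along with the Acme passages, is invented purely for illustration; only deduplicate mirrors the helper above.

```python
def deduplicate(passages):
    """Drop repeated passages while keeping first-seen order."""
    seen = set()
    result = []
    for p in passages:
        if p not in seen:
            seen.add(p)
            result.append(p)
    return result


# Made-up index standing in for the retriever
FAKE_INDEX = {
    "who founded acme": ["Acme was founded by Dana Lee."],
    "where was dana lee born": ["Dana Lee was born in Lisbon.", "Acme was founded by Dana Lee."],
}


def fake_retrieve(query: str) -> list[str]:
    return FAKE_INDEX.get(query.lower(), [])


def fake_generate_query(context: list[str], question: str) -> str:
    """Stand-in for the LM: ask about the founder first, then about that person."""
    if not context:
        return "who founded acme"
    return "where was dana lee born"


def multi_hop(question: str, num_searches: int = 2) -> list[str]:
    context: list[str] = []
    for _ in range(num_searches):
        query = fake_generate_query(context, question)
        context = deduplicate(context + fake_retrieve(query))
    return context


context = multi_hop("Where was the founder of Acme born?")
print(context)
```

The second hop can only be phrased once the first hop has surfaced the founder's name, which is the reason a single retrieval call struggles on questions like this. The repeated passage returned by the second search is dropped by deduplicate, so the answer step sees each passage once.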
def search_metric(example, prediction, trace=None):
    # Exact match (simple)
    return prediction.answer == example.answer

# Or use an AI judge for open-ended answers
class JudgeAnswer(dspy.Signature):
    """Is the predicted answer correct given the expected answer?"""

    question: str = dspy.InputField()
    gold_answer: str = dspy.InputField()
    predicted_answer: str = dspy.InputField()
    is_correct: bool = dspy.OutputField()

def judge_metric(example, prediction, trace=None):
    judge = dspy.Predict(JudgeAnswer)
    result = judge(
        question=example.question,
        gold_answer=example.answer,
        predicted_answer=prediction.answer,
    )
    return result.is_correct
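Strict string equality, as in search_metric above, fails on differences in casing or trailing punctuation, and an LM judge costs an extra call per example. A middle ground, sketched here under the assumption that answers are short spans, is to normalize both strings before comparing and to fall back on token-level F1 for partial credit. The helper names are hypothetical, and SimpleNamespace only stands in for the example and prediction objects.

```python
import re
import string
from collections import Counter
from types import SimpleNamespace


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def normalized_match_metric(example, prediction, trace=None) -> bool:
    """Same signature as search_metric, but compares normalized strings."""
    return normalize(prediction.answer) == normalize(example.answer)


def token_f1(gold: str, predicted: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    gold_tokens = normalize(gold).split()
    pred_tokens = normalize(predicted).split()
    overlap = sum((Counter(gold_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


example = SimpleNamespace(answer="Within 30 days")
prediction = SimpleNamespace(answer="within 30 days.")
matched = normalized_match_metric(example, prediction)
f1 = token_f1("Within 30 days", "Returns are accepted within 30 days")
print(matched, round(f1, 3))
```

Because normalized_match_metric keeps the (example, prediction, trace) signature, it could be passed to an optimizer in place of search_metric.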
optimizer = dspy.BootstrapFewShot(metric=search_metric, max_bootstrapped_demos=4)
optimized = optimizer.compile(DocSearch(), trainset=trainset)
# Typical improvement: 45-60% exact match -> 65-80% after optimization
# For further gains, upgrade to MIPROv2:
# optimizer = dspy.MIPROv2(metric=search_metric, auto="medium")
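Answer accuracy mixes retrieval quality with generation quality. When tuning the number of retrieved passages, it can help to measure retrieval on its own, for example with recall@k over a small labeled dev set. The sketch below assumes you can record the ranked passage ids the retriever returned and the ids a human marked relevant; the dev_set contents are hypothetical.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the relevant passages that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


# Hypothetical dev set: ranked passage ids per question plus labeled relevant ids
dev_set = [
    {"retrieved": ["p3", "p7", "p1", "p9", "p2"], "relevant": {"p1", "p2"}},
    {"retrieved": ["p4", "p5", "p6", "p8", "p0"], "relevant": {"p4"}},
]


def mean_recall(rows: list[dict], k: int) -> float:
    """Average recall@k over all dev questions."""
    return sum(recall_at_k(r["retrieved"], r["relevant"], k) for r in rows) / len(rows)


scores = {k: mean_recall(dev_set, k) for k in (1, 3, 5)}
print(scores)
```

If recall keeps rising with k but answer accuracy does not, the extra passages are likely adding noise rather than missing evidence, which points to fixing ranking or chunking instead of raising k further.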
- tsvector) outperforms embedding search. Use RAG only when queries are semantic.
- ChainOfThought for the answer step: reasoning typically helps ground answers in the documents. Use Predict if latency matters more than accuracy
- dspy.Refine to ensure answers actually cite the documents by scoring citation presence in a reward function
- /dspy-gepa for details
- dspy.Embeddings as a built-in retriever: it handles embedding, FAISS indexing, and search in one class without needing a separate vector store (see reference.md for API details)
- k (number of retrieved passages) is a critical hyperparameter. Too few and you miss relevant context; too many and you overwhelm the LM. Tune it against your dev set.
- dspy.ChainOfThought with multi-step retrieval for these cases.

Install any skill:
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill <name>
- /ai-summarizing
- /ai-serving-apis
- /ai-building-chatbots
- /ai-improving-accuracy
- /dspy-signatures
- /dspy-chain-of-thought
- /ai-do if you do not have it. It routes any AI problem to the right skill and is the fastest way to work: npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do