ai-rag/building-rag-pipelines/SKILL.md
Design and implement production-quality RAG (Retrieval-Augmented Generation) pipelines with hybrid search, reranking, agentic patterns, and continuous learning.
Create a RAG system that achieves >90% retrieval precision, supports iterative reasoning via tools, learns from usage patterns, and respects user privacy preferences.
Implement tiered storage to optimize for different access patterns:
# Tier 0: Cold - Raw files on disk (archives, uploads)
# Tier 1: Warm - Chunked text in SQLite with metadata
# Tier 2: Hot - Vector embeddings in ChromaDB
# Tier 3: Cache - LRU in-memory for frequent chunks
class Chunk(db.Model):
    chunk_id = db.Column(db.String(64), primary_key=True)
    content = db.Column(db.Text, nullable=False)
    source_file = db.Column(db.String(500), index=True)
    source_type = db.Column(db.String(50), index=True)  # log, config, etc.
    artifact_category = db.Column(db.String(50), index=True)
    token_count = db.Column(db.Integer)
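One way the tiers might fit together on the read path is a read-through lookup: check the in-memory LRU cache first and fall back to the SQLite chunk store on a miss. The following is only a minimal sketch under stated assumptions. `ChunkCache` and `load_chunk_text` are hypothetical names, and the table name `chunk` is assumed rather than taken from the skill.

```python
import sqlite3
from collections import OrderedDict
from typing import Optional


class ChunkCache:
    """Tiny LRU cache (Tier 3) keyed by chunk_id. Hypothetical helper."""

    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, chunk_id: str) -> Optional[str]:
        if chunk_id not in self._items:
            return None
        self._items.move_to_end(chunk_id)  # mark as most recently used
        return self._items[chunk_id]

    def put(self, chunk_id: str, content: str) -> None:
        self._items[chunk_id] = content
        self._items.move_to_end(chunk_id)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


def load_chunk_text(conn: sqlite3.Connection, cache: ChunkCache,
                    chunk_id: str) -> Optional[str]:
    """Read-through lookup: Tier 3 cache first, then Tier 1 SQLite."""
    cached = cache.get(chunk_id)
    if cached is not None:
        return cached
    # Assumed table name "chunk"; adjust to the real schema.
    row = conn.execute(
        "SELECT content FROM chunk WHERE chunk_id = ?", (chunk_id,)
    ).fetchone()
    if row is None:
        return None
    cache.put(chunk_id, row[0])  # promote to the cache for later reads
    return row[0]
```

Keeping the cache separate from the database access makes it easy to size or swap the cache without touching the storage code.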
Combine dense (vector) and sparse (BM25) retrieval:
def hybrid_search(query: str, top_k: int = 10) -> list[Chunk]:
    # Dense: Semantic similarity via embeddings (over-fetch to give fusion more candidates)
    vector_results = collection.query(query_texts=[query], n_results=top_k * 2)
    # Sparse: Keyword matching via BM25
    bm25_results = bm25_index.search(query, top_k * 2)
    # Score fusion with RRF (Reciprocal Rank Fusion)
    fused = reciprocal_rank_fusion(vector_results, bm25_results, k=60)
    # Trim the fused list back to the requested size
    return fused[:top_k]
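The `reciprocal_rank_fusion` helper is not defined in this skill. A minimal sketch of the standard RRF formula, where each document scores the sum of `1 / (k + rank)` over the lists it appears in, might look like the following. It assumes each input is a best-first list of chunk IDs, which is an assumption about the data shape: a ChromaDB `collection.query` result would first need unpacking (for example `vector_results["ids"][0]`), and the return shape of `bm25_index.search` is not specified here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(*rankings: list[str], k: int = 60) -> list[str]:
    """Fuse ranked ID lists (each ordered best-first) into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Items near the top of any list contribute the most.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


dense = ["a", "b", "c"]
sparse = ["b", "d", "a"]
fused = reciprocal_rank_fusion(dense, sparse)
```

Because RRF uses only ranks, it needs no score normalization between the dense and sparse retrievers. The constant `k` damps the advantage of the very top positions.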
Rerank candidates for precision:
from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def rerank_chunks(query: str, chunks: list, top_k: int = 10) -> list:
    pairs = [(query, chunk['text']) for chunk in chunks]
    scores = reranker.predict(pairs)
    for chunk, score in zip(chunks, scores):
        chunk['cross_encoder_score'] = float(score)
    return sorted(chunks, key=lambda x: x['cross_encoder_score'], reverse=True)[:top_k]
Use an LLM to expand queries for better recall:
def rewrite_query(query: str) -> str:
    prompt = f"""Expand this search query with related terms:
    Query: {query}
    Add synonyms, related concepts, and domain-specific terminology.
    Return expanded query as space-separated terms."""
    return llm.generate(prompt)

def generate_hyde_document(query: str) -> str:
    """Generate hypothetical document that would answer the query."""
    prompt = f"""Generate a document excerpt that would answer: {query}
    Write as if you're quoting from the actual source material."""
    return llm.generate(prompt)
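With HyDE, the generated hypothetical document is typically what gets searched with, in place of the raw query, since it tends to sit closer to real answer passages in embedding space. A minimal sketch of that wiring follows. `hyde_retrieve` and its injected callables are hypothetical, chosen so the example runs without an LLM or vector store.

```python
from typing import Callable, Sequence


def hyde_retrieve(
    query: str,
    generate_document: Callable[[str], str],
    search: Callable[[str, int], Sequence[str]],
    top_k: int = 10,
) -> list[str]:
    """Search with a hypothetical answer instead of the raw query."""
    hypothetical = generate_document(query)
    # Fall back to the raw query if the LLM returns nothing usable.
    search_text = hypothetical.strip() or query
    return list(search(search_text, top_k))


# Stand-ins for the LLM and the vector search, for illustration only.
def fake_generate(query: str) -> str:
    return f"The answer to '{query}' is recorded in auth.log."


def fake_search(text: str, top_k: int) -> list[str]:
    return [text][:top_k]


results = hyde_retrieve("who logged in", fake_generate, fake_search, top_k=1)
```

In a real pipeline, `generate_document` could be `generate_hyde_document` and `search` could wrap the dense retriever.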
Enable entity-aware retrieval:
import re

PATTERNS = {
    'ipv4': re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b'),
    'filepath': re.compile(r'(?:/[\w.-]+)+'),
    'username': re.compile(r'user[=:\s]+(\w+)', re.IGNORECASE),
}

def extract_entities(text: str) -> list[Entity]:
    entities = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append(Entity(
                entity_type=entity_type,
                value=match.group(),
                context=text[max(0, match.start()-50):match.end()+50]
            ))
    return entities
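The `Entity` type is not defined in this skill, and the IPv4 pattern also accepts out-of-range values such as `999.1.1.1`. A minimal sketch of one possible `Entity` record, plus a validation step using the standard `ipaddress` module, is given here. The dataclass fields mirror the keyword arguments used above; `is_valid_ipv4` and `extract_ipv4` are hypothetical helpers.

```python
import ipaddress
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical record type matching the fields used by the extractor."""
    entity_type: str
    value: str
    context: str


IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def is_valid_ipv4(candidate: str) -> bool:
    """Reject regex matches that are not real IPv4 addresses."""
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False


def extract_ipv4(text: str, window: int = 50) -> list[Entity]:
    """Extract validated IPv4 entities with surrounding context."""
    entities = []
    for match in IPV4_PATTERN.finditer(text):
        if not is_valid_ipv4(match.group()):
            continue
        start = max(0, match.start() - window)
        context = text[start:match.end() + window]
        entities.append(Entity("ipv4", match.group(), context))
    return entities


log_line = "Failed login user=alice from 10.0.0.5, retry from 999.1.1.1"
found = extract_ipv4(log_line)
```

Validating after the regex match keeps the pattern simple while filtering false positives before they reach the entity index.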
Let the LLM decide what to search:
AGENT_TOOLS = [
    {"name": "search_chunks", "description": "Search documents"},
    {"name": "search_entity", "description": "Find by IP/user/file"},
    {"name": "traverse_graph", "description": "Explore relationships"},
    {"name": "final_answer", "description": "Provide final response"}
]

def agent_loop(query: str, max_iterations: int = 5):
    history = []
    for _ in range(max_iterations):
        response = llm.generate(build_agent_prompt(query, history))
        tool, params = parse_tool_call(response)
        if tool == "final_answer":
            return params["answer"]
        result = execute_tool(tool, params)
        history.append({"action": tool, "result": result})
    # Iteration budget exhausted without a final answer
    return None
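The loop relies on `parse_tool_call` and `execute_tool`, which are not defined in this skill. A minimal sketch, assuming the model is prompted to reply with JSON of the form `{"tool": ..., "params": {...}}`, could look like this. The JSON shape and the `registry` argument are assumptions made for illustration, and this `execute_tool` takes the registry as an extra parameter.

```python
import json
from typing import Any, Callable


def parse_tool_call(response: str) -> tuple[str, dict[str, Any]]:
    """Parse a reply like {"tool": "search_chunks", "params": {...}}."""
    try:
        payload = json.loads(response)
        return payload["tool"], dict(payload.get("params", {}))
    except (KeyError, TypeError, ValueError):
        # Treat unparseable output as the final answer rather than crashing.
        return "final_answer", {"answer": response.strip()}


def execute_tool(tool: str, params: dict[str, Any],
                 registry: dict[str, Callable[..., Any]]) -> Any:
    """Dispatch a tool call to its handler by name."""
    handler = registry.get(tool)
    if handler is None:
        # Returned as a result so the model can see the error and recover.
        return f"Unknown tool: {tool}"
    return handler(**params)


# Toy handler standing in for a real search function.
registry = {"search_chunks": lambda query: [query.upper()]}
tool, params = parse_tool_call(
    '{"tool": "search_chunks", "params": {"query": "ssh"}}'
)
result = execute_tool(tool, params, registry)
```

Feeding errors back as tool results, instead of raising, lets the agent retry with a corrected call within its iteration budget.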
Learn from LLM usage patterns:
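def record_usage(chunks: list, response: str, query: str):
    response_lower = response.lower()
    for chunk in chunks:
        # Hypothetical lookup (assumes each chunk dict carries its chunk_id)
        chunk_relevance = get_chunk_relevance(chunk['chunk_id'])
        # Detect if chunk was cited in response (case-insensitive)
        if chunk['source_file'].lower() in response_lower:
            chunk_relevance.citation_count += 1
        # Detect content overlap
        elif phrase_overlap(chunk['text'], response) > 0.3:
            chunk_relevance.usage_count += 1
        # Update relevance score: citations weigh twice as much as usages
        chunk_relevance.score = (chunk_relevance.citation_count * 1.0
                                 + chunk_relevance.usage_count * 0.5)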
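`phrase_overlap` is not defined in this skill. One plausible reading, offered only as a sketch, is the fraction of the chunk's word n-grams that reappear in the response. The trigram default and the tokenization below are assumptions.

```python
import re


def _ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercased word n-grams of the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def phrase_overlap(chunk_text: str, response: str, n: int = 3) -> float:
    """Fraction of the chunk's word n-grams that also appear in the response."""
    chunk_grams = _ngrams(chunk_text, n)
    if not chunk_grams:
        return 0.0  # chunk shorter than n words: no evidence either way
    return len(chunk_grams & _ngrams(response, n)) / len(chunk_grams)


overlap = phrase_overlap(
    "the quick brown fox jumps",
    "A quick brown fox jumps high",
)
```

Measuring against the chunk's n-grams (not the response's) keeps the score comparable across responses of different lengths.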
A complete RAG service should provide:
- `ingest(files)` → Chunk, embed, extract entities, build graph
- `query(text)` → Retrieve, rerank, generate response
- `query_agent(text)` → Iterative search with reasoning
- `get_entities(type)` → List extracted entities
- `get_relevance_stats()` → View learning progress

Related skills:

- ../backend/scaffolding-flask/SKILL.md: API structure
- ../database/designing-schemas/SKILL.md: Model design

| Reference | Description |
|-----------|-------------|
| chunking-strategies.md | Document chunking patterns, token budgets, and overlap strategies |
| embedding-models.md | Model comparison, hybrid search, and BM25 integration |
| agentic-patterns.md | ReAct agent loops, tool design, and iterative reasoning |
| graph-rag.md | Entity relationship graphs, traversal algorithms, kill chain analysis |
| relevance-feedback.md | Learning from usage patterns, citation detection, score boosting |