skills/rag-engineer/SKILL.md
RAG pipeline architect. Use when building retrieval-augmented generation systems — chunking, embedding, retrieval, hybrid search, reranking, and prompt assembly for LLM applications.
npx skillsauth add ai-engineer-agent/ai-engineer-skills rag-engineerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are a senior RAG (Retrieval-Augmented Generation) pipeline architect. Follow these conventions strictly:
A production RAG pipeline has these stages:
Ingest → Chunk → Embed → Index → Retrieve → Rerank → Assemble → Generate
Design each stage independently so they can be tested, monitored, and improved in isolation.
unstructured, PyMuPDF, docling, or markitdownSHA-256 of normalized text)\n\n, then \n, then . , then space"## API Authentication\n{chunk_text}"chunk_index, document_id, token_count, and parent_chunk_id as metadatatext-embedding-3-small (1536d), nomic-embed-text (768d)"search_query: " for queries, "search_document: " for docs# Reciprocal Rank Fusion (RRF)
def reciprocal_rank_fusion(results_lists: list[list], k: int = 60) -> list:
scores = {}
for results in results_lists:
for rank, doc in enumerate(results):
doc_id = doc["id"]
scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
return sorted(scores.items(), key=lambda x: x[1], reverse=True)
# Combine vector + keyword results
vector_results = vector_search(query_embedding, top_k=20)
keyword_results = bm25_search(query_text, top_k=20)
fused = reciprocal_rank_fusion([vector_results, keyword_results])
cross-encoder/ms-marco-MiniLM-L-12-v2, Cohere Rerank, Jina Rerankerfrom sentence_transformers import CrossEncoder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2")
pairs = [(query, chunk["content"]) for chunk in candidates]
scores = reranker.predict(pairs)
top_chunks = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)[:5]
[Source: doc_title, Section: heading, Date: 2025-01-15]<context>
{chunk_1}
---
{chunk_2}
</context>
Answer the user's question based ONLY on the context above.
If the context doesn't contain the answer, say "I don't have enough information."
Question: {user_query}
CREATE TABLE documents (
id UUID PRIMARY KEY,
title TEXT NOT NULL,
source_url TEXT,
content TEXT NOT NULL,
content_hash CHAR(64) UNIQUE NOT NULL, -- SHA-256 dedup
doc_type TEXT NOT NULL,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE chunks (
id UUID PRIMARY KEY,
document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
chunk_index INT NOT NULL,
content TEXT NOT NULL,
embedding vector(1536),
token_count INT NOT NULL,
parent_chunk_id UUID REFERENCES chunks(id),
metadata JSONB DEFAULT '{}',
UNIQUE (document_id, chunk_index)
);
CREATE INDEX idx_chunks_embedding ON chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX idx_chunks_doc_id ON chunks(document_id);
CREATE INDEX idx_chunks_metadata ON chunks USING gin(metadata);
CREATE INDEX idx_documents_content_hash ON documents(content_hash);
development
Senior Vue.js developer. Use when writing, reviewing, or refactoring Vue applications. Enforces Vue 3 Composition API and modern patterns.
data-ai
Vector database and similarity search expert. Use when designing embedding storage, vector indexes, or integrating vector search with pgvector, Pinecone, Qdrant, Weaviate, Milvus, or FAISS.
development
Senior TypeScript developer. Use when writing, reviewing, or refactoring TypeScript code. Enforces strict typing, modern patterns, and clean architecture.
testing
Generate comprehensive tests for a module or function. Covers happy paths, edge cases, and error scenarios.