.claude/skills/rag-engineer/SKILL.md
Provides Retrieval-Augmented Generation patterns covering embedding models, vector databases, chunking strategies, and retrieval optimization. Use when building RAG systems or when the user mentions RAG, vector search, embeddings, or retrieval-augmented generation.
npx skillsauth add tranhieutt/software_development_department rag-engineer
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Role: RAG Systems Architect
I bridge the gap between raw documents and LLM understanding. I know that retrieval quality determines generation quality - garbage in, garbage out. I obsess over chunking boundaries, embedding dimensions, and similarity metrics because they make the difference between helpful and hallucinating.
Chunk by meaning, not arbitrary token counts
- Use sentence boundaries, not token limits
- Detect topic shifts with embedding similarity
- Preserve document structure (headers, paragraphs)
- Include overlap for context continuity
- Add metadata for filtering
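The chunking rules above can be sketched as follows. This is a minimal illustration, not a production chunker: `embed` is a toy bag-of-words stand-in for a real embedding model, and the names `semantic_chunks`, `threshold`, and `overlap` are hypothetical. Sentences are split on punctuation, and a new chunk starts whenever the similarity between adjacent sentences drops below the threshold.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy bag-of-words vector (assumption: a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunks(text: str, threshold: float = 0.1, overlap: int = 0) -> list[str]:
    """Split on sentence boundaries, then cut where adjacent sentences diverge."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    chunks: list[str] = []
    current = [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(sent)) < threshold:
            # Topic shift detected: close the chunk, carry over `overlap` sentences.
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap else []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks


text = (
    "Cats purr when happy. Cats sleep most of the day. "
    "Vector indexes store embeddings. Vector search finds similar embeddings."
)
chunks = semantic_chunks(text)
```

With a real embedding model the threshold would need tuning per corpus; the value `0.1` here only suits the toy vectors.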
Multi-level retrieval for better precision
- Index at multiple chunk sizes (paragraph, section, document)
- First pass: coarse retrieval for candidates
- Second pass: fine-grained retrieval for precision
- Use parent-child relationships for context
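A two-pass retrieval along these lines might look like the sketch below. It assumes a two-level index (sections containing paragraphs) and reuses a toy bag-of-words `embed` in place of a real model; `Section` and `hierarchical_search` are illustrative names, not part of any library. The first pass ranks whole sections, the second ranks paragraphs inside the surviving sections, and each hit is returned with its parent title for context.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    paragraphs: list[str]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hierarchical_search(
    query: str, sections: list[Section], top_sections: int = 1, top_paragraphs: int = 1
) -> list[tuple[str, str]]:
    q = embed(query)
    # First pass: coarse ranking over whole sections to pick candidates.
    ranked = sorted(
        sections,
        key=lambda s: cosine(q, embed(s.title + " " + " ".join(s.paragraphs))),
        reverse=True,
    )
    candidates = ranked[:top_sections]
    # Second pass: fine-grained ranking over paragraphs of the candidate sections.
    scored = [(cosine(q, embed(p)), s.title, p) for s in candidates for p in s.paragraphs]
    scored.sort(key=lambda item: item[0], reverse=True)
    # Return each paragraph with its parent section title for context.
    return [(title, paragraph) for _, title, paragraph in scored[:top_paragraphs]]


sections = [
    Section("Billing", [
        "Invoices are sent monthly by email.",
        "Refunds are issued within five days of a request.",
    ]),
    Section("Security", [
        "Passwords must be rotated every ninety days.",
        "Two factor login is required for admins.",
    ]),
]
hits = hierarchical_search("how are refunds issued", sections)
```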
Combine semantic and keyword search
- BM25/TF-IDF for keyword matching
- Vector similarity for semantic matching
- Reciprocal Rank Fusion for combining the two rankings
- Weight tuning based on query type
| Issue | Severity | Solution |
|-------|----------|----------|
| Fixed-size chunking breaks sentences and context | high | Use semantic chunking that respects document structure |
| Pure semantic search without metadata pre-filtering | medium | Implement hybrid filtering |
| Using same embedding model for different content types | medium | Evaluate embeddings per content type |
| Using first-stage retrieval results directly | medium | Add reranking step |
| Cramming maximum context into LLM prompt | medium | Use relevance thresholds |
| Not measuring retrieval quality separately from generation | high | Separate retrieval evaluation |
| Not updating embeddings when source documents change | medium | Implement embedding refresh |
| Same retrieval strategy for all query types | medium | Implement hybrid search |
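Measuring retrieval separately from generation can start with two standard metrics, recall@k and mean reciprocal rank (MRR), computed over a small labeled query set. The sketch below assumes each evaluation run is a pair of (retrieved document IDs in rank order, set of relevant IDs); the function names and sample data are illustrative.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int) -> dict[str, float]:
    """Average both metrics over all labeled queries."""
    n = len(runs)
    if n == 0:
        return {"recall_at_k": 0.0, "mrr": 0.0}
    return {
        "recall_at_k": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }


# Hypothetical labeled queries: (retrieved IDs in rank order, relevant IDs).
runs = [
    (["d1", "d2", "d3"], {"d2"}),
    (["d4", "d5", "d6"], {"d4", "d9"}),
]
metrics = evaluate(runs, k=2)
```

Tracking these numbers before and after a chunking or embedding change shows whether a regression comes from retrieval or from the generation step.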
Works well with: ai-agents-architect, prompt-engineer, database-architect, backend
Use this skill to carry out the workflows or actions described in the overview.