skills/council/oracle/rag-architecture/SKILL.md
Use when designing a Retrieval-Augmented Generation pipeline. Covers document processing, chunking strategy, embedding pipeline, vector database selection, retrieval optimization, and context assembly. Do not use for prompt design (use prompt-engineering) or evaluation framework design (use ai-evaluation).
Install this skill globally with one command: `npx skillsauth add dtsong/my-claude-setup rag-architecture`. Works with Claude Code, Cursor, and Windsurf.
Design a Retrieval-Augmented Generation pipeline, including document processing, chunking strategy, embedding pipeline, vector database selection, retrieval optimization, and context assembly.
Reads source document metadata, query patterns, and infrastructure requirements for pipeline design analysis. Does not execute embedding operations, provision vector databases, or access production data directly.
No user-provided values are used in commands or file paths. All inputs are treated as read-only analysis targets.
Understand what's being indexed:
Choose how to split documents:
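As a rough illustration of the fixed-size option, the sketch below splits a token list into overlapping windows. It assumes whitespace tokens as a stand-in for a real tokenizer, and the function name `chunk_tokens` and its default sizes are hypothetical, not part of this skill.

```python
def chunk_tokens(tokens, chunk_size=200, overlap=40):
    """Split a token list into fixed-size windows sharing `overlap` tokens.

    Assumption: `tokens` is any list (here, whitespace-split words); a real
    pipeline would count tokens with the embedding model's own tokenizer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


if __name__ == "__main__":
    words = "one two three four five six seven eight nine ten".split()
    for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
        print(" ".join(chunk))
```

Overlap trades index size for a lower chance that a relevant passage is cut at a chunk boundary; semantic or hierarchical splitting would replace the fixed `step` with boundaries taken from the document structure.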
Choose the embedding approach:
Choose storage and retrieval:
| Database | Hosted | Open Source | Hybrid Search | Best For |
|----------|--------|-------------|---------------|----------|
| Pinecone | Yes | No | Yes (sparse+dense) | Production, managed |
| Weaviate | Yes | Yes | Yes (BM25+vector) | Self-hosted, rich filtering |
| ChromaDB | No | Yes | No | Prototyping, local dev |
| pgvector | Via Supabase | Yes | BM25 separate | Already using Postgres |
| Qdrant | Yes | Yes | Yes | High-performance, filtering |
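Where the database does not fuse keyword and vector results natively (the "BM25 separate" case in the table), the two ranked lists have to be merged in application code. One common choice is reciprocal rank fusion (RRF); this skill does not prescribe it, so treat the sketch below as an assumption. The function name and the constant `k=60` (a commonly used default) are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document id for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]    # hypothetical keyword-search results
    vector_hits = ["b", "c", "d"]  # hypothetical vector-search results
    print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

RRF uses only ranks, so BM25 scores and cosine similarities never need to be normalized against each other.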
Build the query-time pipeline:
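A minimal in-memory sketch of the search, rerank, and assemble stages follows. It assumes precomputed vectors and uses word overlap as a stand-in for a real reranking model; all function names, the `(text, vector)` index layout, and the word-based budget are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, top_k=20):
    """First stage: brute-force vector search over (text, vector) pairs."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:top_k]]


def rerank(query, candidates, top_n=5):
    """Second stage. Stand-in scorer: count of query words found in the chunk.
    A production pipeline would call a cross-encoder or reranking API here."""
    query_words = set(query.lower().split())
    return sorted(
        candidates,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )[:top_n]


def assemble_context(chunks, max_words=1000):
    """Concatenate chunks in rank order until the word budget is reached."""
    picked, used = [], 0
    for chunk in chunks:
        size = len(chunk.split())
        if used + size > max_words:
            break
        picked.append(chunk)
        used += size
    return "\n\n".join(picked)


if __name__ == "__main__":
    index = [("cats purr", [1.0, 0.0]), ("dogs bark", [0.0, 1.0]),
             ("cats and dogs", [0.7, 0.7])]
    candidates = retrieve([1.0, 0.1], index, top_k=2)
    print(assemble_context(rerank("do cats purr", candidates, top_n=1)))
```

Retrieving a wide candidate set and reranking a narrow one keeps the expensive scorer off most of the corpus; the brute-force search here would be replaced by the chosen vector database's query call.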
Define how to measure RAG quality:
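Retrieval quality can be checked against a golden dataset of queries with known relevant chunks. The sketch below computes Recall@k for one query and averages it across a dataset; the dict-based data layout and function names are assumptions for illustration.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the relevant ids that appear in the top-k retrieved ids."""
    if not relevant:
        raise ValueError("each golden query needs at least one relevant id")
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def mean_recall_at_k(results, golden, k=10):
    """Average Recall@k over a golden dataset.

    results: {query_id: ranked list of retrieved chunk ids}
    golden:  {query_id: list of chunk ids judged relevant}
    Queries missing from `results` count as zero recall.
    """
    scores = [recall_at_k(results.get(qid, []), rel, k) for qid, rel in golden.items()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    golden = {"q1": ["a", "d"], "q2": ["x"]}
    results = {"q1": ["a", "b"], "q2": ["x"]}
    print(mean_recall_at_k(results, golden, k=10))
```

Faithfulness and hallucination rate need a judgment on the generated answer, not just the retrieved ids, so they are usually scored by a separate evaluation step.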
Compaction resilience: If context was lost during a long session, re-read the Inputs section to reconstruct what system is being analyzed, check the Progress Checklist for completed steps, then resume from the earliest incomplete step.
# RAG Architecture
## Source Analysis
| Attribute | Value |
|-----------|-------|
| Document types | [Types] |
| Corpus size | [Size] |
| Update frequency | [Frequency] |
## Chunking Strategy
**Method:** [Fixed/Semantic/Hierarchical]
**Target chunk size:** [X tokens]
**Overlap:** [X tokens]
**Metadata:** [Fields attached to each chunk]
## Embedding Pipeline
**Model:** [Name]
**Dimensions:** [N]
**Cost:** [$X per 1M tokens]
**Batch processing:** [Strategy for initial load vs incremental updates]
## Vector Database
**Choice:** [Database]
**Rationale:** [Why this DB]
**Index configuration:** [HNSW params, quantization, etc.]
**Hybrid search:** [BM25 + vector approach]
## Retrieval Pipeline
Query → [Preprocess] → [Embed] → [Vector Search (top 20)] → [Rerank (top 5)] → [Assemble Context] → [LLM] → [Validate] → Response
| Stage | Latency | Cost |
|-------|---------|------|
| Embedding | Xms | $X |
| Vector search | Xms | $X |
| Reranking | Xms | $X |
| Generation | Xms | $X |
| **Total** | **Xms** | **$X** |
## Quality Metrics
| Metric | Target | Measurement |
|--------|--------|-------------|
| Recall@10 | >90% | Golden dataset |
| Faithfulness | >95% | Automated scoring |
| Hallucination rate | <5% | Reference checking |
## Cost Model
| Component | Monthly Cost (at X queries/day) |
|-----------|-------------------------------|
| Embeddings | $X |
| Vector DB | $X |
| Reranking | $X |
| Generation | $X |
| **Total** | **$X** |
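The monthly figures in the cost table can be derived from per-query unit costs and query volume. The sketch below shows that arithmetic; the function name, the 30-day month, and every number in the usage example are placeholders, not real prices.

```python
def monthly_cost(queries_per_day, per_query_costs, fixed_monthly=0.0, days=30):
    """Project monthly spend from per-query unit costs.

    per_query_costs: {component: cost in dollars for one query}
    fixed_monthly:   flat charges such as a hosted vector DB plan
    Returns a breakdown dict that includes "fixed" and "total".
    """
    breakdown = {
        name: unit_cost * queries_per_day * days
        for name, unit_cost in per_query_costs.items()
    }
    breakdown["fixed"] = fixed_monthly
    breakdown["total"] = sum(breakdown.values())
    return breakdown


if __name__ == "__main__":
    # Placeholder inputs only; substitute measured per-query costs.
    costs = monthly_cost(
        queries_per_day=1000,
        per_query_costs={"embeddings": 0.001, "reranking": 0.0, "generation": 0.002},
        fixed_monthly=50.0,
    )
    for component, dollars in costs.items():
        print(f"{component}: ${dollars:.2f}")
```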