dtsong/rag-architecture

Name: rag-architecture
Author: dtsong

skills/council/oracle/rag-architecture/SKILL.md

npx skillsauth add dtsong/my-claude-setup rag-architecture

RAG Architecture

Purpose

Design a Retrieval-Augmented Generation pipeline, including document processing, chunking strategy, embedding pipeline, vector database selection, retrieval optimization, and context assembly.

Scope Constraints

Reads source document metadata, query patterns, and infrastructure requirements for pipeline design analysis. Does not execute embedding operations, provision vector databases, or access production data directly.

Inputs

Source documents (type, volume, update frequency)
Query patterns (user questions, search terms, structured queries)
Quality requirements (relevance threshold, hallucination tolerance)
Latency requirements (real-time, near-real-time, batch)
Cost constraints (embedding costs, storage costs, query costs)

Input Sanitization

No user-provided values are used in commands or file paths. All inputs are treated as read-only analysis targets.

Procedure

Progress Checklist

[ ] Step 1: Analyze source documents
[ ] Step 2: Design chunking strategy
[ ] Step 3: Select embedding model
[ ] Step 4: Select vector database
[ ] Step 5: Design retrieval pipeline
[ ] Step 6: Design quality metrics

Step 1: Analyze Source Documents

Understand what's being indexed:

Document types: PDFs, web pages, code, structured data, conversations
Volume: Number of documents, total size, growth rate
Update frequency: Static corpus, daily updates, real-time
Structure: Highly structured (tables, headers) vs unstructured (prose, transcripts)
Quality: Clean text vs noisy (OCR artifacts, HTML remnants, duplicates)

Step 2: Design Chunking Strategy

Choose how to split documents:

Fixed-size: 500-1000 tokens with 100-200 token overlap. Simple but may split concepts.
Semantic: Split on paragraph/section boundaries. Preserves meaning but variable size.
Hierarchical: Parent-child chunks (section summary + detail chunks). Best for complex docs.
Recursive: Start large, recursively split until chunks fit size target.
Metadata enrichment: Attach source, section title, page number to each chunk.
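
The fixed-size option with overlap and metadata enrichment can be sketched as below. This is a minimal illustration, not the skill's own implementation: whitespace splitting stands in for a real tokenizer, and the names `chunk_fixed`, `chunk_document`, and the `chunk_index` field are hypothetical.

```python
def chunk_fixed(tokens: list[str], size: int = 500, overlap: int = 100) -> list[list[str]]:
    """Split a token list into fixed-size windows that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def chunk_document(text: str, source: str, size: int = 500, overlap: int = 100) -> list[dict]:
    """Chunk one document and attach metadata to every chunk."""
    # Assumption: whitespace-separated words approximate model tokens.
    tokens = text.split()
    return [
        {"text": " ".join(window), "source": source, "chunk_index": i}
        for i, window in enumerate(chunk_fixed(tokens, size, overlap))
    ]
```

With a real corpus, the whitespace split would be swapped for the tokenizer of the chosen embedding model so that chunk sizes match what the model actually sees.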

Step 3: Select Embedding Model

Choose the embedding approach:

OpenAI text-embedding-3-small/large: Strong general-purpose default; 1536 and 3072 dimensions respectively (both can be shortened via the API's dimensions parameter)
Cohere embed-v3: Strong multilingual, supports search and classification modes
Open source (BGE, E5): Self-hosted, lower cost at scale, variable quality
Considerations: Dimension size (storage), context length, multilingual support, cost per token
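
To make the "dimension size (storage)" consideration concrete, the raw vector footprint can be estimated with simple arithmetic. The sketch below assumes 4-byte float32 values and a hypothetical corpus of one million chunks; index overhead (for example HNSW graph links) and metadata are not included.

```python
def index_size_bytes(num_chunks: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of the stored vectors, excluding index overhead and metadata."""
    return num_chunks * dimensions * bytes_per_value


corpus_chunks = 1_000_000  # hypothetical corpus size
for name, dims in [("text-embedding-3-small", 1536), ("text-embedding-3-large", 3072)]:
    gib = index_size_bytes(corpus_chunks, dims) / 2**30
    print(f"{name}: {gib:.2f} GiB of raw vectors")
```

Doubling the dimension count doubles storage, so the larger model is only worth it when it measurably improves retrieval on the target queries.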

Step 4: Select Vector Database

Choose storage and retrieval:

| Database | Hosted | Open Source | Hybrid Search | Best For |
|----------|--------|-------------|---------------|----------|
| Pinecone | Yes | No | Yes (sparse+dense) | Production, managed |
| Weaviate | Yes | Yes | Yes (BM25+vector) | Self-hosted, rich filtering |
| ChromaDB | No | Yes | No | Prototyping, local dev |
| pgvector | Via Supabase | Yes | BM25 separate | Already using Postgres |
| Qdrant | Yes | Yes | Yes | High-performance, filtering |

Step 5: Design Retrieval Pipeline

Build the query-time pipeline:

Query preprocessing: Expand abbreviations, detect intent, generate sub-queries
Embedding: Encode query with same model used for documents
Initial retrieval: Top-K vector search (K=20-50)
Reranking: Cross-encoder reranker to reorder by relevance (return top 5-10)
Context assembly: Combine retrieved chunks into a prompt, add metadata
Generation: LLM call with assembled context + user query
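
The retrieve, rerank, and assemble stages above can be sketched in plain Python. This is an illustration under stated assumptions: `embed` and `rerank` are caller-supplied stand-ins for a real embedding model and cross-encoder, the index is an in-memory list rather than a vector database, and all function names are hypothetical.

```python
import math
from typing import Callable

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query: str,
    index: list[dict],                    # each item: {"text", "source", "vector"}
    embed: Callable[[str], Vector],       # must be the model used at indexing time
    rerank: Callable[[str, str], float],  # stand-in for a cross-encoder score
    top_k: int = 20,
    top_n: int = 5,
) -> list[dict]:
    q = embed(query)
    # Stage 1: cheap vector search over the whole index.
    candidates = sorted(index, key=lambda c: cosine(q, c["vector"]), reverse=True)[:top_k]
    # Stage 2: more expensive reranking on the shortlist only.
    return sorted(candidates, key=lambda c: rerank(query, c["text"]), reverse=True)[:top_n]


def assemble_context(chunks: list[dict]) -> str:
    """Number each chunk and keep its source so the answer can cite it."""
    return "\n\n".join(
        f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(chunks, start=1)
    )
```

Keeping the reranker behind a plain callable makes it easy to measure retrieval quality with and without reranking before committing to its latency and cost.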

Step 6: Design Quality Metrics

Define how to measure RAG quality:

Retrieval metrics: Recall@K (what share of the relevant docs appear in the top K?), MRR (how high is the first relevant doc ranked, averaged over queries?)
Generation metrics: Faithfulness (does the answer stick to context?), relevance (does it answer the question?)
End-to-end: Answer accuracy on golden dataset, hallucination rate
Monitoring: Track retrieval scores over time, flag low-confidence answers
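
The two retrieval metrics can be computed from a golden dataset with a few lines of Python. The sketch below assumes each golden entry pairs the ranked document IDs returned by the retriever with the set of IDs judged relevant; the function names and data layout are illustrative.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(golden: list[tuple[list[str], set[str]]], k: int = 10) -> dict[str, float]:
    """Average both metrics over all queries in the golden dataset."""
    if not golden:
        raise ValueError("golden dataset is empty")
    n = len(golden)
    return {
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in golden) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in golden) / n,
    }
```

Tracking these numbers per release of the chunking or embedding configuration shows whether a pipeline change helped retrieval before generation quality is even considered.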

Compaction resilience: If context was lost during a long session, re-read the Inputs section to reconstruct what system is being analyzed, check the Progress Checklist for completed steps, then resume from the earliest incomplete step.

Output Format

# RAG Architecture

## Source Analysis
| Attribute | Value |
|-----------|-------|
| Document types | [Types] |
| Corpus size | [Size] |
| Update frequency | [Frequency] |

## Chunking Strategy
**Method:** [Fixed/Semantic/Hierarchical/Recursive]
**Target chunk size:** [X tokens]
**Overlap:** [X tokens]
**Metadata:** [Fields attached to each chunk]

## Embedding Pipeline
**Model:** [Name]
**Dimensions:** [N]
**Cost:** [$X per 1M tokens]
**Batch processing:** [Strategy for initial load vs incremental updates]

## Vector Database
**Choice:** [Database]
**Rationale:** [Why this DB]
**Index configuration:** [HNSW params, quantization, etc.]
**Hybrid search:** [BM25 + vector approach]

## Retrieval Pipeline

Query → [Preprocess] → [Embed] → [Vector Search (top 20)] → [Rerank (top 5)] → [Assemble Context] → [LLM] → [Validate] → Response


| Stage | Latency | Cost |
|-------|---------|------|
| Embedding | Xms | $X |
| Vector search | Xms | $X |
| Reranking | Xms | $X |
| Generation | Xms | $X |
| **Total** | **Xms** | **$X** |

## Quality Metrics
| Metric | Target | Measurement |
|--------|--------|-------------|
| Recall@10 | >90% | Golden dataset |
| Faithfulness | >95% | Automated scoring |
| Hallucination rate | <5% | Reference checking |

## Cost Model
| Component | Monthly Cost (at X queries/day) |
|-----------|-------------------------------|
| Embeddings | $X |
| Vector DB | $X |
| Reranking | $X |
| Generation | $X |
| **Total** | **$X** |
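
The Cost Model table can be filled in with straightforward arithmetic. The sketch below is one possible breakdown, not a pricing reference: every price and volume is a caller-supplied assumption, the vector database is modeled as a flat monthly fee, and reranking as a per-query charge.

```python
def monthly_cost(
    queries_per_day: int,
    query_tokens: int,            # tokens embedded per query
    context_tokens: int,          # retrieved tokens added to each prompt
    output_tokens: int,           # tokens generated per answer
    embed_price_per_m: float,     # $ per 1M embedding tokens
    llm_input_price_per_m: float,
    llm_output_price_per_m: float,
    reindexed_tokens: int = 0,    # tokens re-embedded per month (incremental updates)
    rerank_price_per_query: float = 0.0,
    vector_db_fee: float = 0.0,
    days: int = 30,
) -> dict[str, float]:
    queries = queries_per_day * days
    # Indexing and query-time embedding share the same per-token price.
    embeddings = (queries * query_tokens + reindexed_tokens) / 1e6 * embed_price_per_m
    generation = queries * (
        (query_tokens + context_tokens) * llm_input_price_per_m
        + output_tokens * llm_output_price_per_m
    ) / 1e6
    reranking = queries * rerank_price_per_query
    costs = {
        "embeddings": embeddings,
        "vector_db": vector_db_fee,
        "reranking": reranking,
        "generation": generation,
    }
    costs["total"] = sum(costs.values())
    return costs
```

Because context tokens are multiplied by every query, generation usually dominates, which is why returning the top 5 reranked chunks instead of all 20 candidates matters for cost as well as quality.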

Handoff

Hand off to prompt-engineering if retrieval pipeline design reveals system prompt optimization needs for context assembly.
Hand off to ai-evaluation if RAG quality metrics require a formal evaluation framework or golden dataset creation.

Quality Checks

[ ] Chunking strategy is justified against document structure (not just default 500 tokens)
[ ] Embedding model matches the query language and domain
[ ] Retrieval pipeline includes reranking (not just raw vector similarity)
[ ] Cost model accounts for both indexing and query costs
[ ] Quality metrics have defined targets and measurement approach
[ ] Update strategy handles incremental changes (not re-index everything)
[ ] Latency budget is broken down by pipeline stage

Evolution Notes

Use when designing a Retrieval-Augmented Generation pipeline. Covers document processing, chunking strategy, embedding pipeline, vector database selection, retrieval optimization, and context assembly. Do not use for prompt design (use prompt-engineering) or evaluation framework design (use ai-evaluation).

4 stars

development

Updated Apr 26, 2026

Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

npx skillsauth add dtsong/my-claude-setup rag-architecture

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each entry below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 26, 2026, 4:21 AM (57.7s, 1 file scanned)

SKILL.md

name: rag-architecture
department: oracle
description: Use when designing a Retrieval-Augmented Generation pipeline. Covers document processing, chunking strategy, embedding pipeline, vector database selection, retrieval optimization, and context assembly. Do not use for prompt design (use prompt-engineering) or evaluation framework design (use ai-evaluation).
version: 1

Install manually

# Clone the repo
git clone https://github.com/dtsong/my-claude-setup.git

# Copy into Claude Code skills folder (global)
cp -r my-claude-setup/skills/council/oracle/rag-architecture ~/.claude/skills/

Repository

dtsong/my-claude-setup

4 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT