jmsktm/RAG Pipeline Builder

Name: RAG Pipeline Builder
Author: jmsktm

skills/rag-pipeline-builder/SKILL.md

npx skillsauth add jmsktm/claude-settings RAG Pipeline Builder

RAG Pipeline Builder

The RAG Pipeline Builder skill guides you through designing and implementing Retrieval-Augmented Generation systems that enhance LLM responses with relevant context from your own data. RAG combines the power of large language models with the precision of information retrieval, reducing hallucinations and enabling AI to work with private, current, or domain-specific knowledge.

This skill covers the complete RAG stack: document ingestion, chunking strategies, embedding generation, vector storage, retrieval optimization, context injection, and response generation. It helps you make informed decisions at each stage based on your specific requirements for accuracy, latency, cost, and scale.

Whether you are building a documentation Q&A bot, a customer support system, or an enterprise knowledge assistant, this skill helps your RAG implementation follow production best practices.

Core Workflows

Workflow 1: Design RAG Architecture

Define requirements:
- Data sources and formats
- Query types and patterns
- Accuracy requirements
- Latency budget
- Scale expectations

Choose components:
- Document loaders
- Chunking strategy
- Embedding model
- Vector database
- LLM for generation
- Reranking layer (optional)

Design data flow:

Documents → Loader → Chunker → Embedder → Vector DB
                                               ↓
Query → Embedder → Vector Search → Reranker → Context
                                               ↓
Context + Query → LLM → Response

Document architecture decisions
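
The data flow above can be sketched end to end with standard-library Python. This is a toy illustration under stated assumptions, not a production design: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the "vector DB" is a plain list, and the reranker and the LLM call are omitted, so the flow stops at the assembled prompt.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (assumed stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def ingest(documents):
    """Documents -> chunker (one chunk per paragraph) -> embedder -> in-memory index."""
    index = []
    for doc_id, text in documents.items():
        for chunk in filter(None, (p.strip() for p in text.split("\n\n"))):
            index.append({"doc": doc_id, "text": chunk, "vector": embed(chunk)})
    return index

def retrieve(index, query, k=2):
    """Query -> embedder -> vector search -> top-k context chunks."""
    q = embed(query)
    return sorted(index, key=lambda c: cosine(q, c["vector"]), reverse=True)[:k]

def build_prompt(query, chunks):
    """Context + query -> prompt text that would be sent to the LLM."""
    context = "\n".join(f"[{c['doc']}] {c['text']}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical sample documents for illustration only
docs = {
    "billing.md": "Invoices are sent on the first day of each month.\n\nRefunds take five business days.",
    "setup.md": "Install the CLI and run the login command.",
}
index = ingest(docs)
top = retrieve(index, "How long do refunds take?", k=1)
print(build_prompt("How long do refunds take?", top))
```

Keeping each stage as its own function mirrors the arrows in the diagram, so a real loader, embedding model, or vector database can replace one stage without touching the others.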

Workflow 2: Implement Ingestion Pipeline

Set up document loaders:
- PDF, Markdown, HTML parsers
- API connectors for live sources
- Incremental update handling

Implement chunking:

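def smart_chunk(doc, chunk_size=500, overlap=50):
    """Split doc into chunks that respect its section structure.

    extract_sections, sliding_window, and add_metadata are placeholder
    helpers that are not defined here; implement them for your format.
    """
    sections = extract_sections(doc)
    chunks = []
    for section in sections:
        if len(section) > chunk_size:
            # Oversized section: fall back to overlapping fixed-size windows
            chunks.extend(sliding_window(section, chunk_size, overlap))
        else:
            # Section fits within the limit: keep it whole
            chunks.append(section)
    return add_metadata(chunks, doc)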

Generate embeddings with batching
Store in vector database with metadata
Verify ingestion quality
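
The chunking step in this workflow can be made runnable with a few assumptions. In the sketch below, sections are assumed to be separated by blank lines, sizes are measured in characters rather than tokens, and the metadata fields (`source`, `position`) are illustrative names, not part of the skill itself.

```python
def sliding_window(text, chunk_size, overlap):
    """Split text into fixed-size character windows that overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return windows

def smart_chunk(text, source, chunk_size=500, overlap=50):
    """Chunk at blank-line section boundaries; window only oversized sections."""
    sections = [s.strip() for s in text.split("\n\n") if s.strip()]
    chunks = []
    for section in sections:
        if len(section) > chunk_size:
            chunks.extend(sliding_window(section, chunk_size, overlap))
        else:
            chunks.append(section)
    # Attach minimal metadata so each chunk can be traced back to its source
    return [{"source": source, "position": i, "text": c} for i, c in enumerate(chunks)]

# Hypothetical document: one short section and one oversized section
doc = "Intro paragraph.\n\n" + "x" * 120
for chunk in smart_chunk(doc, "guide.md", chunk_size=50, overlap=10):
    print(chunk["position"], len(chunk["text"]))
```

A token-based length function would usually be a better fit for embedding models with token limits; characters are used here only to keep the sketch dependency-free.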

Workflow 3: Optimize Retrieval Quality

Measure baseline retrieval performance:
- Recall@k for known queries
- Mean Reciprocal Rank (MRR)
- Relevance scoring

Apply optimization techniques:
- Query expansion/rewriting
- Hybrid search (semantic + keyword)
- Reranking with cross-encoders
- Metadata filtering

Tune retrieval parameters:
- Number of chunks to retrieve (k)
- Similarity threshold
- Diversity/MMR settings

Validate improvements with test set
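
The baseline metrics named in this workflow are straightforward to compute once you have a labeled test set. The sketch below assumes each test case is a pair of (retrieved chunk IDs in rank order, relevant chunk IDs); the function names and sample IDs are illustrative.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(set(relevant))

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant ID, or 0.0 if none was retrieved."""
    relevant = set(relevant)
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(test_set, k=3):
    """Average Recall@k and MRR over (retrieved, relevant) pairs."""
    n = len(test_set)
    recall = sum(recall_at_k(r, rel, k) for r, rel in test_set) / n
    mrr = sum(reciprocal_rank(r, rel) for r, rel in test_set) / n
    return {"recall@k": recall, "mrr": mrr}

# Hypothetical labeled queries
test_set = [
    (["c1", "c7", "c3"], ["c3", "c9"]),  # first relevant hit at rank 3; 1 of 2 found
    (["c4", "c2", "c8"], ["c4"]),        # first relevant hit at rank 1; 1 of 1 found
]
print(evaluate(test_set, k=3))
```

Running the same test set before and after each tuning change gives a like-for-like comparison for the final validation step.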

Quick Reference

| Action | Command/Trigger |
|--------|-----------------|
| Design RAG system | "Help me design a RAG pipeline for [use case]" |
| Choose vector DB | "Which vector database for RAG" |
| Optimize chunking | "Best chunking strategy for [content type]" |
| Improve retrieval | "My RAG has poor retrieval quality" |
| Reduce hallucinations | "RAG still hallucinating, help fix" |
| Scale pipeline | "Scale RAG to [X] documents" |

Best Practices

Chunk at Semantic Boundaries: Preserve meaning in chunks
- Good: Split at paragraphs, sections, or topic boundaries
- Bad: Fixed-size splits that cut sentences mid-thought
- Include section headers as context in chunks

Include Rich Metadata: Enable filtering and context
- Source document, section, page number
- Timestamps for temporal relevance
- Categories, tags, or topics
- Use metadata filters before semantic search

Use Hybrid Search: Combine semantic and keyword search
- Semantic: Captures meaning and synonyms
- Keyword (BM25): Catches exact terms, names, codes
- Weight combination based on query type

Rerank for Quality: Two-stage retrieval improves precision
- Stage 1: Fast vector search (retrieve 20-50)
- Stage 2: Cross-encoder reranking (keep top 5-10)
- Reranking is slower but much more accurate

Show Your Work: Include citations and sources
- Return source chunks with responses
- Enable users to verify and explore
- Build trust through transparency

Handle Edge Cases: What happens when retrieval fails?
- No relevant results found
- Conflicting information in sources
- Query outside knowledge base scope
- Implement graceful fallbacks
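
One way to implement the weighted combination described under hybrid search is weighted reciprocal rank fusion (RRF). The source does not prescribe a fusion method, so treat this as one option among several; the rankings below are hypothetical outputs of a vector search and a BM25 search, and `k=60` is the commonly used RRF constant.

```python
def weighted_rrf(semantic_ranking, keyword_ranking, semantic_weight=0.5, k=60):
    """Fuse two ranked ID lists with weighted reciprocal rank fusion."""
    weights = (
        (semantic_ranking, semantic_weight),
        (keyword_ranking, 1.0 - semantic_weight),
    )
    scores = {}
    for ranking, weight in weights:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes weight / (k + rank) for every ID it contains
            scores[chunk_id] = scores.get(chunk_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)

semantic = ["c2", "c5", "c1"]  # hypothetical vector search results, best first
keyword = ["c1", "c2", "c9"]   # hypothetical BM25 results, best first

print(weighted_rrf(semantic, keyword, semantic_weight=0.5))  # balanced query
print(weighted_rrf(semantic, keyword, semantic_weight=0.1))  # exact-term query
```

Because RRF works on ranks rather than raw scores, it avoids having to normalize cosine similarities against BM25 scores, which live on different scales.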

Advanced Techniques

Multi-Index Strategy

Use different indexes for different content types:

Index 1: FAQs (short, self-contained)
Index 2: Documentation (long-form, structured)
Index 3: Conversations (temporal, contextual)

Route queries to appropriate index based on intent
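
Routing can start very simply. The sketch below assumes a keyword-cue router: the cue words and index names are hypothetical, and a production router would more likely use a trained classifier or an LLM call to detect intent.

```python
import re

# Hypothetical intent cues per index; tune or replace with a real classifier
INDEX_CUES = {
    "faqs": {"how", "what", "can", "price", "cost"},
    "conversations": {"yesterday", "earlier", "said", "previous", "thread"},
}
DEFAULT_INDEX = "documentation"

def route_query(query):
    """Pick the index whose cue words overlap the query most; default on no match."""
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    best_index, best_hits = DEFAULT_INDEX, 0
    for index_name, cues in INDEX_CUES.items():
        hits = len(tokens & cues)
        if hits > best_hits:
            best_index, best_hits = index_name, hits
    return best_index

for q in ("What does the plan cost?",
          "What did Dana say earlier in the previous thread?",
          "Configure TLS certificates"):
    print(q, "->", route_query(q))
```

Falling back to the documentation index when no cue matches keeps the router safe for queries it does not recognize.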

Query Transformation Pipeline

Improve retrieval with query processing:

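def transform_query(query):
    """Turn a user query into one or more search queries.

    classify_query, extract_entities, generate_keyword_queries, and
    generate_semantic_queries are placeholder helpers not defined here.
    Every branch is expected to return a list of query strings.
    """
    # Step 1: Classify query type
    query_type = classify_query(query)

    # Step 2: Extract entities
    entities = extract_entities(query)

    # Step 3: Generate search queries
    if query_type == "factual":
        return generate_keyword_queries(query, entities)
    elif query_type == "conceptual":
        return generate_semantic_queries(query)
    else:
        return [query]  # Use as-is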

Contextual Compression

Reduce noise in retrieved context:

Retrieved chunks (verbose) → LLM compressor → Relevant excerpts only
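
To show the shape of the compression step without calling an LLM, the sketch below swaps in a much simpler extractive filter: it keeps only the sentences that share terms with the query. This is a stand-in for the LLM compressor in the diagram, not an equivalent; the function names and the overlap threshold are assumptions.

```python
import re

def _terms(text):
    """Lowercased word set, ignoring words of one or two characters."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 2}

def compress_chunk(chunk, query, min_overlap=1):
    """Keep only the sentences that share at least min_overlap terms with the query."""
    query_terms = _terms(query)
    sentences = re.split(r"(?<=[.!?])\s+", chunk.strip())
    kept = [s for s in sentences if len(_terms(s) & query_terms) >= min_overlap]
    return " ".join(kept)

# Hypothetical retrieved chunk with one relevant sentence
chunk = (
    "Our office opened in 2015. Refunds are processed within five business days. "
    "The lobby was renovated last year."
)
print(compress_chunk(chunk, "How long do refunds take?"))
```

Exact term overlap misses synonyms and paraphrases, which is the main reason to use an LLM or embedding-based compressor in practice.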

Agentic RAG

Let the LLM control retrieval:

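def agentic_rag(query):
    """Let the LLM plan searches, run them, then synthesize an answer.

    llm and retriever are placeholder objects not defined here; each
    planned search is expected to expose .query and .filters.
    """
    # LLM decides what to search for
    search_plan = llm.plan_searches(query)

    # Execute searches
    results = []
    for search in search_plan:
        results.extend(retriever.search(search.query, filters=search.filters))

    # LLM synthesizes answer
    return llm.synthesize(query, results)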

Evaluation Framework

Continuously measure RAG quality:

Metrics:
- Retrieval: Precision@k, Recall@k, MRR
- Generation: Faithfulness, Answer Relevance, Context Utilization
- End-to-end: Task Success Rate, User Satisfaction

Tools: Ragas, TruLens, LangSmith

Common Pitfalls to Avoid

- Chunking too large (loses specificity) or too small (loses context)
- Not preserving document structure and hierarchy in chunks
- Ignoring keyword search when exact matches matter
- Retrieving too few chunks (missing information) or too many (context dilution)
- Not handling conflicting information across sources
- Assuming LLM will always use retrieved context correctly
- Skipping evaluation and monitoring in production
- Not updating embeddings when source documents change
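
The last pitfall, stale embeddings, can be caught with a content hash per document. The sketch below assumes you persist a mapping of document ID to hash alongside the index; `plan_reindex` and the sample file names are hypothetical.

```python
import hashlib

def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_reindex(documents, stored_hashes):
    """Compare current documents against stored hashes and list what changed."""
    current = {doc_id: content_hash(text) for doc_id, text in documents.items()}
    # New or modified documents need fresh embeddings
    to_embed = sorted(d for d, h in current.items() if stored_hashes.get(d) != h)
    # Documents that disappeared should have their vectors removed
    to_delete = sorted(d for d in stored_hashes if d not in current)
    return to_embed, to_delete, current

# Hypothetical state: a.md changed, b.md unchanged, c.md removed, d.md added
stored = {
    "a.md": content_hash("old text"),
    "b.md": content_hash("same"),
    "c.md": content_hash("gone"),
}
docs = {"a.md": "new text", "b.md": "same", "d.md": "brand new"}
to_embed, to_delete, new_hashes = plan_reindex(docs, stored)
print(to_embed, to_delete)
```

Hashing per chunk instead of per document would reduce re-embedding further, at the cost of tracking more state.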

jmsktm/RAG Pipeline Builder

Build retrieval-augmented generation systems that ground LLM responses in your data

2 stars · development · Updated Apr 6, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add jmsktm/claude-settings RAG Pipeline Builder

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Status | Scanner | Description | Percentage |
|--------|---------|-------------|------------|
| Clean | Trivy | Container and dependency vulnerability scanner | 95% |
| Clean | Semgrep | Static code analysis for vulnerabilities | 95% |
| Clean | mcp-scan (Snyk) | Model Context Protocol security validation | 95% |
| Skipped | Snyk (dep) | Open source security scanning | 50% |
| Skipped | Socket.dev | Supply chain security analysis | 50% |
| Skipped | VirusTotal | Multi-engine malware detection | 50% |
| Skipped | CrowdStrike | Advanced threat intelligence | 50% |
| Skipped | OSV-Scanner | Open Source Vulnerability database check | 50% |
| Skipped | OWASP Dep-Check | | 50% |

Last scanned: Apr 6, 2026, 10:34 AM (4.5s, 1 file scanned)

SKILL.md

name: RAG Pipeline Builder
slug: rag-pipeline-builder
description: Build retrieval-augmented generation systems that ground LLM responses in your data
category: ai-ml
complexity: advanced
version: 1.0.0
author: ID8Labs

Install manually

# Clone the repo
git clone https://github.com/jmsktm/claude-settings.git

# Copy into Claude Code skills folder (global)
cp -r claude-settings/skills/rag-pipeline-builder ~/.claude/skills/


Repository: jmsktm/claude-settings (2 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT