skills/rag-architect/chunking-strategies/SKILL.md
Optimize document chunking for RAG performance and retrieval quality. Use this skill when splitting documents, choosing chunk sizes, implementing semantic chunking, or improving RAG retrieval accuracy. Activate when: chunking, split documents, chunk size, text splitting, document processing, RAG performance, semantic chunking, overlap.
Install this skill globally with one command: `npx skillsauth add latestaiagents/agent-skills chunking-strategies`. Works with Claude Code, Cursor, and Windsurf.
Optimal chunking is the difference between good and great RAG performance.
Poor chunking causes:
| Method | Best For | Chunk Quality | Implementation |
|--------|----------|---------------|----------------|
| Fixed-size | Simple docs, uniform content | Medium | Easy |
| Recursive | Structured docs, markdown | High | Medium |
| Semantic | Complex docs, varied content | Highest | Complex |
| Parent-child | Hierarchical docs | High | Medium |
| Late chunking | Preserving context | Highest | Complex |
The baseline approach - simple but effective:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def create_fixed_chunks(
    text: str,
    chunk_size: int = 512,
    chunk_overlap: int = 50
) -> list[str]:
    """
    Split text into fixed-size chunks with overlap.

    Guidelines:
    - chunk_size: 256-1024 tokens (512 is a solid default)
    - overlap: 10-20% of chunk_size

    Note: with length_function=len, sizes are measured in characters,
    not tokens. Pass a token-counting function to measure in tokens.
    """
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,
        # Try paragraph breaks first, then lines, sentences, words, characters
        separators=["\n\n", "\n", ". ", " ", ""]
    )
    return splitter.split_text(text)
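
To make the size and overlap arithmetic concrete, here is a minimal dependency-free sketch of a sliding-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators); `sliding_window_chunks` is a hypothetical helper, and sizes are counted in characters.

```python
def sliding_window_chunks(
    text: str,
    chunk_size: int = 512,
    chunk_overlap: int = 50,
) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each window starts (chunk_size - chunk_overlap) characters after the last
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
```

With `chunk_size=4` and `chunk_overlap=1`, each chunk repeats the last character of the previous one, which is the same effect overlap has at larger scales: content near a boundary appears in both neighbours.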
Group by meaning, not arbitrary boundaries:
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

def create_semantic_chunks(text: str) -> list[str]:
    """
    Split text based on semantic similarity between sentences.
    Keeps related content together.
    """
    embeddings = OpenAIEmbeddings()
    splitter = SemanticChunker(
        embeddings=embeddings,
        breakpoint_threshold_type="percentile",
        breakpoint_threshold_amount=95  # Higher = fewer, larger chunks
    )
    return splitter.split_text(text)
import numpy as np
from sentence_transformers import SentenceTransformer

def semantic_chunk(
    sentences: list[str],
    model_name: str = "all-MiniLM-L6-v2",
    threshold: float = 0.5
) -> list[list[str]]:
    """
    Group sentences by semantic similarity.
    """
    # Nothing to group; also avoids indexing into an empty list below
    if not sentences:
        return []

    model = SentenceTransformer(model_name)
    embeddings = model.encode(sentences)

    chunks = []
    current_chunk = [sentences[0]]
    for i in range(1, len(sentences)):
        # Cosine similarity between consecutive sentences
        sim = np.dot(embeddings[i-1], embeddings[i]) / (
            np.linalg.norm(embeddings[i-1]) * np.linalg.norm(embeddings[i])
        )
        if sim >= threshold:
            # Similar enough: keep extending the current chunk
            current_chunk.append(sentences[i])
        else:
            # Topic shift: close the chunk and start a new one
            chunks.append(current_chunk)
            current_chunk = [sentences[i]]
    chunks.append(current_chunk)
    return chunks
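
A fixed `threshold` and a percentile setting answer the same question: how dissimilar must two adjacent sentences be before a new chunk starts? The sketch below derives the cut-off from the data itself, under the assumption that a split happens wherever the distance between neighbours exceeds the chosen percentile of all neighbour distances. `percentile` and `split_at_percentile` are hypothetical helpers, and the distances are made-up numbers rather than real embedding output.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Percentile with linear interpolation between the closest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def split_at_percentile(
    sentences: list[str],
    distances: list[float],
    pct: float = 95.0,
) -> list[list[str]]:
    """Group sentences, splitting where the neighbour distance exceeds the cut-off.

    distances[i] is the cosine distance (1 - similarity) between
    sentences[i] and sentences[i + 1].
    """
    if not sentences:
        return []
    if len(distances) != len(sentences) - 1:
        raise ValueError("need exactly one distance per adjacent sentence pair")
    if not distances:
        return [list(sentences)]

    cutoff = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, dist in zip(sentences[1:], distances):
        if dist > cutoff:
            chunks.append(current)
            current = [sentence]
        else:
            current.append(sentence)
    chunks.append(current)
    return chunks


if __name__ == "__main__":
    toy_sentences = ["a", "b", "c", "d", "e", "f"]
    toy_distances = [0.1, 0.2, 0.15, 0.8, 0.1]  # illustrative values only
    print(split_at_percentile(toy_sentences, toy_distances, pct=75))
```

A higher percentile raises the cut-off, so fewer gaps qualify as breakpoints and the chunks grow larger. That matches the "Higher = fewer, larger chunks" note on `breakpoint_threshold_amount`.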
Retrieve small, return with context:
from llama_index.core.node_parser import (
    HierarchicalNodeParser,
    SentenceSplitter,
    get_leaf_nodes
)
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.retrievers import AutoMergingRetriever

def create_hierarchical_index(documents):
    """
    Create parent-child chunk hierarchy.
    Small chunks for retrieval, auto-merge to parents for context.
    """
    # Define chunk sizes for each level
    node_parser = HierarchicalNodeParser.from_defaults(
        chunk_sizes=[2048, 512, 128]  # Parent → Child → Leaf
    )
    nodes = node_parser.get_nodes_from_documents(documents)
    leaf_nodes = get_leaf_nodes(nodes)

    # Store all nodes
    storage_context = StorageContext.from_defaults()
    storage_context.docstore.add_documents(nodes)

    # Index only leaf nodes
    index = VectorStoreIndex(
        leaf_nodes,
        storage_context=storage_context
    )

    # Retriever auto-merges to parents when siblings retrieved
    retriever = AutoMergingRetriever(
        index.as_retriever(similarity_top_k=12),
        storage_context=storage_context,
        simple_ratio_thresh=0.3  # Merge if 30%+ siblings retrieved
    )
    return retriever
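
The merge rule itself can be sketched without LlamaIndex. This is a toy model, assuming a parent replaces its leaves once the fraction of its children that were retrieved exceeds the threshold. `merge_to_parents`, the node ids, and the strict `>` comparison are all assumptions for illustration; the real `AutoMergingRetriever` also handles multiple levels and scores.

```python
from collections import defaultdict


def merge_to_parents(
    retrieved: list[str],
    parent_of: dict[str, str],
    children_of: dict[str, list[str]],
    ratio_thresh: float = 0.3,
) -> list[str]:
    """Swap retrieved leaves for their parent when enough siblings were retrieved."""
    # Drop duplicate hits while keeping retrieval order
    retrieved = list(dict.fromkeys(retrieved))

    # Count how many children of each parent were retrieved
    hits = defaultdict(list)
    for leaf in retrieved:
        hits[parent_of[leaf]].append(leaf)

    merged, seen = [], set()
    for leaf in retrieved:
        parent = parent_of[leaf]
        ratio = len(hits[parent]) / len(children_of[parent])
        node = parent if ratio > ratio_thresh else leaf
        if node not in seen:
            seen.add(node)
            merged.append(node)
    return merged


if __name__ == "__main__":
    children_of = {"P1": ["a", "b", "c", "d"], "P2": ["e", "f", "g", "h"]}
    parent_of = {c: p for p, kids in children_of.items() for c in kids}
    # Two of P1's four leaves were hit (0.5), one of P2's four (0.25)
    print(merge_to_parents(["a", "e", "b"], parent_of, children_of))
```

The effect is that tightly clustered hits return one large, coherent parent chunk, while an isolated hit stays small.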
Embed full document first, then chunk - preserves global context:
import numpy as np

def late_chunking(
    document: str,
    model,
    chunk_size: int = 512
) -> list[dict]:
    """
    Late chunking: embed document, then split embeddings.
    Preserves document-level context in chunk embeddings.
    Reference: Jina AI Late Chunking (2024)

    Note: `model` is a placeholder interface. tokenize(), encode_tokens()
    and decode() are illustrative method names, not a specific library's
    API; adapt them to your long-context embedding model.
    """
    # 1. Get token-level embeddings for full document
    tokens = model.tokenize(document)
    token_embeddings = model.encode_tokens(tokens)

    # 2. Split into chunks
    chunks = []
    for i in range(0, len(tokens), chunk_size):
        chunk_tokens = tokens[i:i + chunk_size]
        chunk_embeddings = token_embeddings[i:i + chunk_size]

        # 3. Pool chunk embeddings (mean pooling)
        chunk_vector = np.mean(chunk_embeddings, axis=0)
        chunks.append({
            "text": model.decode(chunk_tokens),
            "embedding": chunk_vector
        })
    return chunks
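
The pooling step does not have to use fixed-size windows. As a small sketch, assuming token embeddings are already available for the whole document, the spans below could just as well follow sentence or section boundaries. `pool_spans` is a hypothetical helper and the vectors are tiny made-up values.

```python
def pool_spans(
    token_embeddings: list[list[float]],
    spans: list[tuple[int, int]],
) -> list[list[float]]:
    """Mean-pool token embeddings over half-open (start, end) token spans."""
    pooled = []
    for start, end in spans:
        window = token_embeddings[start:end]
        if not window:
            raise ValueError(f"span ({start}, {end}) covers no tokens")
        dim = len(window[0])
        # Average each embedding dimension across the tokens in the span
        pooled.append(
            [sum(vec[d] for vec in window) / len(window) for d in range(dim)]
        )
    return pooled


if __name__ == "__main__":
    toy_embeddings = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]  # illustrative only
    print(pool_spans(toy_embeddings, [(0, 2), (2, 3)]))
```

Because every token vector was computed with the full document in view, each pooled chunk vector carries some of that wider context regardless of where the span boundaries fall.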
from langchain.text_splitter import (
    MarkdownHeaderTextSplitter,
    Language,
    RecursiveCharacterTextSplitter
)

def chunk_markdown(text: str) -> list[dict]:
    """Split markdown by headers, preserving structure."""
    headers_to_split_on = [
        ("#", "h1"),
        ("##", "h2"),
        ("###", "h3"),
    ]
    splitter = MarkdownHeaderTextSplitter(
        headers_to_split_on=headers_to_split_on
    )
    # split_text returns Document objects; convert them to plain dicts
    # so the result matches the declared return type
    docs = splitter.split_text(text)
    return [
        {"text": doc.page_content, "metadata": doc.metadata}
        for doc in docs
    ]

def chunk_code(code: str, language: str = "python") -> list[str]:
    """Split code respecting language syntax."""
    lang_map = {
        "python": Language.PYTHON,
        "javascript": Language.JS,
        "typescript": Language.TS,
    }
    splitter = RecursiveCharacterTextSplitter.from_language(
        # Unknown languages fall back to Python separators
        language=lang_map.get(language, Language.PYTHON),
        chunk_size=1000,
        chunk_overlap=100
    )
    return splitter.split_text(code)
| Content Type | Recommended Size | Overlap |
|--------------|------------------|---------|
| Q&A / FAQ | 256-512 | 25-50 |
| Technical docs | 512-1024 | 50-100 |
| Legal documents | 1024-2048 | 100-200 |
| Code | 500-1000 | 50-100 |
| Conversations | 256-512 | 50-100 |
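
One way to keep these ranges from drifting out of a pipeline's configuration is to encode them as data and check chosen settings against them. This is only a sketch: the dictionary keys and `within_recommended` are hypothetical names, and the numbers are copied from the table above.

```python
# Ranges copied from the table; the keys are hypothetical labels.
RECOMMENDED = {
    "faq": {"size": (256, 512), "overlap": (25, 50)},
    "technical_docs": {"size": (512, 1024), "overlap": (50, 100)},
    "legal": {"size": (1024, 2048), "overlap": (100, 200)},
    "code": {"size": (500, 1000), "overlap": (50, 100)},
    "conversations": {"size": (256, 512), "overlap": (50, 100)},
}


def within_recommended(content_type: str, chunk_size: int, chunk_overlap: int) -> bool:
    """Return True if both settings fall inside the recommended ranges."""
    ranges = RECOMMENDED[content_type]
    size_lo, size_hi = ranges["size"]
    overlap_lo, overlap_hi = ranges["overlap"]
    return (
        size_lo <= chunk_size <= size_hi
        and overlap_lo <= chunk_overlap <= overlap_hi
    )


if __name__ == "__main__":
    print(within_recommended("technical_docs", 512, 50))
    print(within_recommended("faq", 1024, 50))
```

The ranges are starting points rather than hard limits, so a failed check is a prompt to evaluate the setting, not proof that it is wrong.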
import numpy as np

def evaluate_chunking(chunks: list[str], test_queries: list[dict]):
    """
    Evaluate chunk quality with test queries.

    test_queries format:
    [{"query": "What is X?", "expected_chunk_contains": "X is..."}]
    """
    results = {
        "avg_chunk_size": np.mean([len(c) for c in chunks]),
        "chunk_size_std": np.std([len(c) for c in chunks]),
        "total_chunks": len(chunks),
        "retrieval_hits": 0
    }
    for tq in test_queries:
        # Check if expected content is in a single chunk.
        # This is a substring check, not a run of the actual retriever.
        for chunk in chunks:
            if tq["expected_chunk_contains"] in chunk:
                results["retrieval_hits"] += 1
                break

    # Guard against division by zero when no test queries are supplied
    results["hit_rate"] = (
        results["retrieval_hits"] / len(test_queries) if test_queries else 0.0
    )
    return results
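
A quick way to use this kind of check is to compare two chunkings of the same text. The sketch below is a dependency-free variant, assuming a "hit" means the expected passage survives intact inside a single chunk. `hit_rate` is a hypothetical helper and the chunks are hand-written toy data; it says nothing about whether a retriever would actually rank that chunk first.

```python
def hit_rate(chunks: list[str], expected_passages: list[str]) -> float:
    """Fraction of expected passages found whole inside at least one chunk."""
    if not expected_passages:
        return 0.0
    hits = sum(
        any(passage in chunk for chunk in chunks)
        for passage in expected_passages
    )
    return hits / len(expected_passages)


if __name__ == "__main__":
    expected = ["Beta is second", "Gamma is third"]

    # Boundaries that cut through sentences
    small = ["Alpha is first. Be", "ta is second. Gamma", " is third."]
    # Boundaries that fall between sentences
    large = ["Alpha is first. Beta is second.", "Gamma is third."]

    print(hit_rate(small, expected))
    print(hit_rate(large, expected))
```

Running the same comparison across several chunk sizes or overlap values gives a cheap first signal before investing in a full retrieval benchmark.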
What type of content?
├─ Structured (headers, sections)
│ └─ Use: Markdown/recursive splitter + hierarchy
├─ Unstructured (prose, articles)
│ └─ Use: Semantic chunking
├─ Code
│ └─ Use: Language-aware splitter
└─ Mixed
  └─ Use: Parent-child with semantic leaves