latestaiagents/chunking-strategies

Name: chunking-strategies
Author: latestaiagents

skills/rag-architect/chunking-strategies/SKILL.md

npx skillsauth add latestaiagents/agent-skills chunking-strategies


Chunking Strategies for RAG

Optimal chunking is the difference between good and great RAG performance.

Why Chunking Matters

Poor chunking causes:

Context fragmentation (answers split across chunks)
Irrelevant retrieval (too much noise in chunks)
Lost relationships (parent-child content separated)
Wasted tokens (chunks too large or too small)

Chunking Methods Comparison

| Method | Best For | Chunk Quality | Implementation |
|--------|----------|---------------|----------------|
| Fixed-size | Simple docs, uniform content | Medium | Easy |
| Recursive | Structured docs, markdown | High | Medium |
| Semantic | Complex docs, varied content | Highest | Complex |
| Parent-child | Hierarchical docs | High | Medium |
| Late chunking | Preserving context | Highest | Complex |

Pattern 1: Fixed-Size with Overlap

The baseline approach - simple but effective:

# Current LangChain releases ship the splitters in the langchain-text-splitters package
from langchain_text_splitters import RecursiveCharacterTextSplitter

def create_fixed_chunks(
    text: str,
    chunk_size: int = 512,
    chunk_overlap: int = 50
) -> list[str]:
    """
    Split text into fixed-size chunks with overlap.

    Guidelines:
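    - chunk_size: 256-1024 (512 is a solid default). With
      length_function=len this is measured in characters; pass a
      token-counting function to size chunks in tokens instead.
    - overlap: 10-20% of chunk_size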
    """
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,
        separators=["\n\n", "\n", ". ", " ", ""]
    )
    return splitter.split_text(text)
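
To make the overlap mechanics concrete, here is a minimal dependency-free sketch of a plain sliding window. The function name `sliding_window_chunks` is hypothetical and is not part of LangChain; unlike the recursive splitter above, it ignores separators and cuts purely by character position.

```python
def sliding_window_chunks(
    text: str,
    chunk_size: int = 512,
    chunk_overlap: int = 50
) -> list[str]:
    """Cut text into character windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each window starts this many characters after the previous one
    step = chunk_size - chunk_overlap

    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
```

With a chunk size of 4 and an overlap of 1, each chunk begins with the last character of the previous one, which is the behaviour the overlap setting is meant to provide at larger scales.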

Pattern 2: Semantic Chunking

Group by meaning, not arbitrary boundaries:

from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

def create_semantic_chunks(text: str) -> list[str]:
    """
    Split text based on semantic similarity between sentences.
    Keeps related content together.
    """
    embeddings = OpenAIEmbeddings()

    splitter = SemanticChunker(
        embeddings=embeddings,
        breakpoint_threshold_type="percentile",
        breakpoint_threshold_amount=95  # Higher = fewer, larger chunks
    )

    return splitter.split_text(text)
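
The percentile setting can be read as: measure the distance between each pair of neighbouring sentences, then split only where a distance exceeds the chosen percentile of all distances. The sketch below illustrates that idea in plain Python. It is an assumption-labelled illustration, not the library's implementation; `percentile` and `split_at_breakpoints` are hypothetical names, and the distances are supplied by hand rather than computed from embeddings.

```python
def percentile(values: list[float], pct: float) -> float:
    """Linear-interpolation percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def split_at_breakpoints(
    sentences: list[str],
    distances: list[float],
    pct: float = 95
) -> list[list[str]]:
    """
    Split where the distance to the previous sentence exceeds the cutoff.
    distances[i] is the distance between sentences[i] and sentences[i + 1].
    """
    if not sentences:
        return []
    if len(distances) != len(sentences) - 1:
        raise ValueError("need exactly one distance per adjacent sentence pair")
    if not distances:
        return [list(sentences)]

    cutoff = percentile(distances, pct)
    chunks = []
    current = [sentences[0]]
    for sentence, dist in zip(sentences[1:], distances):
        if dist > cutoff:
            # Large jump in meaning: close the current chunk
            chunks.append(current)
            current = []
        current.append(sentence)
    chunks.append(current)
    return chunks


sentences = ["a", "b", "c", "d"]
distances = [0.1, 0.9, 0.2]
print(split_at_breakpoints(sentences, distances, pct=50))
print(split_at_breakpoints(sentences, distances, pct=100))
```

Raising the percentile raises the cutoff, so fewer gaps qualify as breakpoints and the chunks get larger.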

Custom Semantic Chunking

import numpy as np
from sentence_transformers import SentenceTransformer

def semantic_chunk(
    sentences: list[str],
    model_name: str = "all-MiniLM-L6-v2",
    threshold: float = 0.5
) -> list[list[str]]:
    """
    Group sentences by semantic similarity.
    """
    if not sentences:
        return []  # Nothing to group; also avoids indexing sentences[0]

    model = SentenceTransformer(model_name)
    embeddings = model.encode(sentences)

    chunks = []
    current_chunk = [sentences[0]]

    for i in range(1, len(sentences)):
        # Cosine similarity between consecutive sentences
        sim = np.dot(embeddings[i-1], embeddings[i]) / (
            np.linalg.norm(embeddings[i-1]) * np.linalg.norm(embeddings[i])
        )

        if sim >= threshold:
            current_chunk.append(sentences[i])
        else:
            chunks.append(current_chunk)
            current_chunk = [sentences[i]]

    chunks.append(current_chunk)
    return chunks
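
`semantic_chunk` expects a list of sentences rather than raw text. One simple way to produce that list is a regex split on sentence-ending punctuation, sketched below. The helper name `split_sentences` is hypothetical, and the rule is deliberately naive: it will also split after abbreviations such as "e.g.", so a dedicated sentence tokenizer may be a better fit for real documents.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop the empty string produced when the input is blank
    return [part for part in parts if part]


print(split_sentences("Chunking matters. Does size matter? Yes!  It does."))
```

The resulting list can be passed directly as the `sentences` argument.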

Pattern 3: Parent-Child Chunking

Retrieve small, return with context:

from llama_index.core.node_parser import (
    HierarchicalNodeParser,
    get_leaf_nodes
)
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.retrievers import AutoMergingRetriever

def create_hierarchical_index(documents):
    """
    Create parent-child chunk hierarchy.
    Small chunks for retrieval, auto-merge to parents for context.
    """
    # Define chunk sizes for each level
    node_parser = HierarchicalNodeParser.from_defaults(
        chunk_sizes=[2048, 512, 128]  # Parent → Child → Leaf
    )

    nodes = node_parser.get_nodes_from_documents(documents)
    leaf_nodes = get_leaf_nodes(nodes)

    # Store all nodes
    storage_context = StorageContext.from_defaults()
    storage_context.docstore.add_documents(nodes)

    # Index only leaf nodes
    index = VectorStoreIndex(
        leaf_nodes,
        storage_context=storage_context
    )

    # Retriever auto-merges to parents when siblings retrieved
    retriever = AutoMergingRetriever(
        index.as_retriever(similarity_top_k=12),
        storage_context=storage_context,
        simple_ratio_thresh=0.3  # Merge when more than 30% of a parent's children are retrieved
    )

    return retriever
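
The merge rule itself is small enough to sketch without any library. The version below is an illustration of the idea only, not LlamaIndex's code: `auto_merge`, `parent_of`, and `children_of` are hypothetical names, node IDs are plain strings, and each leaf is assumed to appear at most once in the retrieved list.

```python
from collections import defaultdict


def auto_merge(
    retrieved_leaves: list[str],
    parent_of: dict[str, str],
    children_of: dict[str, list[str]],
    ratio_thresh: float = 0.3
) -> list[str]:
    """Swap retrieved leaves for their parent when enough siblings were hit."""
    # Count how many children of each parent were retrieved
    hits = defaultdict(list)
    for leaf in retrieved_leaves:
        hits[parent_of[leaf]].append(leaf)

    merged = []
    seen = set()
    for leaf in retrieved_leaves:
        parent = parent_of[leaf]
        ratio = len(hits[parent]) / len(children_of[parent])
        # Return the parent instead of the leaf once the ratio passes the threshold
        node = parent if ratio > ratio_thresh else leaf
        if node not in seen:
            seen.add(node)
            merged.append(node)
    return merged


children_of = {"P1": ["a", "b", "c", "d"], "P2": ["e", "f", "g", "h"]}
parent_of = {leaf: parent for parent, leaves in children_of.items() for leaf in leaves}
print(auto_merge(["a", "b", "e"], parent_of, children_of))
```

Here two of four children of `P1` were retrieved (50%), so they collapse into `P1`, while the single hit under `P2` (25%) stays a leaf.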

Pattern 4: Late Chunking (2024 Technique)

Embed full document first, then chunk - preserves global context:

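import numpy as np

def late_chunking(
    document: str,
    model,
    chunk_size: int = 512
) -> list[dict]:
    """
    Late chunking: embed document, then split embeddings.
    Preserves document-level context in chunk embeddings.

    `model` is a placeholder interface: it is assumed to expose
    tokenize(), encode_tokens() (one vector per token) and decode().
    Adapt these calls to your embedding library.

    Reference: Jina AI Late Chunking (2024)
    """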
    # 1. Get token-level embeddings for full document
    tokens = model.tokenize(document)
    token_embeddings = model.encode_tokens(tokens)

    # 2. Split into chunks
    chunks = []
    for i in range(0, len(tokens), chunk_size):
        chunk_tokens = tokens[i:i + chunk_size]
        chunk_embeddings = token_embeddings[i:i + chunk_size]

        # 3. Pool chunk embeddings (mean pooling)
        chunk_vector = np.mean(chunk_embeddings, axis=0)

        chunks.append({
            "text": model.decode(chunk_tokens),
            "embedding": chunk_vector
        })

    return chunks
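
The pooling step can be tried in isolation with toy vectors. In the sketch below, the hand-written lists stand in for token embeddings that a long-context model would produce in a single pass over the whole document; `pool_token_embeddings` is a hypothetical helper and uses plain Python instead of NumPy.

```python
def pool_token_embeddings(
    token_embeddings: list[list[float]],
    chunk_size: int
) -> list[list[float]]:
    """Mean-pool consecutive windows of token vectors into one vector per chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    pooled = []
    for start in range(0, len(token_embeddings), chunk_size):
        window = token_embeddings[start:start + chunk_size]
        dims = len(window[0])
        # Average each dimension across the tokens in this window
        pooled.append([
            sum(vector[d] for vector in window) / len(window)
            for d in range(dims)
        ])
    return pooled


toy_token_embeddings = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
print(pool_token_embeddings(toy_token_embeddings, chunk_size=2))
```

Because every token vector was computed with the full document in view, each pooled chunk vector carries context from outside its own window, which is the point of chunking late.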

Pattern 5: Markdown/Code-Aware Chunking

# Current LangChain releases ship the splitters in the langchain-text-splitters package
from langchain_text_splitters import (
    MarkdownHeaderTextSplitter,
    Language,
    RecursiveCharacterTextSplitter
)

def chunk_markdown(text: str) -> list[dict]:
    """Split markdown by headers, preserving structure."""
    headers_to_split_on = [
        ("#", "h1"),
        ("##", "h2"),
        ("###", "h3"),
    ]

    splitter = MarkdownHeaderTextSplitter(
        headers_to_split_on=headers_to_split_on
    )

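    # split_text returns Document objects; convert them to plain dicts
    return [
        {"content": doc.page_content, "metadata": doc.metadata}
        for doc in splitter.split_text(text)
    ]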


def chunk_code(code: str, language: str = "python") -> list[str]:
    """Split code respecting language syntax."""
    lang_map = {
        "python": Language.PYTHON,
        "javascript": Language.JS,
        "typescript": Language.TS,
    }

    splitter = RecursiveCharacterTextSplitter.from_language(
        language=lang_map.get(language, Language.PYTHON),
        chunk_size=1000,
        chunk_overlap=100
    )

    return splitter.split_text(code)
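
For readers who want to see what header-aware splitting does under the hood, here is a small dependency-free sketch. It is not the LangChain implementation; `split_by_headers` is a hypothetical name, only `#` to `###` headers are recognised, and lines starting with `#` inside code fences would be misread as headers.

```python
import re

HEADER_RE = re.compile(r"^(#{1,3})\s+(.*)$")


def split_by_headers(markdown: str) -> list[dict]:
    """Split markdown into sections tagged with their h1/h2/h3 header path."""
    sections = []
    path = {}   # header level -> current title at that level
    body = []   # lines collected since the last header

    def flush() -> None:
        content = "\n".join(body).strip()
        if content:
            metadata = {f"h{level}": title for level, title in sorted(path.items())}
            sections.append({"content": content, "metadata": metadata})
        body.clear()

    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new header replaces titles at the same or deeper levels
            for stale in [lvl for lvl in path if lvl >= level]:
                del path[stale]
            path[level] = match.group(2).strip()
        else:
            body.append(line)
    flush()
    return sections


sample = "# Guide\nintro\n## Setup\nstep one\n# Other\nlast"
for section in split_by_headers(sample):
    print(section)
```

Each section keeps the chain of headers above it, so a retrieved chunk can be traced back to its place in the document.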

Chunk Size Guidelines

| Content Type | Recommended Size | Overlap |
|--------------|------------------|---------|
| Q&A / FAQ | 256-512 | 25-50 |
| Technical docs | 512-1024 | 50-100 |
| Legal documents | 1024-2048 | 100-200 |
| Code | 500-1000 | 50-100 |
| Conversations | 256-512 | 50-100 |
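
These ranges can be encoded as a small lookup so that splitter settings are checked in one place. The sketch below is illustrative: the dictionary keys and the `check_settings` helper are hypothetical names, and the numbers are in whatever unit your splitter's length function counts.

```python
# Ranges copied from the guideline table above
CHUNK_GUIDELINES = {
    "qa_faq": {"size": (256, 512), "overlap": (25, 50)},
    "technical_docs": {"size": (512, 1024), "overlap": (50, 100)},
    "legal": {"size": (1024, 2048), "overlap": (100, 200)},
    "code": {"size": (500, 1000), "overlap": (50, 100)},
    "conversations": {"size": (256, 512), "overlap": (50, 100)},
}


def check_settings(content_type: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Return a warning for each setting that falls outside the guideline range."""
    guide = CHUNK_GUIDELINES[content_type]
    warnings = []

    low, high = guide["size"]
    if not low <= chunk_size <= high:
        warnings.append(f"chunk_size {chunk_size} is outside {low}-{high}")

    low, high = guide["overlap"]
    if not low <= chunk_overlap <= high:
        warnings.append(f"chunk_overlap {chunk_overlap} is outside {low}-{high}")

    return warnings


print(check_settings("code", chunk_size=1000, chunk_overlap=100))
print(check_settings("qa_faq", chunk_size=1024, chunk_overlap=10))
```

An empty list means both settings fall inside the recommended range for that content type.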

Evaluation: How to Know If Chunking Is Good

import numpy as np

def evaluate_chunking(chunks: list[str], test_queries: list[dict]):
    """
    Evaluate chunk quality with test queries.

    test_queries format:
    [{"query": "What is X?", "expected_chunk_contains": "X is..."}]
    """
    results = {
        "avg_chunk_size": np.mean([len(c) for c in chunks]),
        "chunk_size_std": np.std([len(c) for c in chunks]),
        "total_chunks": len(chunks),
        "retrieval_hits": 0
    }

    for tq in test_queries:
        # Check if expected content is in a single chunk
        for chunk in chunks:
            if tq["expected_chunk_contains"] in chunk:
                results["retrieval_hits"] += 1
                break

    # Avoid dividing by zero when no test queries are supplied
    results["hit_rate"] = (
        results["retrieval_hits"] / len(test_queries) if test_queries else 0.0
    )
    return results
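
A miss in the check above can mean two different things: the expected text is absent from the source, or it exists but was cut across a chunk boundary. The sketch below separates the two cases. `classify_snippets` is a hypothetical helper that assumes you still have the original source text alongside its chunks.

```python
def classify_snippets(
    source_text: str,
    chunks: list[str],
    snippets: list[str]
) -> dict[str, str]:
    """Label each snippet as 'intact', 'fragmented', or 'missing'."""
    report = {}
    for snippet in snippets:
        if any(snippet in chunk for chunk in chunks):
            report[snippet] = "intact"        # Fully inside one chunk
        elif snippet in source_text:
            report[snippet] = "fragmented"    # Present in source, split by chunking
        else:
            report[snippet] = "missing"       # Not in the source at all
    return report


source = "Paris is the capital of France. Berlin is the capital of Germany."
chunks = [source[:20], source[20:]]
print(classify_snippets(
    source,
    chunks,
    ["capital of France", "Berlin is the capital", "Madrid"],
))
```

A high share of "fragmented" results suggests adjusting chunk size, overlap, or split boundaries, whereas "missing" points to a problem with the test set or the source.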

Best Practices

Match chunk size to query length - Chunks should be similar in size to expected queries
Preserve meaning boundaries - Never split mid-sentence or mid-paragraph
Include metadata - Add source, page, and section info to each chunk
Test with real queries - Evaluate on your actual use cases
Consider the retrieval model - Some embedding models prefer specific chunk sizes
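
As an illustration of the metadata practice, chunks can be wrapped in a small record type before indexing. The `Chunk` dataclass and `attach_metadata` helper below are hypothetical names for this sketch; the fields mirror the source, page, and section items listed above, plus a positional index.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    text: str
    source: str
    page: Optional[int] = None
    section: Optional[str] = None
    extra: dict = field(default_factory=dict)


def attach_metadata(
    texts: list[str],
    source: str,
    section: Optional[str] = None
) -> list[Chunk]:
    """Wrap raw chunk strings with their origin and position."""
    return [
        Chunk(text=text, source=source, section=section, extra={"chunk_index": i})
        for i, text in enumerate(texts)
    ]


records = attach_metadata(["first part", "second part"], source="guide.md", section="Intro")
print(records[1])
```

Keeping the chunk index makes it possible to fetch neighbouring chunks at query time when a hit needs more surrounding context.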

Quick Decision Tree

What type of content?
├─ Structured (headers, sections)
│   └─ Use: Markdown/recursive splitter + hierarchy
├─ Unstructured (prose, articles)
│   └─ Use: Semantic chunking
├─ Code
│   └─ Use: Language-aware splitter
└─ Mixed
    └─ Use: Parent-child with semantic leaves
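
The tree above maps directly onto a lookup, which can be handy when a pipeline has to pick a splitter per document. The function name `recommend_strategy` and the lowercase keys are assumptions for this sketch; the returned strings are the tree's own recommendations.

```python
def recommend_strategy(content_type: str) -> str:
    """Return the decision tree's recommendation for a content type."""
    strategies = {
        "structured": "Markdown/recursive splitter + hierarchy",
        "unstructured": "Semantic chunking",
        "code": "Language-aware splitter",
        "mixed": "Parent-child with semantic leaves",
    }
    key = content_type.strip().lower()
    if key not in strategies:
        raise ValueError(f"unknown content type: {content_type!r}")
    return strategies[key]


print(recommend_strategy("Code"))
```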


Optimize document chunking for RAG performance and retrieval quality. Use this skill when splitting documents, choosing chunk sizes, implementing semantic chunking, or improving RAG retrieval accuracy. Activate when: chunking, split documents, chunk size, text splitting, document processing, RAG performance, semantic chunking, overlap.

2 stars

testing

Updated Apr 23, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add latestaiagents/agent-skills chunking-strategies

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 24, 2026, 2:56 AM (9.0s, 1 file scanned)




Install manually


# Clone the repo
git clone https://github.com/latestaiagents/agent-skills.git

# Copy into Claude Code skills folder (global), creating it first if needed
mkdir -p ~/.claude/skills
cp -r agent-skills/skills/rag-architect/chunking-strategies ~/.claude/skills/


Repository

latestaiagents/agent-skills

2 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT