plugins/rag-core/skills/implementing-document-indexing/SKILL.md
Implements document indexing with heading-boundary chunking, embedding, FAISS vector store, and PageIndex-style hybrid retrieval. Use when building RAG pipelines, document search, or memory layers.
npx skillsauth add qte77/claude-code-plugins implementing-document-indexing
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Target: $ARGUMENTS
Implements a document indexing and hybrid retrieval pipeline: parse documents, build a heading-based tree index, chunk by heading boundaries, embed with sentence-transformers, store in FAISS, and retrieve via hybrid search.
Document --> Parser --> Pages --> TreeIndex (PageIndex)
|
v
Chunker (heading-boundary + max-token)
|
v
Embedder (sentence-transformers)
|
v
VectorStore (FAISS IndexFlatIP)
|
v
HybridRetrieval (vector search -> full page -> tree filter)
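The VectorStore step in the diagram uses a flat inner-product index. As a rough illustration of what that ranking does, here is a pure-Python sketch of exhaustive inner-product search. It assumes embeddings are L2-normalized before indexing (not stated above), in which case the inner product equals cosine similarity. `normalize` and `top_k` are hypothetical helper names; in the real pipeline FAISS would perform this search.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # leave zero vectors untouched
    return [x / norm for x in vec]


def top_k(query: list[float], vectors: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Exhaustive inner-product search over normalized vectors (assumed setup)."""
    q = normalize(query)
    scored = [
        (i, sum(a * b for a, b in zip(q, normalize(v))))
        for i, v in enumerate(vectors)
    ]
    # Highest inner product first, as a flat inner-product index would rank.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A flat index compares the query against every stored vector, so results are exact; the cost grows linearly with the number of chunks.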
See references/chunking-strategies.md for full reference.
Heading-boundary chunking (primary):
H1 > H2 > H3) as chunk metadata
Max-token splits (fallback):
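One way the two strategies could combine is sketched below. It assumes Markdown `#` headings and uses a whitespace word count as a stand-in for a real tokenizer; `Chunk` and `chunk_markdown` are hypothetical names, not part of the skill's reference files.

```python
import re
from dataclasses import dataclass

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    heading_path: list[str]  # ancestor headings, e.g. ["Guide", "Setup"]
    text: str


def chunk_markdown(text: str, max_tokens: int = 200) -> list[Chunk]:
    """Split at headings first, then split oversized sections by word count."""
    chunks: list[Chunk] = []
    path: list[str] = []  # current heading stack
    body: list[str] = []  # lines collected under the current heading

    def flush() -> None:
        words = " ".join(body).split()
        # Fallback: cut the section into windows of at most max_tokens words.
        for start in range(0, len(words), max_tokens):
            window = words[start:start + max_tokens]
            chunks.append(Chunk(list(path), " ".join(window)))
        body.clear()

    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()  # a heading closes the previous section
            level = len(match.group(1))
            del path[level - 1:]  # keep only shallower ancestors
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks
```

This sketch does not skip `#` lines inside fenced code blocks, so a production chunker would need to track fence state as well.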
See references/retrieval-patterns.md for full reference.
Hybrid retrieval (vector search + tree filter):
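The flow named in the diagram (vector search, then full page, then tree filter) might look roughly like the sketch below. It assumes each vector hit records the page its chunk came from, and it simplifies the tree filter to a predicate over a page's headings. `ChunkHit`, `hybrid_retrieve`, and `keep` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Page:
    number: int
    content: str
    headings: list[str]


@dataclass
class ChunkHit:
    page_number: int  # page the matched chunk came from (assumed field)
    score: float      # similarity score from the vector search


def hybrid_retrieve(
    hits: list[ChunkHit],
    pages: list[Page],
    keep: Callable[[Page], bool],
    top_k: int = 3,
) -> list[Page]:
    """Expand chunk hits to full pages, then apply a structural filter."""
    by_number = {page.number: page for page in pages}
    seen: set[int] = set()
    results: list[Page] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.page_number in seen:
            continue  # several chunks can point to the same page
        seen.add(hit.page_number)
        page = by_number.get(hit.page_number)
        if page is not None and keep(page):
            results.append(page)
        if len(results) == top_k:
            break
    return results
```

Returning whole pages rather than isolated chunks gives the downstream model the surrounding context, while the filter removes pages from irrelevant parts of the document tree.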
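from __future__ import annotations  # allows forward references such as list[TreeNode]

from dataclasses import dataclass
from typing import Callable


@dataclass
class Document:
    pages: list[Page]
    metadata: dict[str, str]


@dataclass
class Page:
    number: int
    content: str
    headings: list[str]  # headings found on this page


@dataclass
class TreeNode:
    heading: str
    level: int  # heading depth: 1 for H1, 2 for H2, ...
    content: str
    children: list[TreeNode]

    # Interface only: returns a filtered tree, or None if nothing matches.
    def filter(self, predicate: Callable) -> TreeNode | None: ...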
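The `filter` signature above leaves the semantics open. One plausible reading, sketched here as an assumption, keeps a node when the predicate matches it or when any descendant survives, and returns `None` otherwise. The default for `children` and the typed predicate are additions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TreeNode:
    heading: str
    level: int
    content: str
    children: list["TreeNode"] = field(default_factory=list)

    def filter(self, predicate: Callable[["TreeNode"], bool]) -> Optional["TreeNode"]:
        """Return a pruned copy of this subtree, or None if nothing matches."""
        # Filter children first so matching descendants keep their ancestors.
        kept = [
            pruned
            for pruned in (child.filter(predicate) for child in self.children)
            if pruned is not None
        ]
        if predicate(self) or kept:
            return TreeNode(self.heading, self.level, self.content, kept)
        return None
```

Because the method builds new nodes instead of mutating the tree, the original index can be reused across queries.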
[project]
dependencies = [
"sentence-transformers>=3.0",
"faiss-cpu>=1.9",
]
make validate
All type checks, linting, and tests must pass.