- name: rag
- description: Plan, scaffold, document, index, evaluate, or extend a retrieval-augmented knowledge base for company knowledge, including project docs, table docs, process docs, glossary entries, metadata, chunking, retrieval checks, and rollout gates.
RAG
- Shared examples and formatting reference: references/EXAMPLE.md.
- Keep new guidance, snippets, and edits aligned with that file.
Scope
Use this skill when:
- designing or refactoring a knowledge_base/ layout
- documenting projects, DB tables, business processes, APIs, jobs, or glossary terms
- planning or reviewing chunking, metadata, indexing, retrieval, or eval scripts
- defining seed questions, pass criteria, coverage gates, or rollout rules
- extending an existing RAG corpus without losing source evidence
RAG work should favor small verified coverage over broad speculative catalogs.
Phase Order
- Inventory
- Schema and layout
- Seed documentation
- Prototype indexing
- Retrieval evaluation
- Coverage expansion
- Optional integration
Run Plan, Explore, Implement, Verify, and Finalize inside each phase when the phase is non-trivial.
Inventory
- List target source repositories, schema files, docs, migrations, APIs, jobs, and ownership hints.
- Classify sources by authority: source code, DDL, migrations, tests, docs, logs, generated artifacts, or human notes.
- Identify unavailable or untrusted sources before writing documents.
- Pick the smallest high-value seed area first.
- Define the questions the seed area must answer.
Inventory decides what can be documented safely.
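As a rough illustration of the classification step above, an inventory can be kept as plain records that carry an authority class and availability flags. This is a minimal sketch: the `SourceEntry` type, the `AUTHORITY_RANK` ordering, and the field names are assumptions made for the example, not part of this skill's contract.

```python
from dataclasses import dataclass

# Hypothetical ranking: lower number means more authoritative.
# The order is an assumption for illustration; adjust it per project.
AUTHORITY_RANK = {
    "source_code": 0,
    "ddl": 1,
    "migration": 2,
    "test": 3,
    "doc": 4,
    "log": 5,
    "generated": 6,
    "human_note": 7,
}


@dataclass(frozen=True)
class SourceEntry:
    path: str
    authority: str          # one of the AUTHORITY_RANK keys
    available: bool = True  # False when the source cannot be read
    trusted: bool = True    # False when the source is known to be stale or wrong


def usable_sources(entries):
    """Return available, trusted entries, most authoritative first."""
    usable = [e for e in entries if e.available and e.trusted]
    return sorted(usable, key=lambda e: (AUTHORITY_RANK[e.authority], e.path))


def blocked_sources(entries):
    """Return entries that should be flagged before any document is drafted."""
    return [e for e in entries if not (e.available and e.trusted)]
```

Listing the blocked entries separately makes it harder to document an unavailable system as if it had been read.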
Schema And Layout
- Keep one layout contract in knowledge_base/_schema.md.
- Keep evaluation data in knowledge_base/_eval/questions.jsonl.
- Keep project docs under knowledge_base/projects/<project_name>/README.md.
- Keep table docs under knowledge_base/tables/<schema>/<table_name>.md.
- Keep process docs under knowledge_base/processes/<process_name>.md.
- Keep glossary docs under knowledge_base/glossary/<term>.md.
- Store generated indexes under knowledge_base/_index/ and keep them out of version control unless explicitly required.
Use lowercase snake_case for generated knowledge-base filenames.
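The layout above could be enforced with a small path builder, so that generated filenames always follow the same rule. This is a sketch under assumptions: `to_snake_case` and `doc_path` are hypothetical helper names, and the camelCase splitting rule is one possible normalization, not something this skill prescribes. The fixed `README.md` filename for project docs comes from the layout itself.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

KB_ROOT = PurePosixPath("knowledge_base")


def to_snake_case(name: str) -> str:
    """Lowercase a name and collapse non-alphanumeric runs into underscores."""
    # Split camelCase boundaries first: "orderItems" -> "order_Items".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return re.sub(r"[^a-z0-9]+", "_", spaced.lower()).strip("_")


def doc_path(doc_type: str, name: str, schema: Optional[str] = None) -> PurePosixPath:
    """Build the knowledge-base path for one document (hypothetical helper)."""
    slug = to_snake_case(name)
    if doc_type == "project":
        return KB_ROOT / "projects" / slug / "README.md"
    if doc_type == "table":
        if schema is None:
            raise ValueError("table docs need a schema")
        return KB_ROOT / "tables" / to_snake_case(schema) / f"{slug}.md"
    if doc_type == "process":
        return KB_ROOT / "processes" / f"{slug}.md"
    if doc_type == "glossary":
        return KB_ROOT / "glossary" / f"{slug}.md"
    raise ValueError(f"unknown document type: {doc_type}")
```

For example, `doc_path("table", "OrderItems", schema="public")` yields `knowledge_base/tables/public/order_items.md`.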
Documentation Rules
- Read real source before drafting.
- Separate confirmed facts from inferred behavior.
- Mark unknown or unverified facts as UNVERIFIED.
- Cite source paths or explicit inference origins in every document.
- Keep each document in one language.
- Prefer concrete sentences over generic summaries.
- Include caveats and stale-risk notes when source evidence is weak.
- Do not invent project purpose, table meaning, column semantics, ownership, or process steps.
Every document should help a teammate answer a real operational question.
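One way to keep confirmed, inferred, and unverified statements apart is to track each claim as a record and render it with an explicit label. The sketch below is illustrative only: the `Claim` type, the label wording, and the `(source: ...)` suffix are assumptions, while the `UNVERIFIED` marker is the one named in the rules above.

```python
from dataclasses import dataclass
from typing import Optional

VALID_STATUSES = ("confirmed", "inferred", "unverified")


@dataclass(frozen=True)
class Claim:
    text: str
    status: str                   # one of VALID_STATUSES
    origin: Optional[str] = None  # source path or explicit inference origin


def render_claim(claim: Claim) -> str:
    """Render one claim as a bullet line with its status and origin."""
    if claim.status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {claim.status}")
    if claim.status == "unverified":
        return f"- UNVERIFIED: {claim.text}"
    if not claim.origin:
        # Confirmed and inferred claims must cite where they came from.
        raise ValueError("claim needs a source path or inference origin")
    label = "Confirmed" if claim.status == "confirmed" else "Inferred"
    return f"- {label}: {claim.text} (source: {claim.origin})"
```

Raising on a missing origin turns the "cite source paths" rule into a hard failure at drafting time instead of a review comment later.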
Required Document Sections
- Project docs: purpose, users, entry points, related tables/processes, APIs/jobs, caveats, and evidence.
- Table docs: purpose, lifecycle, columns, relationships, constraints/indexes, business rules, caveats, and evidence.
- Process docs: trigger, ordered steps, outcomes, related projects, touched tables/APIs/jobs, failure modes, and evidence.
- Glossary docs: definition, nearby confusing terms, source locations, examples, and caveats.
- Schema docs: required frontmatter or metadata, allowed document types, naming rules, and evaluation metadata.
Keep required sections stable so chunking and eval remain predictable.
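A check along the following lines could confirm that a document carries its required sections before it is indexed. The exact heading titles in `REQUIRED_SECTIONS` are assumptions derived from the lists above (only two document types are shown), and `missing_sections` is a hypothetical helper; the real titles belong in the schema document.

```python
import re

# Assumed heading titles; the authoritative list belongs in the schema doc.
REQUIRED_SECTIONS = {
    "glossary": [
        "Definition",
        "Nearby Confusing Terms",
        "Source Locations",
        "Examples",
        "Caveats",
    ],
    "process": [
        "Trigger",
        "Ordered Steps",
        "Outcomes",
        "Related Projects",
        "Touched Tables, APIs, and Jobs",
        "Failure Modes",
        "Evidence",
    ],
}

# Matches second-level headings only ("## Title"), not "#" or "###".
H2_PATTERN = re.compile(r"^##[ \t]+(.+?)[ \t]*$", flags=re.MULTILINE)


def missing_sections(doc_type: str, markdown: str) -> list:
    """Return required section titles that have no matching '## ' heading."""
    found = {m.group(1).lower() for m in H2_PATTERN.finditer(markdown)}
    return [s for s in REQUIRED_SECTIONS[doc_type] if s.lower() not in found]
```

An empty result means the document has every required heading; it says nothing about whether the content under each heading is adequate.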
Chunking And Metadata
- Prefer section-boundary chunking using ## headings.
- Keep one table document as one chunk unless it exceeds the target model or retriever limits.
- Track metadata such as type, project, schema, table, process, legacy, source_path, and verified_at when available.
- Preserve stable document IDs across re-indexing.
- Avoid chunk boundaries or overlap windows that separate a claim from its evidence.
- Do not index drafts that still contain unresolved source conflicts unless the retrieval layer can filter drafts.
Chunking should preserve answerable units and evidence.
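Section-boundary chunking with stable IDs can be sketched as follows. The function names, the `_preamble` label for text before the first heading, and the choice to derive the ID from the document path plus heading are all assumptions for illustration. The sketch does not handle `## ` lines inside fenced code blocks or duplicate headings within one document, and it does not merge a whole table document into a single chunk.

```python
import hashlib
import re


def _make_chunk(doc_path, heading, lines, metadata):
    """Build one chunk dict, or None when the section has no content."""
    body = "\n".join(lines).strip()
    if not body:
        return None
    slug = re.sub(r"[^a-z0-9]+", "_", heading.lower()).strip("_")
    # The ID depends only on path and heading, so it survives body edits
    # and re-indexing.
    chunk_id = hashlib.sha1(f"{doc_path}#{slug}".encode("utf-8")).hexdigest()[:16]
    return {
        "id": chunk_id,
        "heading": heading,
        "text": body,
        "metadata": {**metadata, "doc_path": doc_path},
    }


def chunk_by_h2(doc_path, markdown, metadata=None):
    """Split a document at '## ' headings, keeping each heading with its body."""
    metadata = dict(metadata or {})
    chunks = []
    heading, lines = "_preamble", []
    for line in markdown.splitlines():
        if line.startswith("## "):
            chunks.append(_make_chunk(doc_path, heading, lines, metadata))
            heading, lines = line[3:].strip(), [line]
        else:
            lines.append(line)
    chunks.append(_make_chunk(doc_path, heading, lines, metadata))
    return [c for c in chunks if c is not None]
```

Because no overlap window is used, a claim and the evidence cited in the same section always land in the same chunk.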
Evaluation Rules
- Maintain curated questions in knowledge_base/_eval/questions.jsonl.
- Include expected source paths or expected answer fragments for each question.
- Add at least 20 curated questions before claiming prototype retrieval quality.
- Measure retrieval with top-k evidence before measuring generation quality.
- Report pass rate, failed question categories, and coverage gaps.
- Re-run eval after each batch expansion or chunking/config change.
- Do not claim retrieval quality without measured results.
Evaluation is the release gate for corpus expansion.
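A top-k retrieval check over the question file might look like the sketch below. The JSONL field names (`question`, `expected_paths`, `category`) and the `retrieve(question, k)` callable are assumptions, not a defined interface; only the 20-question minimum comes from the rules above.

```python
import json

MIN_QUESTIONS = 20  # minimum curated questions before claiming quality


def load_questions(jsonl_text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def evaluate_retrieval(questions, retrieve, k=5):
    """Score top-k retrieval.

    `retrieve(question, k)` is a hypothetical callable returning source
    paths, best first. A question passes when any expected path appears
    in the top k results.
    """
    failures = []
    for q in questions:
        hits = retrieve(q["question"], k)[:k]
        if not set(q["expected_paths"]) & set(hits):
            failures.append(q)

    failed_by_category = {}
    for q in failures:
        category = q.get("category", "uncategorized")
        failed_by_category[category] = failed_by_category.get(category, 0) + 1

    total = len(questions)
    passed = total - len(failures)
    return {
        "total": total,
        "passed": passed,
        "pass_rate": passed / total if total else 0.0,
        "failed_by_category": failed_by_category,
        "enough_questions": total >= MIN_QUESTIONS,
    }
```

The report keeps `enough_questions` separate from `pass_rate` so a high score on a handful of questions is not mistaken for measured quality.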
Integration Rules
- Expose retrieval to agents or tools only after seed documentation and local eval are stable.
- Keep skill instructions in SKILL.md.
- Keep examples in references/EXAMPLE.md.
- Keep only UI-facing manifest metadata in agents/openai.yaml unless product-specific agent fields are explicitly required.
- Place indexing, retrieval, embedding, model, or eval runtime defaults in the actual tool or project config that consumes them.
- Add agent dependencies only when a real external dependency is required.
Do not hide runtime RAG configuration in skill metadata.
Quality Gates
- Source evidence exists for each new claim.
- Seed docs answer the target questions.
- Indexing is reproducible from committed source docs.
- Eval pass rate meets the stated threshold.
- Failed questions are reviewed before expanding scope.
- Unknowns are marked, not guessed.
Small, verified, and maintainable beats large, stale, and speculative.
Boundaries
- Do not document unavailable systems as confirmed facts.
- Do not use generated indexes as source-of-truth documents.
- Do not mix runtime tool configuration into skill metadata.
- Do not expand coverage before the seed set passes retrieval checks.
- Do not copy boilerplate that adds no searchable knowledge.