src/skills/ai-orchestration-llamaindex/SKILL.md
LlamaIndex.TS data framework for RAG, indexing, retrieval, query engines, chat engines, and agentic workflows in TypeScript
Quick Guide: LlamaIndex.TS is a data framework for building context-aware LLM applications in TypeScript. Use the Settings singleton to configure LLM and embedding models globally. Load documents with SimpleDirectoryReader, chunk with SentenceSplitter, index with VectorStoreIndex.fromDocuments(), and query with index.asQueryEngine(). For agents, use agent() from @llamaindex/workflow with tool() definitions using Zod schemas. Most core operations are async -- loading, indexing, querying, and chatting all return Promises. The llamaindex package re-exports most things, but LLM providers require separate packages like @llamaindex/openai or @llamaindex/ollama.
<critical_requirements>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)
(You MUST configure Settings.llm and Settings.embedModel before any indexing or querying -- the Settings singleton is lazily initialized and defaults to OpenAI, which will fail without an API key)
(You MUST await all async LlamaIndex operations -- loadData(), fromDocuments(), query(), and chat() all return Promises)
(You MUST install provider packages separately -- @llamaindex/openai, @llamaindex/ollama, @llamaindex/anthropic are NOT included in the base llamaindex package)
(You MUST use storageContextFromDefaults({ persistDir }) to persist indexes -- without persistence, indexes are rebuilt from scratch on every restart)
(You MUST never hardcode API keys -- use environment variables and dotenv/config)
</critical_requirements>
Auto-detection: LlamaIndex, llamaindex, VectorStoreIndex, SimpleDirectoryReader, Settings.llm, Settings.embedModel, asQueryEngine, asChatEngine, ContextChatEngine, SentenceSplitter, storageContextFromDefaults, @llamaindex/openai, @llamaindex/ollama, @llamaindex/workflow, FunctionTool, QueryEngineTool, agentStreamEvent
When to use:
Key patterns covered:
agent() and tool() using Zod schemas
When NOT to use:
LlamaIndex.TS is a data framework -- its core value proposition is connecting your data to LLMs through indexing, retrieval, and synthesis. It sits between raw LLM APIs and full application frameworks.
Core principles:
llamaindex package provides the framework; providers are installed separately.
When to use LlamaIndex.TS:
When NOT to use:
The Settings singleton configures LLM, embedding model, and node parser globally. Set it once at application startup before any indexing or querying.
import { Settings } from "llamaindex";
import { openai, OpenAIEmbedding } from "@llamaindex/openai";
// Configure at app startup -- before any index operations
Settings.llm = openai({ model: "gpt-4o" });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });
Why good: Single configuration point, provider packages are explicit imports, model names are visible
// BAD: No Settings configuration, relying on implicit defaults
import { VectorStoreIndex, SimpleDirectoryReader } from "llamaindex";
// This will silently try to use OpenAI with OPENAI_API_KEY from env
// Fails with cryptic error if key is missing
const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: "./data",
});
const index = await VectorStoreIndex.fromDocuments(documents);
Why bad: Implicit defaults make failures confusing, no explicit provider, no model selection
See: examples/core.md for local LLM setup with Ollama, Anthropic configuration, and embedding model options
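The rule about keeping API keys in environment variables can be enforced at startup, before Settings is touched. The following is a minimal sketch and not part of LlamaIndex.TS: `requireEnv` is a hypothetical helper, and the environment is passed in as a plain record (an assumption made so the function stays testable); in a Node.js app you would pass `process.env`.

```typescript
// Hypothetical helper, not a LlamaIndex.TS API.
// The env record is injected so the check can run without the real process environment.
type Env = Record<string, string | undefined>;

function requireEnv(name: string, env: Env): string {
  const value = env[name]?.trim();
  if (!value) {
    // Fail early with a readable message instead of a provider error later on.
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example: a populated record stands in for process.env.
const exampleEnv: Env = { OPENAI_API_KEY: "sk-example" };
const apiKey = requireEnv("OPENAI_API_KEY", exampleEnv);
```

Calling a check like this before assigning Settings.llm turns a missing OPENAI_API_KEY into an immediate, readable error rather than a failure inside the first indexing call.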
Load documents, create a vector index, and query it. This is the canonical RAG pipeline.
import { SimpleDirectoryReader, VectorStoreIndex, Settings } from "llamaindex";
import { openai, OpenAIEmbedding } from "@llamaindex/openai";
Settings.llm = openai({ model: "gpt-4o" });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });
// Load all supported files from a directory
const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: "./data",
});
// Create vector index -- embeds and stores all document chunks
const index = await VectorStoreIndex.fromDocuments(documents);
// Query the index
const queryEngine = index.asQueryEngine();
const response = await queryEngine.query({ query: "What is the main topic?" });
console.log(response.message.content);
Why good: Complete pipeline in minimal code, explicit Settings, clear data flow
See: examples/core.md for persistence, custom readers, and advanced indexing options
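To make the retrieval step of the pipeline above more concrete, here is a conceptual sketch of similarity search over embedded chunks. It is not LlamaIndex.TS code and makes no claim about how VectorStoreIndex is implemented: `EmbeddedChunk`, `cosineSimilarity`, and `retrieveTopK` are hypothetical names, the embeddings are hand-written toy vectors, and cosine similarity is assumed as the ranking metric.

```typescript
// Conceptual sketch only: names and data are hypothetical, not LlamaIndex.TS APIs.
interface EmbeddedChunk {
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions must match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction, so treat it as not similar to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank every chunk against the query embedding and keep the best topK.
function retrieveTopK(
  queryEmbedding: number[],
  chunks: EmbeddedChunk[],
  topK: number,
): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({
      chunk,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((left, right) => right.score - left.score)
    .slice(0, topK)
    .map((scored) => scored.chunk);
}

const toyChunks: EmbeddedChunk[] = [
  { text: "about cats", embedding: [1, 0] },
  { text: "about taxes", embedding: [0, 1] },
  { text: "about kittens", embedding: [0.9, 0.1] },
];
const topTwo = retrieveTopK([1, 0], toyChunks, 2);
```

The `topK` parameter plays the same role as the similarityTopK option on a retriever: it caps how many chunks are handed to the LLM as context.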
Persist indexes to disk to avoid re-indexing on every restart.
import {
  VectorStoreIndex,
  storageContextFromDefaults,
  SimpleDirectoryReader,
} from "llamaindex";
const PERSIST_DIR = "./storage";
// First run: create and persist
const storageContext = await storageContextFromDefaults({
  persistDir: PERSIST_DIR,
});
const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: "./data",
});
const index = await VectorStoreIndex.fromDocuments(documents, {
  storageContext,
});
// Subsequent runs: load from storage
const loadedStorageContext = await storageContextFromDefaults({
  persistDir: PERSIST_DIR,
});
const loadedIndex = await VectorStoreIndex.init({
  storageContext: loadedStorageContext,
});
Why good: Named constant for path, separate create vs load paths, storage context reuse
// BAD: Rebuilding index on every request
async function handleQuery(question: string) {
  const docs = await new SimpleDirectoryReader().loadData({
    directoryPath: "./data",
  });
  const index = await VectorStoreIndex.fromDocuments(docs); // Expensive!
  const engine = index.asQueryEngine();
  return engine.query({ query: question });
}
Why bad: Re-indexes all documents on every call, wastes time and API credits on re-embedding
See: examples/core.md for load-or-create pattern
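The create and load paths above can be combined into one decision at startup. The sketch below shows only the control flow, under stated assumptions: `IndexLifecycle` and `loadOrCreateIndex` are hypothetical names, and the three hooks are injected rather than calling LlamaIndex.TS directly, so this is not the pattern from examples/core.md. In a real app, `load` would wrap VectorStoreIndex.init(), `create` would wrap VectorStoreIndex.fromDocuments(), and `hasPersistedData` would be some check of your choosing against the persist directory.

```typescript
// Hypothetical hooks; in a real app these would wrap the calls shown above.
interface IndexLifecycle<TIndex> {
  hasPersistedData: () => Promise<boolean>;
  load: () => Promise<TIndex>;
  create: () => Promise<TIndex>;
}

interface LoadOrCreateResult<TIndex> {
  index: TIndex;
  source: "loaded" | "created";
}

// Prefer the persisted index; pay for embedding only when nothing is stored yet.
async function loadOrCreateIndex<TIndex>(
  lifecycle: IndexLifecycle<TIndex>,
): Promise<LoadOrCreateResult<TIndex>> {
  if (await lifecycle.hasPersistedData()) {
    return { index: await lifecycle.load(), source: "loaded" };
  }
  return { index: await lifecycle.create(), source: "created" };
}
```

Returning the `source` alongside the index makes it easy to log whether a given startup triggered a re-embedding run.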
Create agents that use tools defined with Zod schemas. Use agent() from @llamaindex/workflow.
import { tool, Settings } from "llamaindex";
import { agent, agentStreamEvent } from "@llamaindex/workflow";
import { openai } from "@llamaindex/openai";
import { z } from "zod";
Settings.llm = openai({ model: "gpt-4o" });
const weatherTool = tool({
  name: "getWeather",
  description: "Get current weather for a city",
  parameters: z.object({
    city: z.string({ description: "City name" }),
  }),
  execute: async ({ city }) => {
    // Your weather API call here
    return { temperature: 22, condition: "sunny" };
  },
});
const myAgent = agent({ tools: [weatherTool] });
const result = await myAgent.run("What's the weather in Paris?");
console.log(result.data);
Why good: Zod schema for type-safe parameters, description guides the LLM, async execute function
See: examples/agents.md for multi-agent workflows, QueryEngineTool, streaming agents
Build conversational interfaces over your indexed data with conversation memory.
import {
  VectorStoreIndex,
  ContextChatEngine,
  SimpleDirectoryReader,
} from "llamaindex";
const documents = await new SimpleDirectoryReader().loadData({
  directoryPath: "./data",
});
const index = await VectorStoreIndex.fromDocuments(documents);
const retriever = index.asRetriever({ similarityTopK: 3 });
const chatEngine = new ContextChatEngine({ retriever });
// Multi-turn conversation -- chat engine maintains history
const response1 = await chatEngine.chat({ message: "What is LlamaIndex?" });
console.log(response1.message.content);
const response2 = await chatEngine.chat({
  message: "How does it handle streaming?",
});
console.log(response2.message.content);
Why good: Retriever-based context injection, automatic conversation history, multi-turn support
See: examples/chat-streaming.md for streaming chat, system prompts, chat history management
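For intuition about what "retriever-based context injection" plus conversation history means per turn, here is a rough conceptual sketch. It is not how ContextChatEngine is implemented: `ChatHistory`, `buildPrompt`, and the `maxMessages` trimming rule are all hypothetical, chosen only to show retrieved context and prior turns being combined into one message list.

```typescript
// Conceptual sketch; these types and the trimming rule are assumptions, not LlamaIndex.TS APIs.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

class ChatHistory {
  private readonly messages: ChatMessage[] = [];
  private readonly maxMessages: number;

  constructor(maxMessages: number) {
    this.maxMessages = maxMessages;
  }

  add(message: ChatMessage): void {
    this.messages.push(message);
    // Drop the oldest turns so the prompt cannot grow without bound.
    while (this.messages.length > this.maxMessages) {
      this.messages.shift();
    }
  }

  // Retrieved chunks go into a system message, followed by prior turns and the new question.
  buildPrompt(retrievedContext: string[], userMessage: string): ChatMessage[] {
    const system: ChatMessage = {
      role: "system",
      content: `Context:\n${retrievedContext.join("\n")}`,
    };
    return [system, ...this.messages, { role: "user", content: userMessage }];
  }
}

const history = new ChatHistory(2);
history.add({ role: "user", content: "What is LlamaIndex?" });
history.add({ role: "assistant", content: "A data framework." });
history.add({ role: "user", content: "Does it stream?" });
const prompt = history.buildPrompt(["chunk about streaming"], "How?");
```

Because each turn re-runs retrieval with the new message, a follow-up question can pull in different chunks than the first one did, while the history keeps earlier answers in view.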
Stream responses for user-facing applications.
import { agentStreamEvent } from "@llamaindex/workflow";
// Agent streaming
const events = myAgent.runStream("Tell me about TypeScript");
for await (const event of events) {
  if (agentStreamEvent.include(event)) {
    process.stdout.write(event.data.delta);
  }
}
// Query engine streaming
const response = await queryEngine.query({
  query: "Summarize the document",
  stream: true,
});
for await (const chunk of response) {
  process.stdout.write(chunk.message.content);
}
Why good: Event-based agent streaming with typed filters, query engine streaming with for-await
See: examples/chat-streaming.md for response synthesizer streaming, chat engine streaming
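Both streaming loops above share one shape: iterate an async stream, forward each piece to the user, and usually keep the full text for logging or storage. The sketch below isolates that shape without any LlamaIndex.TS dependency. `TextDelta`, `fakeDeltaStream`, and `collectStream` are hypothetical names, and the fake generator stands in for a real agent or query engine stream.

```typescript
// Hypothetical event shape; real stream events carry more fields.
interface TextDelta {
  delta: string;
}

// Stand-in for a real stream so the example runs without an LLM.
async function* fakeDeltaStream(parts: string[]): AsyncGenerator<TextDelta> {
  for (const part of parts) {
    yield { delta: part };
  }
}

// Forward each delta as it arrives and return the assembled text at the end.
async function collectStream(
  stream: AsyncIterable<TextDelta>,
  onDelta?: (delta: string) => void,
): Promise<string> {
  let fullText = "";
  for await (const event of stream) {
    onDelta?.(event.delta);
    fullText += event.delta;
  }
  return fullText;
}
```

In a CLI, `onDelta` could write to standard output; in a web handler it could push to a response stream, while the returned string is saved once the stream ends.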
Configure how documents are chunked before indexing.
import { SentenceSplitter, Settings } from "llamaindex";
const CHUNK_SIZE = 512;
const CHUNK_OVERLAP = 50;
// Set globally via Settings
Settings.nodeParser = new SentenceSplitter({
  chunkSize: CHUNK_SIZE,
  chunkOverlap: CHUNK_OVERLAP,
});
// Or use standalone
const splitter = new SentenceSplitter({ chunkSize: CHUNK_SIZE });
const texts = splitter.splitText("Your long document text here...");
Why good: Named constants for chunk parameters, global vs standalone usage shown, sentence-aware splitting
// BAD: Using default chunk size without considering document characteristics
const index = await VectorStoreIndex.fromDocuments(documents);
// Default chunk size may be too large for short Q&A or too small for long narratives
Why bad: Default chunk size (1024 tokens) may not suit your data, causes poor retrieval quality
See: examples/ingestion.md for MarkdownNodeParser, CodeSplitter, custom chunk strategies
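How chunkSize and chunkOverlap interact is easiest to see on a toy example. The sketch below is not SentenceSplitter: it ignores sentence boundaries and takes pre-split tokens as input (an assumption; the real splitter uses an actual tokenizer). `chunkTokens` is a hypothetical function that shows only the sliding-window arithmetic, where each new chunk starts `chunkSize - chunkOverlap` tokens after the previous one.

```typescript
// Hypothetical helper illustrating size/overlap arithmetic only.
function chunkTokens(
  tokens: string[],
  chunkSize: number,
  chunkOverlap: number,
): string[][] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be >= 0 and smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end, so no overlap-only tail is emitted.
    if (start + chunkSize >= tokens.length) break;
  }
  return chunks;
}

// Ten toy tokens, windows of 4 with 1 token shared between neighbours.
const toyTokens = ["t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9"];
const toyChunkList = chunkTokens(toyTokens, 4, 1);
```

With these numbers the windows start at tokens 0, 3, and 6, so every boundary token appears in two chunks. Larger overlap reduces the chance of cutting a relevant passage in half, at the cost of more chunks to embed.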
</patterns>
<decision_framework>
What is your use case?
+-- Semantic search over documents -> VectorStoreIndex (most common)
+-- Summarization of all documents -> SummaryIndex
+-- Both search AND summarization -> Create both, use as separate tools in an agent
+-- Hierarchical document structure -> Use MarkdownNodeParser + VectorStoreIndex
How should users interact with your data?
+-- Single question, single answer -> Query Engine (index.asQueryEngine())
+-- Multi-turn conversation -> Chat Engine (ContextChatEngine)
+-- Multiple tools/indexes + reasoning -> Agent (agent() from @llamaindex/workflow)
+-- Complex multi-step workflow -> Multi-agent with handoffs
Which LLM provider are you using?
+-- OpenAI -> npm install @llamaindex/openai
+-- Anthropic -> npm install @llamaindex/anthropic
+-- Local (Ollama) -> npm install @llamaindex/ollama
+-- Groq -> npm install @llamaindex/groq
+-- Google Gemini -> npm install @llamaindex/gemini
What kind of documents are you indexing?
+-- Short Q&A pairs -> chunkSize: 256-512
+-- Technical documentation -> chunkSize: 512-1024
+-- Long narratives/reports -> chunkSize: 1024-2048
+-- Code files -> Use CodeSplitter (AST-aware)
+-- Markdown -> Use MarkdownNodeParser (structure-aware)
</decision_framework>
<red_flags>
High Priority Issues:
- Not configuring Settings.llm before indexing/querying -- defaults to OpenAI and fails with a cryptic error when no API key is set
- Forgetting to await async operations -- fromDocuments(), query(), chat() all return Promises
- Not persisting indexes with storageContextFromDefaults
- Installing llamaindex without provider packages (@llamaindex/openai, etc.)
Medium Priority Issues:
- Not setting similarityTopK on retrievers -- default may return too few or too many results
- Ignoring sourceNodes -- they contain the retrieved context for debugging and citations
- Running SimpleDirectoryReader per request instead of caching the loaded documents
- Not checking for empty answers -- response.message.content might be empty on retrieval failure
Common Mistakes:
- Confusing asQueryEngine() (single question) with ContextChatEngine (multi-turn conversation)
- Calling VectorStoreIndex.fromDocuments() when you should use VectorStoreIndex.init() to load from storage
- Importing openai from llamaindex instead of @llamaindex/openai -- the llamaindex package may re-export some things but provider-specific imports are more reliable
- Passing a messages array to query() -- query engines take { query: string }, not a messages array
- Calling index.asQueryEngine() multiple times instead of storing the engine reference
Gotchas & Edge Cases:
- Settings is a global singleton -- setting it in one module affects all others. Override locally by passing llm directly to constructors when you need different models for different operations.
- SimpleDirectoryReader only works on Node.js -- it uses fs internally. For edge/serverless, load documents differently or use LlamaParse.
- storageContextFromDefaults creates four JSON files in the persist directory (docstore.json, graph_store.json, index_store.json, vector_store.json). If any are corrupted, delete the directory and re-index.
- The library uses Web Streams types (ReadableStream, WritableStream), so add "DOM.AsyncIterable" to tsconfig.json lib if you get type errors.
- tsconfig.json must use "moduleResolution": "bundler" or "nodenext" -- the classic "node" resolution will fail to resolve LlamaIndex sub-packages.
- Install gpt-tokenizer for 60x faster tokenization.
- SentenceSplitter chunk size is in tokens, not characters. A 512-token chunk is roughly 2000 characters.
- The llamaindex package is large (~2MB+). For production, consider importing specific sub-packages to reduce bundle size.
- VectorStoreIndex.fromDocuments() makes embedding API calls for every chunk. For large document sets, this can be expensive. Monitor costs.
</red_flags>
<critical_reminders>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)
(You MUST configure Settings.llm and Settings.embedModel before any indexing or querying -- the Settings singleton is lazily initialized and defaults to OpenAI, which will fail without an API key)
(You MUST await all async LlamaIndex operations -- loadData(), fromDocuments(), query(), and chat() all return Promises)
(You MUST install provider packages separately -- @llamaindex/openai, @llamaindex/ollama, @llamaindex/anthropic are NOT included in the base llamaindex package)
(You MUST use storageContextFromDefaults({ persistDir }) to persist indexes -- without persistence, indexes are rebuilt from scratch on every restart)
(You MUST never hardcode API keys -- use environment variables and dotenv/config)
Failure to follow these rules will produce broken RAG pipelines, wasted embedding API credits, or cryptic runtime errors.
</critical_reminders>