.cursor/skills/llama-index-skill/SKILL.md
Supplements what an LLM can draw on when answering questions by supplying custom data prepared and managed by llama-index.
Install this skill globally with one command: `npx skillsauth add chicagopeabodydev-sudo/library_bot_poc llama-index-skill`. Works with Claude Code, Cursor, and Windsurf.
Use this skill to take custom data in formats such as .pdf or .md (markdown) and convert it into a form that LLMs can consume when answering questions. Supplying retrieved custom data to an LLM at query time is known as retrieval-augmented generation (RAG).
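```python
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import MarkdownNodeParser

# Split markdown documents into nodes (markdown_docs is assumed to be loaded already)
parser = MarkdownNodeParser()
nodes = parser.get_nodes_from_documents(markdown_docs)
# Option A: index documents directly, using the default chunking
index = VectorStoreIndex.from_documents(documents)
# Option B: index the nodes parsed above
index = VectorStoreIndex(nodes)
```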
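To make the idea of markdown-aware parsing concrete, here is a minimal standard-library sketch that splits a markdown string into one chunk per heading. It is not how `MarkdownNodeParser` is implemented, and the real parser handles more cases than this. The function name `split_markdown_by_heading` and the dict layout are hypothetical, chosen only for illustration.

```python
import re

# A heading is one to six '#' characters followed by whitespace and a title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_by_heading(text: str) -> list[dict]:
    """Split markdown into chunks, starting a new chunk at every heading."""
    chunks: list[dict] = []
    current = {"heading": None, "lines": []}
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the chunk in progress before starting the next one.
            if current["heading"] is not None or current["lines"]:
                chunks.append(current)
            current = {"heading": match.group(2).strip(), "lines": []}
        else:
            current["lines"].append(line)
    if current["heading"] is not None or current["lines"]:
        chunks.append(current)
    # Join body lines and drop surrounding blank space.
    return [
        {"heading": c["heading"], "text": "\n".join(c["lines"]).strip()}
        for c in chunks
    ]


sample = "# Hours\nOpen 9-5.\n\n## Holidays\nClosed on public holidays.\n"
chunks = split_markdown_by_heading(sample)
for chunk in chunks:
    print(chunk["heading"], "->", chunk["text"])
```

Each chunk keeps its heading alongside its text, which is the kind of structural context that makes heading-based splitting more useful for retrieval than fixed-size splitting.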
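```python
from llama_index.core import VectorStoreIndex

# step 1 - load your index from stored vectors
# (vector_store and storage_context are assumed to be configured already)
index = VectorStoreIndex.from_vector_store(
    vector_store, storage_context=storage_context
)
# step 2 - create a query engine
query_engine = index.as_query_engine()
# step 3 - ask a question
response = query_engine.query("What is llama2?")
print(response)
```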
- `similarity_top_k` sets how many of the most similar Nodes the retriever returns. It is typically set to 2–10.
- A small `similarity_top_k` (1–3) suits direct, specific, or fact-based queries.
- A large `similarity_top_k` (5–10+) suits complex, summary-based, or open-ended questions.
- Higher values may improve context recall, but they increase cost and latency and raise the odds of inaccurate results.
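The trade-off in the list above can be seen in a toy top-k retrieval over cosine similarity. This is a sketch under stated assumptions, not llama-index code: the two-dimensional vectors are made-up stand-ins for real embeddings, and `cosine` and `top_k` are hypothetical helper names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Made-up 2-d embeddings; real embeddings have hundreds of dimensions.
chunks = [
    ("Llama 2 is an open-weight LLM from Meta.", [1.0, 0.0]),
    ("Llama 2 comes in 7B, 13B, and 70B sizes.", [0.9, 0.1]),
    ("Llamas are domesticated South American camelids.", [0.3, 0.7]),
    ("The library closes at 5 pm.", [0.0, 1.0]),
]
query = [1.0, 0.0]  # stands in for the embedding of "What is llama2?"

narrow = top_k(query, chunks, k=1)
wide = top_k(query, chunks, k=3)
for score, text in wide:
    print(f"{score:.2f}  {text}")
```

With `k=1` only the closest chunk is returned. With `k=3` the second Llama 2 chunk is added, which helps recall, but so is the weakly related chunk about the animal, which is the kind of noise that can mislead the final answer.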
A query runs in three stages:

1. Retrieval - when you find and return the most relevant documents for your query from your index. The most common type of retrieval is “top-k” semantic retrieval.
2. Postprocessing (optional) - when the retrieved Nodes are reranked, transformed, or filtered, for example by requiring that Nodes have metadata matching certain keywords.
3. Response synthesis - when your query, your most relevant data, and your prompt are combined and sent to your LLM to get a response.

Example of creating a retriever:
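```python
from llama_index.core.retrievers import VectorIndexRetriever

# Return the 10 most similar Nodes for each query
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10,
)
```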
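The three stages (retrieval, optional postprocessing, response synthesis) can be sketched end to end in plain Python. This is an illustrative model only, not the llama-index implementation: `ScoredNode`, `retrieve`, `postprocess`, `synthesize`, and `echo_llm` are hypothetical names, the similarity scores are hard-coded, and the "LLM" simply echoes its prompt so the result can be inspected.

```python
from dataclasses import dataclass, field


@dataclass
class ScoredNode:
    text: str
    score: float
    metadata: dict = field(default_factory=dict)


def retrieve(nodes: list[ScoredNode], top_k: int) -> list[ScoredNode]:
    # Stage 1: keep the top_k highest-scoring nodes (scores are precomputed here).
    return sorted(nodes, key=lambda n: n.score, reverse=True)[:top_k]


def postprocess(nodes: list[ScoredNode], required_topic: str) -> list[ScoredNode]:
    # Stage 2 (optional): keep only nodes whose metadata matches a keyword.
    return [n for n in nodes if n.metadata.get("topic") == required_topic]


def synthesize(query: str, nodes: list[ScoredNode], llm) -> str:
    # Stage 3: combine the query, the surviving context, and a prompt template.
    context = "\n".join(n.text for n in nodes)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)


def echo_llm(prompt: str) -> str:
    # Stand-in for a real model call: return the prompt unchanged.
    return prompt


nodes = [
    ScoredNode("Llama 2 is an open-weight LLM.", 0.92, {"topic": "llm"}),
    ScoredNode("Llamas live in the Andes.", 0.41, {"topic": "animals"}),
    ScoredNode("Llama 2 supports a 4k context window.", 0.88, {"topic": "llm"}),
    ScoredNode("The library closes at 5 pm.", 0.05, {"topic": "hours"}),
]

retrieved = retrieve(nodes, top_k=3)
filtered = postprocess(retrieved, required_topic="llm")
prompt = synthesize("What is llama2?", filtered, echo_llm)
print(prompt)
```

Here retrieval lets one off-topic node through on score alone, and the metadata filter removes it before the prompt is built, which is the role the optional postprocessing stage plays.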
| Mode | Description | When to Use |
|-----|-------------|-------------|
| `refine` | Sequentially creates and refines an answer by iterating through each retrieved Node. A separate LLM call is made for every Node. | Best when accuracy and depth are more important than speed or cost. |
| `compact` (the default) | Packs as many Node text chunks as possible into each prompt sent to the LLM, then refines across those prompts. | Useful when there are many small chunks and you want to reduce the number of LLM calls. |
| `tree_summarize` | Builds a hierarchical summary tree from the Nodes and returns the final summary from the root of the tree. | Ideal for summarizing large sets of documents or long collections of text chunks. |
| `no_text` | Runs the retriever only, without sending the Node content to the LLM for generation. | Useful for debugging, inspection, or retrieval evaluation. |
| `accumulate` | Applies the query to each Node independently, accumulating all responses into an array and returning them as a concatenated result. | Helpful when each chunk should be processed individually rather than merged into a single synthesized answer. |
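The difference in LLM call counts between the sequential create-and-refine mode, `compact`, and `accumulate` can be illustrated with a counting stand-in for the model. This is a simplified sketch, not llama-index's synthesizer code: `CountingLLM`, `pack`, and the `max_chars` budget are hypothetical, and a real implementation would presumably budget by tokens against the model's context window rather than by characters.

```python
class CountingLLM:
    """Stand-in for a real model that records how many times it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return f"answer#{self.calls}"


def sequential_refine(llm, query, chunks):
    # One call per chunk; each call sees the previous answer and refines it.
    answer = ""
    for chunk in chunks:
        answer = llm.complete(f"Q: {query}\nPrevious: {answer}\nContext: {chunk}")
    return answer


def pack(chunks, max_chars):
    # Greedily merge consecutive chunks while the merged text fits in max_chars.
    packed, current = [], ""
    for chunk in chunks:
        candidate = f"{current}\n{chunk}" if current else chunk
        if current and len(candidate) > max_chars:
            packed.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        packed.append(current)
    return packed


def compact(llm, query, chunks, max_chars=40):
    # Pack first, then refine over the (fewer) packed prompts.
    return sequential_refine(llm, query, pack(chunks, max_chars))


def accumulate(llm, query, chunks):
    # Answer each chunk independently and keep every answer.
    return [llm.complete(f"Q: {query}\nContext: {chunk}") for chunk in chunks]


chunks = ["a" * 10, "b" * 10, "c" * 10, "d" * 10]  # four 10-character chunks

refine_llm, compact_llm, accumulate_llm = CountingLLM(), CountingLLM(), CountingLLM()
sequential_refine(refine_llm, "What is llama2?", chunks)
compact(compact_llm, "What is llama2?", chunks)
answers = accumulate(accumulate_llm, "What is llama2?", chunks)
print(refine_llm.calls, compact_llm.calls, len(answers))
```

With these inputs the sequential mode makes four calls, the compact variant makes two because three chunks fit into the first packed prompt, and the accumulate variant returns four separate answers.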
A chat engine is a high-level interface for having a conversation with your data (multiple back-and-forth exchanges instead of a single question and answer). It is the stateful counterpart of a query engine: by keeping track of the conversation history, it can answer questions with past context in mind.
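What "stateful" means here can be shown with a small wrapper that stores each turn and passes the accumulated history along with every new message. This is a conceptual sketch, not one of llama-index's chat engine classes: `ToyChatEngine` and `fake_query` are hypothetical, and the query function is a stub that records what it receives.

```python
from typing import Callable


class ToyChatEngine:
    """Hypothetical stateful wrapper around a stateless query function."""

    def __init__(self, query_fn: Callable[[str], str]) -> None:
        self.query_fn = query_fn
        self.history: list[tuple[str, str]] = []  # (role, message) pairs

    def chat(self, message: str) -> str:
        # Prepend earlier turns so the query function sees past context.
        transcript = "\n".join(f"{role}: {text}" for role, text in self.history)
        new_turn = f"user: {message}"
        full_input = f"{transcript}\n{new_turn}" if transcript else new_turn
        reply = self.query_fn(full_input)
        self.history.append(("user", message))
        self.history.append(("assistant", reply))
        return reply

    def reset(self) -> None:
        # Forget the conversation so the next message starts fresh.
        self.history.clear()


seen: list[str] = []


def fake_query(text: str) -> str:
    # Stub for a query engine: record the input and return a numbered reply.
    seen.append(text)
    return f"reply {len(seen)}"


engine = ToyChatEngine(fake_query)
first = engine.chat("What is llama2?")
second = engine.chat("How large is it?")
print(seen[1])
```

The second call receives the first question and its reply along with the follow-up, which is what lets a question like "How large is it?" be resolved against the earlier turn.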