Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

chicagopeabodydev-sudo/llama-index-skill

Name: llama-index-skill
Author: chicagopeabodydev-sudo

.cursor/skills/llama-index-skill/SKILL.md

npx skillsauth add chicagopeabodydev-sudo/library_bot_poc llama-index-skill

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

When to use this skill

Use this skill to take custom data in various formats, such as .pdf or .md (markdown), and convert it into a format that can be consumed by LLMs when answering questions. This process is known as retrieval augmented generation (RAG).

Key steps when using LlamaIndex for RAG

1. LOADING data is ingesting supplied data, such as markdown files, and processing them into "documents" and "nodes".
1. INDEXING then processes the documents and nodes using an embedding model to generate an "index".
1. STORING the index (embedding results), typically in a vector database.
1. QUERYING an LLM and adding to this related data pulled from the index.

Loading data is completed with a data connector, also called a "Reader".

Result of loading is a "document" which is a container around any data source - for instance, a pdf file, an API output, or retrieve data from a database.

Nodes are atomic units of data in LlamaIndex and represent “chunks” of a source document.

Node parsers are a simple abstraction that take a list of documents and chunk them into node objects, such that each node is a specific chunk of the parent document.

Example syntax using the markdown parser to get nodes:

from llama_index.core.node_parser import MarkdownNodeParser

parser = MarkdownNodeParser()
nodes = parser.get_nodes_from_documents(markdown_docs)

Indexing means creating a data structure that allows for semantic-querying of the data later.

For LLMs this nearly always means creating vector embeddings using an embedding model.

Indexes can be created from nodes or from documents (by using "from_documents()").

Example to create an index from a document:

index = VectorStoreIndex.from_documents(documents)

Example to create an index from nodes:

index = VectorStoreIndex(nodes)

Storing an index typically uses a Vector Store.

LlamaIndex supports many types of Vector Stores including pgvector on Supabase or chomadb.

Example syntax (assumes the "vector_store" and "storage_context" already exist):

# step 1 - load your index from stored vectors
index = VectorStoreIndex.from_vector_store(
    vector_store, storage_context=storage_context
)

# step 2 - create a query engine
query_engine = index.as_query_engine()
response = query_engine.query("What is llama2?")

Querying is a prompt call to an LLM using a QueryEngine

Optionally QueryEngines can be assigned an array of QueryTools, with each tool designed for a different type of question.

similarity_top_k defines the number of top-scoring relevant document chunks (nodes) retrieved from an index to answer a query, and it determines how many semantic matches (using cosine similarity) are passed to the LLM.

- Typically set to 2–10
- Small similarity_top_k (1-3) is good for direct, specific, or fact-based queries.
- Large similarity_top_k (5-10+) is better for complex, summary-based, or open-ended questions.
    - Higher values may improve context recall, but increase costs, latency, and increases odds of inaccurate results.

3 steps of querying:

- 1. Retrieval - when you find and return the most relevant documents for your query from your Index. The most common type of retrieval is “top-k” semantic retrieval.
- 2. Postprocessing (optional) - when the Nodes retrieved are reranked, transformed, or filtered, for example requiring that nodes have metadata matching certain keywords.
- 3. Response Synthesis - when your query, your most-relevant data, and your prompt are combined and sent to your LLM to get a response.

- Example creating a retriever:

retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10,
)

After a retriever fetches relevant nodes, a BaseSynthesizer synthesizes the final response by combining the information.

To configure use the "response_mode" property of the retriever

Options are:

| Mode | Description | When to Use | |-----|-------------|-------------| | default | Sequentially creates and refines an answer by iterating through each retrieved Node. A separate LLM call is made for every Node. | Best when accuracy and depth are more important than speed or cost. | | compact | Attempts to pack as many Node text chunks as possible into each prompt sent to the LLM. | Useful when there are many small chunks and you want to reduce the number of LLM calls. | | tree_summarize | Builds a hierarchical summary tree from the Nodes and returns the final summary from the root of the tree. | Ideal for summarizing large sets of documents or long collections of text chunks. | | no_text | Runs the retriever only, without sending the Node content to the LLM for generation. | Useful for debugging, inspection, or retrieval evaluation. | | accumulate | Applies the query to each Node independently, accumulating all responses into an array and returning them as a concatenated result. | Helpful when each chunk should be processed individually rather than merged into a single synthesized answer. |

Query Engine compared to Chat Engine

Chat engine is a high-level interface for having a conversation with your data (multiple back-and-forth instead of a single question & answer). Chat engine is a stateful analogy of a Query Engine. By keeping track of the conversation history, it can answer questions with past context in mind.

Additional Resources

For usage examples, see examples.md
LlamaIndex documentation
Loading data documentation Loading-Documents
Indexing documentation Indexing-Documents
Storing embedding results documentation Storing-Indexed-Data
Querying LLMs Querying
Chat Engine Chat-engine

chicagopeabodydev-sudo/llama-index-skill

.cursor/skills/llama-index-skill/SKILL.md

Used to supplement the data LLMs can use when answering questions by supplying them with custom data generated and managed by llama-index.

data-ai

Updated Apr 17, 2026

$ install --global

skillsauth

npx skillsauth add chicagopeabodydev-sudo/library_bot_poc llama-index-skill

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 24, 2026, 8:59 PM2.6s1 file scanned

SKILL.md

name:: llama-index-skill
description:: Used to supplement the data LLMs can use when answering questions by supplying them with custom data generated and managed by llama-index.

When to use this skill

Key steps when using LlamaIndex for RAG

1. LOADING data is ingesting supplied data, such as markdown files, and processing them into "documents" and "nodes".
1. INDEXING then processes the documents and nodes using an embedding model to generate an "index".
1. STORING the index (embedding results), typically in a vector database.
1. QUERYING an LLM and adding to this related data pulled from the index.

Loading data is completed with a data connector, also called a "Reader".

Result of loading is a "document" which is a container around any data source - for instance, a pdf file, an API output, or retrieve data from a database.

Nodes are atomic units of data in LlamaIndex and represent “chunks” of a source document.

Node parsers are a simple abstraction that take a list of documents and chunk them into node objects, such that each node is a specific chunk of the parent document.

Example syntax using the markdown parser to get nodes:

from llama_index.core.node_parser import MarkdownNodeParser

parser = MarkdownNodeParser()
nodes = parser.get_nodes_from_documents(markdown_docs)

Indexing means creating a data structure that allows for semantic-querying of the data later.

For LLMs this nearly always means creating vector embeddings using an embedding model.

Indexes can be created from nodes or from documents (by using "from_documents()").

Example to create an index from a document:

index = VectorStoreIndex.from_documents(documents)

Example to create an index from nodes:

index = VectorStoreIndex(nodes)

Storing an index typically uses a Vector Store.

LlamaIndex supports many types of Vector Stores including pgvector on Supabase or chomadb.

Example syntax (assumes the "vector_store" and "storage_context" already exist):

# step 1 - load your index from stored vectors
index = VectorStoreIndex.from_vector_store(
    vector_store, storage_context=storage_context
)

# step 2 - create a query engine
query_engine = index.as_query_engine()
response = query_engine.query("What is llama2?")

Querying is a prompt call to an LLM using a QueryEngine

Optionally QueryEngines can be assigned an array of QueryTools, with each tool designed for a different type of question.

similarity_top_k defines the number of top-scoring relevant document chunks (nodes) retrieved from an index to answer a query, and it determines how many semantic matches (using cosine similarity) are passed to the LLM.

- Typically set to 2–10
- Small similarity_top_k (1-3) is good for direct, specific, or fact-based queries.
- Large similarity_top_k (5-10+) is better for complex, summary-based, or open-ended questions.
    - Higher values may improve context recall, but increase costs, latency, and increases odds of inaccurate results.

3 steps of querying:

- 1. Retrieval - when you find and return the most relevant documents for your query from your Index. The most common type of retrieval is “top-k” semantic retrieval.
- 2. Postprocessing (optional) - when the Nodes retrieved are reranked, transformed, or filtered, for example requiring that nodes have metadata matching certain keywords.
- 3. Response Synthesis - when your query, your most-relevant data, and your prompt are combined and sent to your LLM to get a response.

- Example creating a retriever:

retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10,
)

After a retriever fetches relevant nodes, a BaseSynthesizer synthesizes the final response by combining the information.

To configure use the "response_mode" property of the retriever

Options are:

Query Engine compared to Chat Engine

Additional Resources

For usage examples, see examples.md
LlamaIndex documentation
Loading data documentation Loading-Documents
Indexing documentation Indexing-Documents
Storing embedding results documentation Storing-Indexed-Data
Querying LLMs Querying
Chat Engine Chat-engine

Related Skills

chicagopeabodydev-sudo/supabase-skill

development

VerifiedTrustedCommunity

Connects to Supabase-hosted PostgreSQL and pgvector vector databases. Use when storing or querying vector embeddings, building RAG pipelines with Supabase, using vecs for similarity search, or when the user mentions Supabase, pgvector, or vector databases.

SKILL.mdUpdated Apr 17, 2026

chicagopeabodydev-sudo/supabase-skill

chicagopeabodydev-sudo/streamlit-skill

development

VerifiedTrustedCommunity

Builds simple user interfaces in Python. Use when creating chatbots, dashboards, or data apps with Streamlit—especially chat UIs with st.chat_message and st.chat_input, or when the user mentions Streamlit.

SKILL.mdUpdated Apr 17, 2026

chicagopeabodydev-sudo/streamlit-skill

chicagopeabodydev-sudo/pydantic-skill

data-ai

VerifiedTrustedCommunity

Defines structured data models with Pydantic BaseModel. Use when defining LLM outputs or inputs for RAG, LlamaIndex structured outputs, or when the user mentions Pydantic, structured output, or response schemas.

SKILL.mdUpdated Apr 17, 2026

chicagopeabodydev-sudo/pydantic-skill

chicagopeabodydev-sudo/nemo-guardrails-skill

tools

VerifiedTrustedCommunity

NeMo Guardrails is an open-source Python package for adding programmable guardrails around LLM calls. Use it to block unsafe, malicious, off-topic, or policy-violating user inputs, retrieved RAG content, tool usage, and model responses.

SKILL.mdUpdated Apr 17, 2026

chicagopeabodydev-sudo/nemo-guardrails-skill

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/chicagopeabodydev-sudo/library_bot_poc.git

# Copy into Claude Code skills folder (global)
cp -r library_bot_poc/.cursor/skills/llama-index-skill ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

chicagopeabodydev-sudo/library_bot_poc

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT