dist/plugins/api-vector-db-chroma/skills/api-vector-db-chroma/SKILL.md
Chroma vector database -- collection management, automatic embedding, metadata filtering, document storage, query patterns
Quick Guide: Use chromadb (v3.x) with @chroma-core/default-embed for automatic embedding. Chroma auto-embeds documents if no embeddings are provided -- just pass documents and ids to collection.add(). Use where for metadata filtering and whereDocument for document content filtering ($contains, $regex). Default distance metric is l2 (Euclidean); use cosine for most embedding models via configuration: { hnsw: { space: "cosine" } }. Query results return nested arrays (ids: string[][]) because queries are batched -- always access results.ids[0] for a single query. Include only the fields you need via the include parameter to reduce payload size.
<critical_requirements>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST install @chroma-core/default-embed alongside chromadb -- the default embedding function ships as a separate package since v3)
(You MUST access query results as nested arrays -- results.ids[0], results.documents[0] -- because Chroma batches queries and returns string[][] not string[])
(You MUST use the configuration parameter for HNSW settings -- the legacy metadata: { "hnsw:space": "cosine" } approach is deprecated)
(You MUST use flat metadata values only (string, number, boolean, typed arrays) -- nested objects are not supported and will be rejected)
</critical_requirements>
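The flat-metadata requirement above can be checked before any data is sent to Chroma. The sketch below is a hypothetical pre-flight validator, not part of chromadb: the function names and the decision to reject empty arrays are assumptions made for illustration.

```typescript
// Hypothetical pre-flight check (not a chromadb API): reports metadata keys
// whose values break the flat-values rule described above.
type Scalar = string | number | boolean;

function isScalar(value: unknown): value is Scalar {
  return (
    typeof value === "string" ||
    typeof value === "number" ||
    typeof value === "boolean"
  );
}

function isFlatValue(value: unknown): boolean {
  if (isScalar(value)) return true;
  if (!Array.isArray(value)) return false; // nested objects, null, undefined
  // Assumption: empty arrays are rejected because their element type is unknown.
  if (value.length === 0) return false;
  const firstType = typeof value[0];
  // Arrays must be homogeneous: every element is a scalar of the same type.
  return value.every((item) => isScalar(item) && typeof item === firstType);
}

function findInvalidMetadataKeys(metadata: Record<string, unknown>): string[] {
  return Object.keys(metadata).filter((key) => !isFlatValue(metadata[key]));
}
```

Running this over each record before add() or upsert() surfaces the offending keys locally instead of relying on a rejection from the server.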
Additional resources:
Auto-detection: Chroma, chromadb, ChromaClient, CloudClient, createCollection, getOrCreateCollection, collection.add, collection.query, collection.get, collection.upsert, queryTexts, queryEmbeddings, nResults, whereDocument, $contains, @chroma-core/default-embed, @chroma-core/openai, EmbeddingFunction, vector database, semantic search, embedding, RAG retrieval, hnsw:space
When to use:
- Document content filtering with $contains and $regex

Key patterns covered:
- Metadata filtering (where) with comparison, set, array, and logical operators
- Document content filtering (whereDocument) with $contains and $regex

When NOT to use:
Chroma is a lightweight, developer-friendly embedding database designed for rapid prototyping and production RAG applications. The core principle: pass documents in, get relevant results out -- Chroma handles embedding automatically.
Core principles:
- Use where for structured metadata filters and whereDocument for full-text content filters. Both can be combined in a single query.
- The default embedding function (all-MiniLM-L6-v2 via @chroma-core/default-embed) works out of the box for English text. Swap to OpenAI, Cohere, or any provider with a single package change.
- Query results are nested arrays (string[][]), even for single queries. Always access [0] for the first query's results.

<patterns>
Always pass the server URL explicitly from an environment variable -- never rely on the implicit http://localhost:8000 default. See examples/core.md for HTTP, Cloud, and token-authenticated client examples.
const chromaUrl = process.env.CHROMA_URL;
if (!chromaUrl) throw new Error("CHROMA_URL environment variable is required");
return new ChromaClient({ path: chromaUrl });
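The environment check above can be pulled into a small function that is easy to test in isolation. This is a minimal sketch: resolveChromaUrl and EnvLike are hypothetical names, and the http(s)-only rule and trailing-slash normalization are assumptions rather than documented Chroma requirements.

```typescript
// Hypothetical helper: validates the CHROMA_URL value before a client is built.
// The env parameter stands in for process.env so the function can be tested.
type EnvLike = Record<string, string | undefined>;

function resolveChromaUrl(env: EnvLike): string {
  const raw = env["CHROMA_URL"]?.trim();
  if (!raw) {
    throw new Error("CHROMA_URL environment variable is required");
  }
  // Assumption: only http(s) URLs with a non-empty host are accepted.
  if (!/^https?:\/\/[^\s/]\S*$/.test(raw)) {
    throw new Error(`CHROMA_URL must be an http(s) URL, got "${raw}"`);
  }
  // Normalize by dropping trailing slashes.
  return raw.replace(/\/+$/, "");
}
```

In application code this would be called as resolveChromaUrl(process.env), with the result passed to the ChromaClient constructor shown above.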
Use the configuration parameter for HNSW settings -- never the deprecated metadata: { "hnsw:space": "cosine" } approach. See examples/core.md.
const collection = await client.createCollection({
  name: COLLECTION_NAME,
  configuration: { hnsw: { space: "cosine" } },
});
Pass documents and ids -- Chroma embeds automatically. Either documents or embeddings must be provided; metadata alone is insufficient. See examples/core.md.
await collection.add({
  ids: articles.map((a) => a.id),
  documents: articles.map((a) => a.text),
  metadatas: articles.map((a) => ({ category: a.category })),
});
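Because add() takes parallel arrays, a mismatch or an empty document is easy to introduce when mapping records by hand. The sketch below builds the payload in one place and validates it first. The Article and AddPayload shapes and the buildAddPayload name are hypothetical; only the ids, documents, and metadatas fields mirror the call above.

```typescript
// Hypothetical shapes for illustration; only ids, documents and metadatas
// mirror the add() call shown above.
interface Article {
  id: string;
  text: string;
  category: string;
}

interface AddPayload {
  ids: string[];
  documents: string[];
  metadatas: Array<{ category: string }>;
}

function buildAddPayload(articles: Article[]): AddPayload {
  const seenIds = new Set<string>();
  for (const article of articles) {
    if (seenIds.has(article.id)) {
      throw new Error(`Duplicate id: ${article.id}`);
    }
    if (article.text.trim() === "") {
      // Metadata alone is insufficient: every record needs a document to embed.
      throw new Error(`Empty document for id: ${article.id}`);
    }
    seenIds.add(article.id);
  }
  // The three arrays are parallel: index i in each describes the same record.
  return {
    ids: articles.map((a) => a.id),
    documents: articles.map((a) => a.text),
    metadatas: articles.map((a) => ({ category: a.category })),
  };
}
```

The returned object can be passed straight to collection.add() or collection.upsert().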
Combine where (metadata) and whereDocument (content) filters. Results are nested arrays -- always access [0] for single-query results. See examples/metadata-filtering.md for all operators.
const results = await collection.query({
  queryTexts: ["machine learning fundamentals"],
  nResults: N_RESULTS,
  where: {
    $and: [{ category: { $eq: "tutorial" } }, { year: { $gte: 2023 } }],
  },
  whereDocument: { $contains: "neural network" },
  include: ["documents", "metadatas", "distances"],
});
// results.ids[0] -- nested array, access [0] for first query
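The nested-array access described above tends to be repeated at every call site, so it can help to unwrap it once. The following sketch converts the first query's results into a flat list of hits. QueryResultShape is an assumed structural type modelled on the fields named in this document, not the type exported by chromadb, and firstQueryHits is a hypothetical helper.

```typescript
// Assumed structural shape of a query result, based on the fields described
// in this document (not the official chromadb type).
interface QueryResultShape {
  ids: string[][];
  documents?: Array<Array<string | null>>;
  metadatas?: Array<Array<Record<string, unknown> | null>>;
  distances?: Array<Array<number | null>>;
}

interface Hit {
  id: string;
  document: string | null;
  metadata: Record<string, unknown> | null;
  distance: number | null;
}

// Hypothetical helper: flattens the results of the first (index 0) query.
function firstQueryHits(results: QueryResultShape): Hit[] {
  // An empty outer or inner array simply yields no hits.
  const ids = results.ids[0] ?? [];
  return ids.map((id, i) => ({
    id,
    // Fields left out of include are absent, so fall back to null.
    document: results.documents?.[0]?.[i] ?? null,
    metadata: results.metadatas?.[0]?.[i] ?? null,
    distance: results.distances?.[0]?.[i] ?? null,
  }));
}
```

Callers can then check hits.length === 0 once instead of guarding each nested access.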
Use get() with limit/offset for non-similarity retrieval. Unlike query(), get() returns flat arrays. See examples/core.md.
const results = await collection.get({
  where: { category: { $eq: category } },
  limit: PAGE_SIZE,
  offset,
  include: ["documents", "metadatas"],
});
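The limit/offset pattern above extends naturally to a loop that walks a whole collection. This is a sketch under stated assumptions: PagedSource and GetPage are hypothetical structural types that cover only the parts of get() used here, and the stop condition assumes a short page signals the end of the data.

```typescript
// Structural stand-ins for the parts of a collection used here (assumed shapes,
// modelled on the get() call above; not imported from chromadb).
interface GetPage {
  ids: string[];
}

interface PagedSource {
  get(args: { limit: number; offset: number }): Promise<GetPage>;
}

async function collectAllIds(source: PagedSource, pageSize: number): Promise<string[]> {
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new Error("pageSize must be a positive integer");
  }
  const allIds: string[] = [];
  for (let offset = 0; ; offset += pageSize) {
    const page = await source.get({ limit: pageSize, offset });
    // get() returns flat arrays, so no [0] indexing is needed here.
    allIds.push(...page.ids);
    // A short (or empty) page means the end of the data was reached.
    if (page.ids.length < pageSize) break;
  }
  return allIds;
}
```

Any object with a compatible get() method satisfies PagedSource, which also makes the loop easy to exercise against an in-memory fake.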
Use upsert() instead of add() for create-or-update semantics -- safer for pipelines that run multiple times. See examples/core.md.
await collection.upsert({
  ids: docs.map((d) => d.id),
  documents: docs.map((d) => d.text),
  metadatas: docs.map((d) => d.metadata),
});
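upsert() is only idempotent across pipeline runs if the same input always maps to the same id. One way to get that, sketched below, is to derive ids from stable properties of the source rather than from random values. The Chunk shape, the sourceId#index id format, and both function names are hypothetical choices for illustration.

```typescript
// Hypothetical chunk shape; sourceId could be a file path or a database key.
interface Chunk {
  sourceId: string;
  index: number;
  text: string;
}

// Deterministic id: the same chunk always maps to the same id, so a re-run
// overwrites the existing record instead of creating a duplicate.
function stableChunkId(chunk: Chunk): string {
  return `${chunk.sourceId}#${chunk.index}`;
}

function buildUpsertPayload(chunks: Chunk[]) {
  return {
    ids: chunks.map(stableChunkId),
    documents: chunks.map((c) => c.text),
    // Flat metadata only: strings and numbers, no nested objects.
    metadatas: chunks.map((c) => ({ sourceId: c.sourceId, chunkIndex: c.index })),
  };
}
```

With ids built this way, running the ingestion twice produces identical ids, which is what lets upsert() behave as an update on the second run.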
</patterns>
<decision_framework>
Which distance metric should I use?
|-- Using embeddings from a language model? -> cosine (normalized, most common)
|-- Need dot product similarity? -> ip (inner product)
|-- Comparing raw feature vectors? -> l2 (Euclidean, Chroma default)
'-- Unsure? -> cosine (safe default for most embedding models)
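A small worked example may make the metric choice above more concrete. The formulas below are assumptions based on common conventions (l2 as squared Euclidean distance, cosine as one minus cosine similarity); consult the Chroma documentation for the exact definitions it uses. The vectors are made up for illustration.

```typescript
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have equal length");
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

// Assumed convention: l2 as the squared Euclidean distance.
function squaredL2(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have equal length");
  return a.reduce((sum, x, i) => sum + (x - (b[i] ?? 0)) ** 2, 0);
}

// Assumed convention: cosine distance = 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  const norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
  if (norms === 0) throw new Error("Cosine distance is undefined for zero vectors");
  return 1 - dotProduct(a, b) / norms;
}

const queryVec = [1, 0];
const sameDirection = [3, 0]; // same direction as queryVec, three times longer
const orthogonal = [0, 1]; // unrelated direction, same length as queryVec

// cosine ignores magnitude: sameDirection is at distance 0, orthogonal at 1.
const cosineRanksSameDirectionFirst =
  cosineDistance(queryVec, sameDirection) < cosineDistance(queryVec, orthogonal);

// l2 is sensitive to magnitude: sameDirection is at 4, orthogonal at 2.
const l2RanksOrthogonalFirst =
  squaredL2(queryVec, orthogonal) < squaredL2(queryVec, sameDirection);
```

Both flags evaluate to true here: the two metrics rank the same pair of candidates in opposite order when vector lengths differ, which is why the choice matters for embeddings that are not length-normalized.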
Which embedding function should I use?
|-- Quick prototyping, English text? -> @chroma-core/default-embed (all-MiniLM-L6-v2, runs locally)
|-- Need high-quality embeddings? -> @chroma-core/openai (text-embedding-3-small)
|-- Have your own embedding pipeline? -> Pass embeddings directly (skip embedding function)
|-- Need custom model? -> Implement EmbeddingFunction interface
'-- Want all providers? -> npm install @chroma-core/all
How should I filter results?
|-- Structured attributes (category, year, status)? -> where (metadata filter)
|-- Full-text content search? -> whereDocument ($contains, $regex)
|-- Both? -> Combine where + whereDocument in same query
'-- Need exact match on specific IDs? -> get({ ids: [...] })
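When metadata conditions are assembled dynamically, the where filter from the tree above has to handle zero, one, or many conditions. The sketch below shows one way to do that. It assumes, without confirmation from this document, that $and expects at least two sub-expressions, so a single condition is passed through unwrapped; combineWhere and the type names are hypothetical.

```typescript
type FilterScalar = string | number | boolean;
// One condition, e.g. { category: { $eq: "tutorial" } }.
type Condition = Record<string, Record<string, FilterScalar | FilterScalar[]>>;
type WhereFilter = Condition | { $and: Condition[] };

// Hypothetical helper: returns undefined when there is nothing to filter on.
function combineWhere(conditions: Condition[]): WhereFilter | undefined {
  const [first, ...rest] = conditions;
  if (first === undefined) return undefined;
  // Assumption: $and needs two or more entries, so a lone condition is
  // returned as-is rather than wrapped.
  if (rest.length === 0) return first;
  return { $and: conditions };
}
```

The result can be passed as the where argument of query() or get(); an undefined result means the where key is omitted.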
Which client should I use?
|-- Local development or self-hosted? -> ChromaClient({ path: "http://localhost:8000" })
|-- Chroma Cloud (managed)? -> CloudClient({ apiKey, tenant, database })
|-- Docker deployment? -> ChromaClient with Docker host URL
'-- Testing? -> ChromaClient against local Docker container
</decision_framework>
<red_flags>
High Priority Issues:
- results.ids is string[][], not string[]; always use results.ids[0] for single-query results
- Missing the @chroma-core/default-embed package -- since v3, the default embedding function ships separately; npm install chromadb @chroma-core/default-embed
- Using metadata: { "hnsw:space": "cosine" } for HNSW config -- use configuration: { hnsw: { space: "cosine" } } instead

Medium Priority Issues:
- Omitting include in queries -- default includes vary (query returns documents, metadatas, distances; get returns documents, metadatas); explicitly set include for clarity and to control payload size
- Using l2 (default) when cosine is appropriate -- most embedding models are normalized for cosine similarity; l2 may produce worse results
- Calling add() without documents or embeddings -- at least one must be provided; metadata alone is insufficient
- results.ids[0] may be an empty array; check length before processing

Common Mistakes:
- Passing queryEmbeddings AND queryTexts together -- use one or the other, not both
- Using update() to create missing records -- update() silently ignores non-existent IDs; use upsert() for create-or-update semantics
- Calling delete() with no arguments -- deletes nothing (not everything); pass ids or where to target specific records
- Using $gt/$lt on string metadata -- comparison operators only work on numeric values (int or float)

Gotchas & Edge Cases:
- HNSW parameters (space, ef_construction, max_neighbors) cannot be changed after collection creation -- you must delete and recreate the collection
- collection.count() returns total records in the collection, not filtered counts -- there is no filtered count API
- peek() returns the first limit items (default 10) in insertion order, not by relevance -- useful for debugging, not querying
- $contains in whereDocument is case-sensitive -- searching for "Neural" will not match "neural"
- $regex in whereDocument uses full regex syntax but can be slow on large collections
- Array metadata values (string[], number[]) must be homogeneous -- mixing types within an array is rejected
- Metadata keys are case-sensitive -- Category and category are different fields
- nResults default is 10 if not specified in query()
</red_flags>
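Several of the mistakes listed above can be caught before a query is sent. The sketch below checks the two most mechanical ones: supplying both or neither of queryTexts and queryEmbeddings, and an invalid nResults. QueryInput and validateQueryInput are hypothetical names, and the rule that nResults must be a positive integer is an assumption.

```typescript
// Hypothetical subset of query arguments, limited to the fields checked here.
interface QueryInput {
  queryTexts?: string[];
  queryEmbeddings?: number[][];
  nResults?: number;
}

// Returns a list of human-readable problems; an empty list means no issue found.
function validateQueryInput(input: QueryInput): string[] {
  const problems: string[] = [];
  const hasTexts = (input.queryTexts?.length ?? 0) > 0;
  const hasEmbeddings = (input.queryEmbeddings?.length ?? 0) > 0;

  if (hasTexts && hasEmbeddings) {
    problems.push("Pass queryTexts or queryEmbeddings, not both");
  }
  if (!hasTexts && !hasEmbeddings) {
    problems.push("Provide queryTexts or queryEmbeddings");
  }
  // Assumption: nResults, when given, must be a positive integer.
  if (
    input.nResults !== undefined &&
    (!Number.isInteger(input.nResults) || input.nResults < 1)
  ) {
    problems.push("nResults must be a positive integer");
  }
  return problems;
}
```

A caller might log the returned problems and skip the query, or throw if the list is non-empty.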
<critical_reminders>
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering,
import type, named constants)
(You MUST install @chroma-core/default-embed alongside chromadb -- the default embedding function ships as a separate package since v3)
(You MUST access query results as nested arrays -- results.ids[0], results.documents[0] -- because Chroma batches queries and returns string[][] not string[])
(You MUST use the configuration parameter for HNSW settings -- the legacy metadata: { "hnsw:space": "cosine" } approach is deprecated)
(You MUST use flat metadata values only (string, number, boolean, typed arrays) -- nested objects are not supported and will be rejected)
Failure to follow these rules will cause embedding failures, incorrect result access, deprecated configuration warnings, and rejected metadata.
</critical_reminders>