.claude/skills/ts-cloudflare-vectorize/SKILL.md
Serverless vector database at the edge with Cloudflare Vectorize. Use when: building semantic search on Cloudflare Workers, RAG pipelines at the edge, low-latency vector similarity search, or storing and querying embeddings without managing a separate vector database.
npx skillsauth add eliferjunior/Claude cloudflare-vectorize
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Cloudflare Vectorize is a globally distributed vector database built into the Cloudflare Workers platform. It stores high-dimensional vectors (embeddings) and supports fast approximate nearest-neighbor search — all at the edge, with no separate infrastructure to manage.
Key features:
Use Wrangler CLI to create an index. Specify the embedding dimensions and distance metric:
# For BAAI/bge-base-en-v1.5 (768 dims, cosine similarity)
npx wrangler vectorize create my-index \
--dimensions=768 \
--metric=cosine
# For OpenAI text-embedding-3-small (1536 dims)
npx wrangler vectorize create my-index \
--dimensions=1536 \
--metric=cosine
# Euclidean and dot-product are also supported
npx wrangler vectorize create my-index \
--dimensions=384 \
--metric=euclidean
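To make the choice of `--metric` more concrete, here is a minimal sketch of the three measures computed on plain arrays. The function names are illustrative and not part of Vectorize or Wrangler. Vectorize computes its metric internally, so this only shows what each option measures.

```typescript
// Illustrative helpers (hypothetical names); Vectorize computes its metric server-side.

// Dot product: sum of element-wise products.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch")
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i]
  return sum
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  const denom = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b))
  if (denom === 0) throw new Error("zero-length vector")
  return dot(a, b) / denom
}

// Euclidean distance: straight-line distance between the two points.
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch")
  let sum = 0
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2
  return Math.sqrt(sum)
}
```

Cosine similarity ignores vector length, so `[1, 0]` and `[2, 0]` count as identical, while Euclidean distance treats them as one unit apart.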
wrangler.toml
name = "my-worker"
main = "src/index.ts"
compatibility_date = "2024-09-23"
[[vectorize]]
binding = "VECTORIZE_INDEX"
index_name = "my-index"
export interface Env {
VECTORIZE_INDEX: VectorizeIndex
}
Each vector needs a unique string id and a values array matching the index dimensions:
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const vectors: VectorizeVector[] = [
{
id: "doc-001",
values: [0.1, 0.2, 0.3, /* ... 768 total */],
metadata: { title: "Introduction to Cloudflare", url: "/docs/intro" },
},
{
id: "doc-002",
values: [0.4, 0.5, 0.6, /* ... */],
metadata: { title: "Workers AI Overview", url: "/docs/workers-ai" },
},
]
const result = await env.VECTORIZE_INDEX.insert(vectors)
// result.count = number of vectors inserted
return Response.json({ inserted: result.count })
},
}
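Because every vector needs a unique id and a `values` array that matches the index dimensions, it can help to check a batch locally before sending it. The sketch below is one possible pre-flight check. `VectorLike` and `validateVectors` are hypothetical names introduced here, and the check assumes you know the index dimensions (768 in the example above).

```typescript
// Minimal local shape for illustration; in a Worker you would use VectorizeVector instead.
interface VectorLike {
  id: string
  values: number[]
}

// Returns a list of human-readable problems; an empty list means the batch looks consistent.
function validateVectors(vectors: VectorLike[], dimensions: number): string[] {
  const problems: string[] = []
  const seen = new Set<string>()
  for (const v of vectors) {
    // Ids must be unique within the batch.
    if (seen.has(v.id)) problems.push(`duplicate id: ${v.id}`)
    seen.add(v.id)
    // The values array must have exactly as many entries as the index has dimensions.
    if (v.values.length !== dimensions) {
      problems.push(`${v.id}: expected ${dimensions} values, got ${v.values.length}`)
    }
  }
  return problems
}
```

A caller could run `validateVectors(vectors, 768)` and skip the `insert()` call when the returned list is not empty.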
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const { queryVector, topK = 5 } = await request.json() as {
queryVector: number[]
topK?: number
}
const results = await env.VECTORIZE_INDEX.query(queryVector, {
topK,
returnMetadata: true, // include metadata in results
returnValues: false, // skip returning raw vector values
})
// results.matches is sorted by score (highest = most similar)
return Response.json({
matches: results.matches.map(m => ({
id: m.id,
score: m.score,
metadata: m.metadata,
}))
})
},
}
Filter results to a subset before computing similarity — useful for multi-tenant or categorized data:
const results = await env.VECTORIZE_INDEX.query(queryVector, {
topK: 10,
returnMetadata: true,
filter: {
category: { $eq: "documentation" },
},
})
// Compound filter
const filtered = await env.VECTORIZE_INDEX.query(queryVector, {
topK: 5,
returnMetadata: true,
filter: {
language: { $eq: "en" },
published: { $eq: true },
},
})
Supported filter operators: $eq, $ne, $lt, $lte, $gt, $gte, $in
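To illustrate how these operators combine, the following sketch evaluates a filter against a metadata object locally. It is a simplified model, not Vectorize's implementation: `matchesFilter` and its types are hypothetical, range operators are assumed to apply to numbers only, and a missing field is assumed to fail every condition except `$ne`. Conditions on several fields are combined with AND, as in the compound filter above.

```typescript
type Scalar = string | number | boolean

interface Condition {
  $eq?: Scalar
  $ne?: Scalar
  $lt?: number
  $lte?: number
  $gt?: number
  $gte?: number
  $in?: Scalar[]
}

type Filter = Record<string, Condition>

// Local model of filter matching: every field condition must hold (implicit AND).
function matchesFilter(metadata: Record<string, Scalar>, filter: Filter): boolean {
  return Object.entries(filter).every(([field, cond]) => {
    const value = metadata[field]
    if (cond.$eq !== undefined && value !== cond.$eq) return false
    if (cond.$ne !== undefined && value === cond.$ne) return false
    if (cond.$in !== undefined && !cond.$in.includes(value)) return false
    // Range operators: assumed numeric-only in this sketch.
    if (cond.$lt !== undefined && !(typeof value === "number" && value < cond.$lt)) return false
    if (cond.$lte !== undefined && !(typeof value === "number" && value <= cond.$lte)) return false
    if (cond.$gt !== undefined && !(typeof value === "number" && value > cond.$gt)) return false
    if (cond.$gte !== undefined && !(typeof value === "number" && value >= cond.$gte)) return false
    return true
  })
}
```

A model like this can be handy in unit tests for checking that a filter object expresses the intended subset before it is passed to `query()`.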
Use namespaces to isolate data for different tenants or categories within a single index:
// Insert with namespace
await env.VECTORIZE_INDEX.insert([{
id: "tenant-a-doc-1",
values: embedding,
metadata: { text: "Document content..." },
namespace: "tenant-a",
}])
// Query within a namespace
const results = await env.VECTORIZE_INDEX.query(queryVector, {
topK: 5,
returnMetadata: true,
namespace: "tenant-a",
})
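When many vectors belong to the same tenant, a small helper can stamp the namespace on each one so it is not forgotten on individual records. This is only a sketch: `withNamespace` and `VectorLike` are hypothetical names, and the local interface stands in for `VectorizeVector`.

```typescript
// Minimal local shape for illustration; in a Worker you would use VectorizeVector instead.
interface VectorLike {
  id: string
  values: number[]
  namespace?: string
}

// Returns copies of the vectors with the given namespace set; the inputs are not mutated.
function withNamespace<T extends VectorLike>(vectors: T[], namespace: string): T[] {
  return vectors.map(v => ({ ...v, namespace }))
}
```

The result could then be passed to `insert()` in place of the hand-written array, for example `withNamespace(vectors, "tenant-a")`.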
// Get vectors by ID
const vectors = await env.VECTORIZE_INDEX.getByIds(["doc-001", "doc-002"])
// Upsert (insert or update)
await env.VECTORIZE_INDEX.upsert([{
id: "doc-001",
values: newEmbedding,
metadata: { updated: true },
}])
// Delete by ID
await env.VECTORIZE_INDEX.deleteByIds(["doc-001", "doc-002"])
Complete RAG pipeline — embed query, search Vectorize, generate answer with LLM:
export interface Env {
AI: Ai
VECTORIZE_INDEX: VectorizeIndex
}
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const { question } = await request.json() as { question: string }
// 1. Embed the user's question
const embeddingResult = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
text: [question],
})
const queryVector = embeddingResult.data[0]
// 2. Find relevant documents
const searchResults = await env.VECTORIZE_INDEX.query(queryVector, {
topK: 3,
returnMetadata: true,
})
const context = searchResults.matches
.map(m => m.metadata?.text as string)
.filter(Boolean)
.join("\n\n")
// 3. Generate answer with context
const answer = await env.AI.run("@cf/meta/llama-3-8b-instruct", {
messages: [
{
role: "system",
content: `Answer the question using only the provided context.\n\nContext:\n${context}`,
},
{ role: "user", content: question },
],
max_tokens: 512,
})
return Response.json({
answer: answer.response,
sources: searchResults.matches.map(m => ({
id: m.id,
score: m.score,
url: m.metadata?.url,
})),
})
},
}
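The context-building step in the pipeline above joins every match's text without limit. A possible refinement is to cap the context size and drop weak matches. The sketch below assumes the text is stored under `metadata.text`, as in the pipeline, and that higher scores mean closer matches (true for cosine, not for euclidean). `buildContext`, `MatchLike`, `maxChars`, and `minScore` are hypothetical names.

```typescript
// Minimal local shape for a query match; stands in for the entries of results.matches.
interface MatchLike {
  id: string
  score: number
  metadata?: Record<string, unknown>
}

// Joins match texts with blank lines, in order, until the character budget is used up.
function buildContext(matches: MatchLike[], maxChars: number, minScore = 0): string {
  const parts: string[] = []
  let used = 0
  for (const m of matches) {
    const text = m.metadata?.text
    // Skip matches without usable text or below the score threshold.
    if (typeof text !== "string" || text.length === 0) continue
    if (m.score < minScore) continue
    // Account for the "\n\n" separator between parts.
    const cost = text.length + (parts.length > 0 ? 2 : 0)
    if (used + cost > maxChars) break
    parts.push(text)
    used += cost
  }
  return parts.join("\n\n")
}
```

In the pipeline, `buildContext(searchResults.matches, 4000, 0.5)` could replace the `map`/`filter`/`join` chain; the budget and threshold values here are placeholders to tune, not recommendations.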
For indexing large document collections, batch inserts for efficiency:
async function indexDocuments(
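// Metadata values are restricted to types Vectorize can store, so the spread below type-checks
documents: Array<{ id: string; text: string; metadata: Record<string, string | number | boolean> }>,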
env: Env,
batchSize = 100
) {
for (let i = 0; i < documents.length; i += batchSize) {
const batch = documents.slice(i, i + batchSize)
// Embed batch
const embeddingResult = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
text: batch.map(d => d.text),
})
// Prepare vectors
const vectors: VectorizeVector[] = batch.map((doc, idx) => ({
id: doc.id,
values: embeddingResult.data[idx],
metadata: { ...doc.metadata, text: doc.text },
}))
// Insert batch
await env.VECTORIZE_INDEX.insert(vectors)
console.log(`Indexed ${i + batch.length}/${documents.length} documents`)
}
}
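The slicing logic in the loop above can also be factored into a small generic helper, which keeps the batching separate from the embedding and insert steps. A minimal sketch, with `chunk` as a hypothetical name:

```typescript
// Splits an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer")
  }
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}
```

With this helper, the indexing loop could iterate over `chunk(documents, batchSize)` instead of tracking the offset by hand.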
# List all indexes
npx wrangler vectorize list
# Describe an index (dimensions, metric, vector count)
npx wrangler vectorize info my-index
# Delete an index
npx wrangler vectorize delete my-index
# Get vectors by ID (for debugging)
npx wrangler vectorize get-vectors my-index --ids=doc-001,doc-002
- Use cosine distance for normalized text embeddings (BAAI, OpenAI); use euclidean or dot-product only when your model specifically recommends it.
- Store the source text in metadata so you can return it with search results without a separate database lookup.
- Vectorize limits how many vectors a single insert() call can carry, so batch larger datasets.