skills/vector-search-designer/SKILL.md
Design vector similarity search systems for semantic retrieval at scale
The Vector Search Designer skill helps you architect and implement vector similarity search systems that power semantic search, recommendation engines, and AI applications. It guides you through selecting the right vector database, designing index structures, optimizing query performance, and scaling to millions or billions of vectors.
Vector search has become foundational to modern AI systems, from RAG pipelines to product recommendations. This skill covers the full stack: understanding approximate nearest neighbor (ANN) algorithms, choosing between database options, tuning recall vs latency tradeoffs, and implementing production-ready search infrastructure.
Whether you are building on Pinecone, Weaviate, Qdrant, pgvector, or implementing your own solution, this skill ensures your vector search system meets your performance and accuracy requirements.

    # HNSW example configuration
    hnsw_config = {
        "M": 16,                # Connections per node (higher = better recall, more memory)
        "efConstruction": 200,  # Build-time search depth
        "efSearch": 100,        # Query-time search depth
    }

    # IVF example configuration
    ivf_config = {
        "nlist": 1024,  # Number of clusters
        "nprobe": 32,   # Clusters to search at query time
    }

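To make the `nlist` and `nprobe` knobs concrete, here is a minimal pure-Python sketch of IVF-style search. It is a toy under stated assumptions, not how any particular database implements IVF: centroids are random samples rather than k-means centers, and the function names (`build_ivf`, `ivf_search`, `brute_force`, `recall_at_k`) are hypothetical.

```python
import random

def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, nlist, seed=0):
    """Partition vectors into nlist clusters around randomly sampled centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)  # Assumption: no k-means refinement
    lists = [[] for _ in range(nlist)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: l2(vec, centroids[c]))
        lists[nearest].append(idx)
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, nprobe, k):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]

def brute_force(query, vectors, k):
    """Exact top-k by scanning every vector; used as ground truth."""
    return sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))[:k]

def recall_at_k(approx, exact):
    """Fraction of the exact top-k that the approximate search also returned."""
    return len(set(approx) & set(exact)) / len(exact)
```

Raising `nprobe` scans more clusters, so recall rises and latency grows with it. With `nprobe == nlist` the search degenerates to an exact scan, which makes a convenient sanity check when benchmarking.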
| Action | Command/Trigger |
|--------|-----------------|
| Choose database | "Which vector database for [use case]" |
| Design index | "Design vector index for [scale]" |
| Optimize search | "Speed up vector search" |
| Add filtering | "Add metadata filters to vector search" |
| Scale vectors | "Scale to [N] million vectors" |
| Benchmark search | "Benchmark vector search performance" |

- Right-Size Your Database: Don't over-engineer for scale you don't need
- Understand Recall vs Speed Tradeoff: ANN is approximate by design
- Use Hybrid Search When Needed: Combine vector and keyword search
- Design Metadata for Filtering: Plan your filter strategy upfront
- Batch Operations When Possible: Reduce network overhead
- Monitor and Alert: Production search needs observability

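One common way to combine vector and keyword results for hybrid search is reciprocal rank fusion (RRF), which merges ranked lists without needing their scores to be on the same scale. The sketch below is illustrative only: the function name is hypothetical, and `k=60` is a conventional default rather than a value taken from this skill.

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=10):
    """Fuse several ranked lists of doc IDs; each list is ordered best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list contribute more to the fused score
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

vector_hits = ["a", "b", "c"]   # Hypothetical vector search results
keyword_hits = ["b", "d", "a"]  # Hypothetical keyword search results
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that appear in both lists ("a" and "b" here) rise above documents found by only one retriever.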
Handle documents with multiple representations:

    class MultiVectorIndex:
        def __init__(self):
            self.title_index = VectorIndex(dim=768)
            self.content_index = VectorIndex(dim=768)
            self.summary_index = VectorIndex(dim=768)

        def search(self, query_embedding, weights=None):
            weights = weights or {"title": 0.3, "content": 0.5, "summary": 0.2}
            results = {}
            for field, weight in weights.items():
                index = getattr(self, f"{field}_index")
                field_results = index.search(query_embedding, k=20)
                for doc_id, score in field_results:
                    results[doc_id] = results.get(doc_id, 0) + score * weight
            return sorted(results.items(), key=lambda x: x[1], reverse=True)[:10]

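`VectorIndex` is not defined in the snippet above. For local experimentation, a brute-force stand-in like the following would satisfy the `search(query, k)` interface it relies on. This is a hypothetical sketch: the `add` method and the choice of cosine similarity are assumptions, and a production system would wrap an ANN library or database client instead.

```python
import math

class VectorIndex:
    """Hypothetical in-memory index that scores by cosine similarity."""

    def __init__(self, dim):
        self.dim = dim
        self.items = {}  # doc_id -> unit-normalized vector

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector)) or 1.0  # Avoid dividing by zero
        return [x / norm for x in vector]

    def add(self, doc_id, vector):
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.items[doc_id] = self._normalize(vector)

    def search(self, query, k=20):
        """Return up to k (doc_id, score) pairs, highest similarity first."""
        q = self._normalize(query)
        scored = [
            (doc_id, sum(a * b for a, b in zip(q, vec)))
            for doc_id, vec in self.items.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

Because cosine scores always fall in [-1, 1], the per-field weighted sum stays comparable across fields. With an unbounded metric such as raw dot product or L2 distance, scores would likely need normalizing before being weighted.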
Optimize search with filters:

    def filtered_search(query_embedding, filters, k=10):
        # Estimate once: the fraction of documents expected to pass the filters
        selectivity = estimate_selectivity(filters)

        # Strategy 1: Pre-filter (for selective filters)
        if selectivity < 0.1:
            candidate_ids = apply_filters(filters)
            return vector_search_subset(query_embedding, candidate_ids, k)

        # Strategy 2: Post-filter (for non-selective filters)
        elif selectivity > 0.5:
            # Over-fetch 3x so that k results are likely to survive filtering;
            # fewer than k can still come back if many candidates are rejected
            results = vector_search(query_embedding, k * 3)
            filtered = [r for r in results if matches_filters(r, filters)]
            return filtered[:k]

        # Strategy 3: Hybrid (general case)
        else:
            return vector_search_with_filters(query_embedding, filters, k)

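The strategy choice depends on `estimate_selectivity`, which the snippet leaves undefined. One plausible approach is to test the filters against a random sample of metadata records, sketched below. The signatures here are assumptions: unlike the call above, this version takes the metadata rows explicitly, and it supports only exact-match filters.

```python
import random

def matches_filters(metadata, filters):
    """True when every filter key equals the record's value (equality filters only)."""
    return all(metadata.get(key) == value for key, value in filters.items())

def estimate_selectivity(filters, metadata_rows, sample_size=1000, seed=0):
    """Estimate the fraction of records passing the filters from a random sample."""
    if not metadata_rows:
        return 0.0
    if len(metadata_rows) <= sample_size:
        sample = metadata_rows  # Small enough to check exhaustively
    else:
        sample = random.Random(seed).sample(metadata_rows, sample_size)
    return sum(matches_filters(row, filters) for row in sample) / len(sample)

rows = [{"lang": "en"}, {"lang": "en"}, {"lang": "en"}, {"lang": "de"}]
share_en = estimate_selectivity({"lang": "en"}, rows)
```

Many databases expose their own cardinality statistics, which would usually be a better source than sampling when they are available.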
Reduce memory with acceptable accuracy loss:

    # Product Quantization configuration
    pq_config = {
        "nbits": 8,  # Bits per sub-quantizer
        "m": 16,     # Number of sub-quantizers
        # 768-dim * 4 bytes = 3KB/vector -> 16 * 1 byte = 16 bytes/vector
    }

    # Binary quantization (extreme compression)
    binary_config = {
        "threshold": 0,  # Values > 0 -> 1, else -> 0
        # 768-dim * 4 bytes = 3KB/vector -> 768 bits = 96 bytes/vector
    }

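As a rough illustration of the binary quantization rule above (values greater than the threshold become 1, everything else 0), the following sketch packs a vector into an integer bitmask and compares two codes by Hamming distance. The function names are hypothetical, and real systems typically use packed byte arrays with SIMD popcount rather than Python integers.

```python
def binarize(vector, threshold=0.0):
    """Pack a float vector into an int: bit i is set when vector[i] > threshold."""
    bits = 0
    for i, value in enumerate(vector):
        if value > threshold:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of bit positions where the two codes differ (lower = more similar)."""
    return bin(a ^ b).count("1")

code_a = binarize([0.5, -0.2, 0.0, 1.3])   # Bits 0 and 3 are set
code_b = binarize([0.1, 0.4, -1.0, 2.0])   # Bits 0, 1, and 3 are set
distance = hamming(code_a, code_b)
```

Binary codes are often used as a cheap first pass, with the top candidates rescored against the full-precision vectors to recover accuracy.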
Handle dynamic data efficiently:

    class DynamicVectorIndex:
        def __init__(self, rebuild_threshold=10000):
            self.main_index = build_optimized_index()
            self.delta_index = []  # Recent additions
            self.rebuild_threshold = rebuild_threshold

        def add(self, vector, metadata):
            self.delta_index.append((vector, metadata))
            if len(self.delta_index) >= self.rebuild_threshold:
                self.rebuild()

        def search(self, query, k):
            main_results = self.main_index.search(query, k)
            delta_results = brute_force_search(self.delta_index, query, k)
            return merge_results(main_results, delta_results, k)

        def rebuild(self):
            all_data = self.main_index.get_all() + self.delta_index
            self.main_index = build_optimized_index(all_data)
            self.delta_index = []

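The helpers `brute_force_search` and `merge_results` are left undefined above. A minimal sketch of both follows, under two assumptions not stated in the original: results are `(score, metadata)` tuples where a higher score is better, and the main index scores with the same metric (dot product here) so the two result lists are directly comparable.

```python
import heapq

def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def brute_force_search(delta_index, query, k):
    """Exact scan over (vector, metadata) pairs; returns (score, metadata), best first."""
    scored = ((dot(vector, query), metadata) for vector, metadata in delta_index)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])

def merge_results(main_results, delta_results, k):
    """Keep the k best-scoring results across both lists, best first."""
    combined = list(main_results) + list(delta_results)
    return heapq.nlargest(k, combined, key=lambda pair: pair[0])

delta = [([1.0, 0.0], {"id": "a"}), ([0.0, 1.0], {"id": "b"})]
delta_hits = brute_force_search(delta, [1.0, 0.0], k=1)
merged = merge_results([(0.9, {"id": "m"})], delta_hits, k=2)
```

The brute-force scan stays cheap only while the delta buffer is small, which is why `rebuild_threshold` bounds its size.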