skills/redis-vector-search/SKILL.md
Redis vector search guidance covering HNSW vs FLAT algorithm choice, vector index configuration (dims, distance metric, datatype), filtered hybrid search combining vector similarity with TAG or NUMERIC filters, and the RAG retrieval pattern with RedisVL. Use when defining a VECTOR field in FT.CREATE, integrating embeddings (OpenAI, Cohere, sentence-transformers), tuning HNSW parameters (M, EF_CONSTRUCTION, EF_RUNTIME), building a retrieval-augmented generation pipeline, or filtering vector results by attribute.
npx skillsauth add redis/agent-skills redis-vector-search
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Guidance for storing and searching embeddings in Redis. Covers index configuration, algorithm selection, hybrid filtering, and the RAG retrieval pattern with RedisVL.
Use when defining a VECTOR field in FT.CREATE (raw RQE) or a RedisVL IndexSchema.
This skill builds on the redis-query-engine skill: vector fields live inside RQE indexes and share the same FT.CREATE / FT.SEARCH machinery.
Three settings must match the embedding model:
- DIM: the model's output dimensionality (e.g. 1536 for OpenAI text-embedding-3-small). A mismatch produces silent garbage.
- DISTANCE_METRIC: COSINE for normalized text embeddings (the common case), IP for unnormalized inner-product, L2 for raw Euclidean.
- TYPE / datatype: usually FLOAT32. Use FLOAT16 or quantized variants only when memory cost is a hard constraint.

Raw RQE:
FT.CREATE idx:docs ON HASH PREFIX 1 doc:
    SCHEMA
        content TEXT
        embedding VECTOR HNSW 6
            TYPE FLOAT32
            DIM 1536
            DISTANCE_METRIC COSINE
RedisVL:
from redisvl.schema import IndexSchema

# dims, datatype and distance_metric must match the embedding model.
schema = IndexSchema.from_dict({
    "index": {"name": "idx:docs", "prefix": "doc:"},
    "fields": [
        {"name": "content", "type": "text"},
        {"name": "embedding", "type": "vector", "attrs": {
            "dims": 1536, "algorithm": "HNSW",
            "datatype": "FLOAT32", "distance_metric": "COSINE",
        }},
    ]
})
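Because a dimension mismatch is easy to miss, it can help to validate each embedding before it is written. The following is a minimal sketch using only the standard library. It assumes Hash storage with the vector written as a raw FLOAT32 byte string in little-endian order; the helper name `to_float32_blob` is hypothetical and not part of redis-py or RedisVL.

```python
import struct


def to_float32_blob(vector, expected_dim):
    """Pack a sequence of floats into little-endian FLOAT32 bytes.

    Raises ValueError when the length differs from the index DIM,
    so a wrong-sized embedding fails loudly instead of being stored.
    """
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dims, index expects {expected_dim}"
        )
    return struct.pack(f"<{expected_dim}f", *vector)


# Toy 4-dimensional vector; a real index would use the model's DIM (e.g. 1536).
blob = to_float32_blob([0.1, 0.2, 0.3, 0.4], expected_dim=4)
print(len(blob))  # 4 floats x 4 bytes each
```

With a FLOAT32 index the blob length should always equal DIM x 4 bytes, which makes a cheap sanity check at write time.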
See references/index-creation.md for redis-py and RedisVL variants.
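The choice between COSINE and IP is easier to see on a toy pair of vectors. This sketch, in plain Python with hypothetical helper names, shows that inner product depends on vector magnitude while cosine similarity does not, and that the two agree once the vectors are normalized. It illustrates the math only and does not reproduce the exact score values Redis returns.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (magnitude-independent)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2_distance(a, b):
    """Euclidean distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
ua, ub = normalize(a), normalize(b)

# Raw vectors: the inner product is inflated by magnitude.
print(dot(a, b), cosine_similarity(a, b))
# Unit vectors: inner product and cosine similarity coincide.
print(dot(ua, ub), cosine_similarity(ua, ub))
```

If the embedding model already returns unit-length vectors, either metric ranks results the same way; COSINE is the safer default when that is not guaranteed.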
| Algorithm | Speed | Accuracy | Memory | Best for |
|---|---|---|---|---|
| HNSW | Fast (approximate) | ~95%+ recall (tunable) | Higher | Large datasets (>10k vectors), latency-sensitive |
| FLAT | Slow (exact) | 100% | Lower | Small datasets (<10k), accuracy-critical |
Default to HNSW for any production-scale workload. Tuning levers:
- M: connections per node (16–64). Higher = better recall, more memory.
- EF_CONSTRUCTION: build-time graph quality (100–500). Higher = better index, slower build.
- EF_RUNTIME: query-time candidate-list size. Higher = better recall, slower queries.

Use FLAT when the corpus is small and you need exact results (e.g. semantic dedup over a few thousand items).
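In raw FT.CREATE, the number after HNSW is the count of attribute tokens that follow, so it has to grow when tuning parameters are added. The sketch below builds that token list and keeps the count in sync. The function name `hnsw_vector_field` and the example tuning values are illustrative assumptions, not recommended settings.

```python
def hnsw_vector_field(name, dim, metric="COSINE", datatype="FLOAT32",
                      m=None, ef_construction=None, ef_runtime=None):
    """Return the FT.CREATE schema tokens for one HNSW vector field."""
    attrs = ["TYPE", datatype, "DIM", str(dim), "DISTANCE_METRIC", metric]
    # Optional tuning levers are appended only when explicitly set.
    if m is not None:
        attrs += ["M", str(m)]
    if ef_construction is not None:
        attrs += ["EF_CONSTRUCTION", str(ef_construction)]
    if ef_runtime is not None:
        attrs += ["EF_RUNTIME", str(ef_runtime)]
    # The count must equal the number of attribute tokens that follow it.
    return [name, "VECTOR", "HNSW", str(len(attrs)), *attrs]


basic = hnsw_vector_field("embedding", 1536)
tuned = hnsw_vector_field("embedding", 1536, m=32,
                          ef_construction=200, ef_runtime=20)
print(" ".join(basic))
print(" ".join(tuned))
```

The untuned call reproduces the `VECTOR HNSW 6` field from the raw RQE example; the tuned call yields a count of 12.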
See references/algorithm-choice.md.
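One way to reason about the recall figures in the table is to treat exact search as ground truth and measure how much of it an approximate result recovers. The sketch below is plain Python, not Redis: `flat_knn` mimics what an exact scan does conceptually, and the `approx` list is a made-up stand-in for an HNSW result that missed one neighbor.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def flat_knn(query, vectors, k):
    """Exact top-k: score every vector, then sort by distance."""
    ranked = sorted(vectors.items(),
                    key=lambda item: cosine_distance(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate result found."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)


corpus = {
    "doc:1": [1.0, 0.0],
    "doc:2": [0.9, 0.1],
    "doc:3": [0.0, 1.0],
    "doc:4": [0.7, 0.7],
}
exact = flat_knn([1.0, 0.0], corpus, k=2)
approx = ["doc:1", "doc:4"]  # hypothetical approximate result
print(exact, recall_at_k(approx, exact))
```

Running the same comparison on a sample of real queries, with a FLAT index as the reference, gives a concrete recall number to tune M and EF_RUNTIME against.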
Apply attribute filters (TAG / NUMERIC) so the engine narrows the search space before the vector comparison. Don't fetch a wide result set and then filter client-side — that's slower and less accurate.
from redisvl.query import VectorQuery
from redisvl.query.filter import Num, Tag

# Combine a TAG filter and a NUMERIC filter; both apply before the vector comparison.
filters = (Tag("category") == "technology") & (Num("date") >= 2024)

query = VectorQuery(
    vector=query_embedding,
    vector_field_name="embedding",
    return_fields=["content", "category", "date"],
    num_results=10,
    filter_expression=filters,
)
results = index.query(query)
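For readers working with raw FT.SEARCH rather than RedisVL, a filtered KNN query in DIALECT 2 puts the filter expression in front of the `=>[KNN ...]` clause. The sketch below only assembles the query string. The helper names, the `vec` parameter name and the `vector_distance` alias are assumptions chosen for illustration, and tag values containing special characters would need escaping that is not handled here.

```python
def tag_clause(field, value):
    """TAG filter, e.g. @category:{technology}."""
    return f"@{field}:{{{value}}}"


def numeric_min_clause(field, minimum):
    """NUMERIC lower bound, e.g. @date:[2024 +inf]."""
    return f"@{field}:[{minimum} +inf]"


def knn_query(filter_clauses, k, vector_field,
              param="vec", score_alias="vector_distance"):
    """Build a DIALECT 2 query: filter first, then KNN over the matches."""
    if filter_clauses:
        prefilter = "(" + " ".join(filter_clauses) + ")"
    else:
        prefilter = "*"  # no filter: search the whole index
    return f"{prefilter}=>[KNN {k} @{vector_field} ${param} AS {score_alias}]"


query_string = knn_query(
    [tag_clause("category", "technology"), numeric_min_clause("date", 2024)],
    k=10,
    vector_field="embedding",
)
print(query_string)
```

The query vector itself is passed separately as a parameter (here named `vec`) in the FT.SEARCH PARAMS section rather than being embedded in the string.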
For text + vector fusion (BM25-weighted text scoring combined with vector similarity), use HybridQuery on Redis ≥ 8.4 with redis-py ≥ 7.1, or AggregateHybridQuery on older Redis. That's a different "hybrid" from filtered vector search above.
See references/hybrid-search.md.
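The advice to filter inside the engine rather than client-side can be illustrated with a small simulation. This is plain Python over toy data, not a Redis call: it contrasts taking the top-K first and filtering afterwards with filtering first and then taking the top-K. All document IDs and values are invented for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query, docs, k):
    """Exact nearest neighbors among the given docs."""
    return sorted(docs, key=lambda d: sq_dist(query, d["embedding"]))[:k]


docs = [
    {"id": "doc:1", "category": "sports", "embedding": [0.0, 0.1]},
    {"id": "doc:2", "category": "sports", "embedding": [0.1, 0.0]},
    {"id": "doc:3", "category": "sports", "embedding": [0.1, 0.1]},
    {"id": "doc:4", "category": "technology", "embedding": [0.5, 0.5]},
    {"id": "doc:5", "category": "technology", "embedding": [0.9, 0.9]},
]
query, k = [0.0, 0.0], 2

# Client-side (post) filtering: the nearest docs are all "sports",
# so nothing survives the category filter.
post_filtered = [d["id"] for d in top_k(query, docs, k)
                 if d["category"] == "technology"]

# Filter first, then search: the full K results come back.
candidates = [d for d in docs if d["category"] == "technology"]
pre_filtered = [d["id"] for d in top_k(query, candidates, k)]

print(post_filtered, pre_filtered)
```

Post-filtering can return fewer than K results, or none, whenever the nearest neighbors fall outside the filter, which is why the filter belongs in the query itself.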
Standard pipeline: embed the user query → vector search Redis → pass top-K context to the LLM.
from redisvl.query import VectorQuery

# Assumes `index`, `embed_model`, `llm`, `documents` and `user_question` already exist.

# Index documents with embeddings
records = [{"content": doc.content,
            "embedding": embed_model.encode(doc.content).tolist(),
            "source": doc.source}
           for doc in documents]
index.load(records)

# Retrieve relevant context for a user question
q_emb = embed_model.encode(user_question)
results = index.query(VectorQuery(
    vector=q_emb,
    vector_field_name="embedding",
    return_fields=["content", "source"],
    num_results=5,
))

# Generate with retrieved context
context = "\n".join(r["content"] for r in results)
response = llm.generate(f"Context: {context}\n\nQuestion: {user_question}")
Practical tips:
- … COSINE.
- Load records in batches with index.load([...]) instead of one call per record.

See references/rag-pattern.md.
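The final step of the pipeline, passing top-K context to the LLM, often needs a size cap so the prompt stays within the model's limits. The sketch below assembles a prompt from result dictionaries shaped like those in the retrieval code above (`content` and `source` keys). The helper names and the character-based budget are assumptions made for illustration; a production version would more likely count tokens.

```python
def build_context(results, max_chars):
    """Join result snippets in rank order until the character budget is used."""
    parts, used = [], 0
    for r in results:
        snippet = f"[{r['source']}] {r['content']}"
        if used + len(snippet) > max_chars:
            break  # stop at the first snippet that would exceed the budget
        parts.append(snippet)
        used += len(snippet) + 1  # +1 for the joining newline
    return "\n".join(parts)


def build_prompt(results, question, max_chars=2000):
    """Format the retrieved context and the user question into one prompt."""
    context = build_context(results, max_chars)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Toy results standing in for the output of index.query(...).
sample_results = [
    {"content": "Redis supports HNSW.", "source": "a.md"},
    {"content": "x" * 50, "source": "b.md"},
]
print(build_prompt(sample_results, "Which algorithms are supported?", max_chars=30))
```

Keeping the source label next to each snippet lets the model, and anyone debugging the output, trace an answer back to the document it came from.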