skills/redis-semantic-cache/SKILL.md
Redis LangCache guidance for semantic caching of LLM responses on Redis Cloud — calling search/set via the SDK or REST API, tuning the similarity threshold, separating caches per task type, and filtering with custom attributes. Use when caching LLM completions or RAG answers to cut API cost and latency, building a cache-aside layer in front of OpenAI / Anthropic / etc., tuning hit rate vs precision, or splitting one app's LLM workloads into multiple LangCache caches.
Install this skill globally with one command: `npx skillsauth add redis/agent-skills redis-semantic-cache`. Works with Claude Code, Cursor, and Windsurf.
Semantic caching for LLM responses with Redis Cloud's LangCache service. Stores prompts as embeddings; subsequent semantically-similar prompts return the cached response without re-calling the model.
LangCache is currently in preview on Redis Cloud. Features and behavior may change.
LangCache fits in front of any LLM call as a standard cache-aside pattern:
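1. Before calling the LLM, `search` the cache for a semantically similar prompt.
2. On a hit, return the cached response.
3. On a miss, call the LLM, then `set` the response so future similar prompts hit.

```python
import os

from langcache import LangCache

lang_cache = LangCache(
    server_url=f"https://{os.getenv('HOST')}",
    cache_id=os.getenv("CACHE_ID"),
    api_key=os.getenv("API_KEY"),
)

# Look for a semantically similar prompt before calling the model.
result = lang_cache.search(prompt="What is Redis?", similarity_threshold=0.9)
if result:
    response = result[0]["response"]
else:
    # Cache miss: `llm` stands in for whatever model client you use.
    response = llm.generate("What is Redis?")
    lang_cache.set(prompt="What is Redis?", response=response)
```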
The same operations are available via REST (`POST /v1/caches/{cacheId}/entries/search` and `POST /v1/caches/{cacheId}/entries`) when an SDK isn't an option.
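As a rough illustration of the REST route, the sketch below builds a search request with the standard library only and does not send it. The endpoint path comes from the text above; the `similarityThreshold` body field, the bearer-token `Authorization` header, and the `build_search_request` helper are assumptions made for this example, so check them against the LangCache REST reference before relying on them.

```python
import json
import urllib.request


def build_search_request(host, cache_id, api_key, prompt, similarity_threshold=None):
    """Build (but do not send) a LangCache REST search request. Hypothetical helper."""
    url = f"https://{host}/v1/caches/{cache_id}/entries/search"
    body = {"prompt": prompt}
    if similarity_threshold is not None:
        # Assumed field name; verify the exact spelling in the REST reference.
        body["similarityThreshold"] = similarity_threshold
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme: bearer token carrying the cache API key.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder host, cache ID, and key; nothing is sent over the network here.
req = build_search_request("example.invalid", "my-cache", "secret", "What is Redis?", 0.9)
```

With real credentials in place, `urllib.request.urlopen(req)` would send the request; storing an entry would follow the same shape against the `/entries` path.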
See references/langcache-usage.md for full SDK + REST samples and attribute-based storage.
The threshold controls how close (in embedding cosine similarity) a new prompt must be to a cached one to count as a hit. Higher = stricter match, fewer false positives. Lower = more hits, more risk of returning an off-topic answer.
| Threshold | Behavior | Use when |
|---|---|---|
| 0.95+ | Near-exact match required | Customer-facing answers where wrong responses are costly |
| 0.9 | Balanced default | Most workloads; start here |
| 0.8 | Loose semantic match | Internal tools, exploratory queries, FAQ deduplication |
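To make the effect of the threshold concrete, here is a toy sketch of a cosine-similarity hit test. LangCache does the embedding and scoring on the service side, and its exact scoring is not shown here, so the two-dimensional vectors and the `cosine_similarity` and `is_hit` helpers below are made up purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_hit(query_vec, cached_vec, threshold):
    """A cached entry counts as a hit when similarity reaches the threshold."""
    return cosine_similarity(query_vec, cached_vec) >= threshold


# Toy embeddings (illustrative values, not real model output).
cached = [1.0, 0.0]
paraphrase = [0.9, 0.4]   # similarity to `cached` is roughly 0.91
related = [0.85, 0.5]     # similarity to `cached` is roughly 0.86

for threshold in (0.95, 0.9, 0.8):
    print(threshold, is_hit(paraphrase, cached, threshold), is_hit(related, cached, threshold))
```

In this toy setup the paraphrase is rejected at 0.95 but accepted at 0.9, while the loosely related prompt only becomes a hit at 0.8, which mirrors the trade-off in the table.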
```python
# Stricter: fewer false positives
result = lang_cache.search(prompt="What is Redis?", similarity_threshold=0.95)

# Looser: higher hit rate
result = lang_cache.search(prompt="What is Redis?", similarity_threshold=0.8)
```
Adjust by watching the actual cache-hit rate and spot-checking that returned answers are still relevant.
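One way to watch the hit rate is a small counter around each `search` call. The `HitRateTracker` class below is a hypothetical helper, not part of the LangCache SDK, and it assumes a miss shows up as an empty (falsy) result, as in the cache-aside example earlier.

```python
class HitRateTracker:
    """Counts cache hits and misses so the hit rate can be monitored."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, result):
        # Assumes an empty or None search result means a cache miss.
        if result:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


tracker = HitRateTracker()
# Simulated search results: two hits and two misses.
for result in ([{"response": "cached"}], [], [{"response": "cached"}], []):
    tracker.record(result)

print(f"hit rate: {tracker.hit_rate:.0%}")  # prints "hit rate: 50%"
```

In practice you would call `tracker.record(result)` right after each `lang_cache.search(...)` and compare the rate before and after a threshold change, alongside manual spot checks of the returned answers.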
See references/best-practices.md.
Different LLM workloads should not share one cache — a "code question" prompt is semantically close to other code questions but has nothing to do with a password-reset support query, and crossing them returns garbage.
```python
support_cache = LangCache(server_url=..., cache_id="support-cache-id", api_key=...)
code_cache = LangCache(server_url=..., cache_id="code-cache-id", api_key=...)
```
Create distinct cache IDs in Redis Cloud per task, and route each call to the right one. As a finer-grained alternative, store and search with custom attributes (e.g. {"category": "database"}) to keep tasks in the same cache but isolated by attribute filter — useful when the same prompt format spans subtopics.
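Routing each call to its own cache can be sketched as a lookup keyed by task type. To keep the example runnable offline, `FakeCache` below is a stand-in that only does exact-match lookups; in a real setup each entry in `CACHES` would be a `LangCache` client like the ones above. The `CACHES` and `THRESHOLDS` tables and the `cached_answer` helper are hypothetical names chosen for this sketch.

```python
class FakeCache:
    """Offline stand-in for a LangCache client (exact-match only)."""

    def __init__(self, cache_id):
        self.cache_id = cache_id
        self.entries = {}

    def search(self, prompt, similarity_threshold=0.9):
        # The real service matches semantically; this fake ignores the threshold.
        if prompt in self.entries:
            return [{"response": self.entries[prompt]}]
        return []

    def set(self, prompt, response):
        self.entries[prompt] = response


# One cache per task type, each with its own threshold (illustrative values).
CACHES = {"support": FakeCache("support-cache-id"), "code": FakeCache("code-cache-id")}
THRESHOLDS = {"support": 0.95, "code": 0.9}


def cached_answer(task, prompt, generate):
    """Cache-aside lookup routed to the cache that belongs to `task`."""
    cache = CACHES[task]  # An unknown task raises KeyError rather than crossing caches.
    result = cache.search(prompt=prompt, similarity_threshold=THRESHOLDS[task])
    if result:
        return result[0]["response"]
    response = generate(prompt)
    cache.set(prompt=prompt, response=response)
    return response


# `generate` stands in for the real LLM call.
answer = cached_answer("support", "How do I reset my password?", lambda p: "Use the reset link.")
```

Keeping the routing table in one place also makes it easy to give each task its own threshold, since a support cache and a code cache rarely want the same strictness.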