skills/qdrant-performance-optimization/search-speed-optimization/SKILL.md
Diagnoses and fixes slow Qdrant search. Use when someone reports 'search is slow', 'high latency', 'queries take too long', 'low QPS', 'throughput too low', 'filtered search is slow', or 'search was fast but now it's slow'. Also use when search performance degrades after config changes or data growth.
npx skillsauth add williamlimasilva/.copilot qdrant-search-speed-optimization
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
There are multiple possible reasons for search performance degradation. The most common ones are:
hnsw_ef, complex filters without payload index)
Use when: individual queries take too long regardless of load.
with_payload: false and with_vectors: false to see if payload retrieval is the bottleneck
Use when: system can't serve enough queries per second under load.
default_segment_number to 2) Maximizing throughput
Use when: filtered search is significantly slower than unfiltered. Most common SA complaint after memory.
is_tenant=true for primary filtering condition: Tenant index
nested filtering conditions as a primary filter. It might force Qdrant to read raw payload values instead of using the index.
indexed_only=true parameter: if the query is significantly faster, the optimizer is still running and has not yet indexed all segments.
optimizer_cpu_budget to reserve more CPU for queries
prevent_unoptimized=true to prevent creating segments with a large amount of unindexed data for searches. Instead, once a segment reaches the so-called indexing_threshold, all additional points will be added in ‘deferred state’.
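The indexed_only comparison described above can be sketched as a small timing harness. This is a minimal sketch, not part of the skill itself: `run_query` is a hypothetical callable that you supply to send one search with the given search parameters, and the 2x `speedup` threshold is an assumed stand-in for "significantly faster".

```python
import time
from statistics import median
from typing import Any, Callable, Dict


def median_latency(
    run_query: Callable[[Dict[str, Any]], Any],
    params: Dict[str, Any],
    repeats: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Run the same query several times and return the median wall time."""
    samples = []
    for _ in range(repeats):
        start = clock()
        run_query(params)  # hypothetical: sends one search with these params
        samples.append(clock() - start)
    return median(samples)


def optimizer_probably_lagging(
    run_query: Callable[[Dict[str, Any]], Any],
    repeats: int = 5,
    speedup: float = 2.0,  # assumed threshold for "significantly faster"
    clock: Callable[[], float] = time.perf_counter,
) -> bool:
    """True if indexed_only=true is much faster than the default search."""
    baseline = median_latency(run_query, {"indexed_only": False}, repeats, clock)
    indexed = median_latency(run_query, {"indexed_only": True}, repeats, clock)
    return baseline > speedup * indexed
```

If the function returns True, unindexed segments are the likely cause and the optimizer-related settings above are the place to look. If it returns False, the slowdown probably comes from somewhere else.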
always_ram=false on quantization (disk thrashing on every search)
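To spot the always_ram problem in an existing collection, a check along these lines may help. It is a sketch under assumptions: the input dict is taken to mirror the `quantization_config` portion of a Qdrant collection config (for example `{"scalar": {"type": "int8", "always_ram": true}}`), a missing `always_ram` is treated as false, and the function name is hypothetical.

```python
from typing import Any, Dict, List


def quantization_on_disk(collection_config: Dict[str, Any]) -> List[str]:
    """Return the quantization methods whose data is not pinned in RAM."""
    quantization = collection_config.get("quantization_config") or {}
    flagged = []
    for method, settings in quantization.items():
        # Assumption: an absent always_ram behaves like always_ram=false.
        if not settings.get("always_ram", False):
            flagged.append(method)
    return flagged


good = {"quantization_config": {"scalar": {"type": "int8", "always_ram": True}}}
bad = {"quantization_config": {"scalar": {"type": "int8", "always_ram": False}}}
```

Here `quantization_on_disk(bad)` flags the `scalar` entry, while `quantization_on_disk(good)` returns an empty list.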