skills/vector-database-migration-tool/SKILL.md
Migrate vector data between Pinecone, Qdrant, Weaviate, and pgvector with re-embedding and schema mapping. Activate on: vector DB migration, switch vector database, re-embed collection, migrate embeddings. NOT for: initial ingestion (rag-document-ingestion-pipeline), embedding model training (ai-engineer).
Migrate vector collections between Pinecone, Qdrant, Weaviate, and pgvector with schema mapping, optional re-embedding, and zero-downtime cutover strategies.
Activate on: "migrate vectors", "switch from Pinecone to Qdrant", "re-embed collection", "vector DB migration", "move embeddings to pgvector", "change embedding model", "vector schema migration"
NOT for: First-time document ingestion (rag-document-ingestion-pipeline), embedding model fine-tuning (ai-engineer), or vector search query optimization (ai-engineer)
| Domain | Technologies | Notes |
|--------|-------------|-------|
| Source/Target DBs | Pinecone, Qdrant, Weaviate, pgvector, Milvus, Chroma | Any-to-any migration support |
| Re-embedding | OpenAI, Cohere, BGE, Nomic | When switching embedding models |
| Schema Mapping | Custom Python, Pydantic transforms | Field renaming, type coercion, metadata reshaping |
| Orchestration | Python asyncio, Apache Airflow, Prefect | Batched streaming with checkpoints |
| Validation | Recall@k comparison, cosine similarity checks | Before/after retrieval quality |
Source DB ──→ [Stream Batches] ──→ [Transform Schema] ──→ [Upsert Target]
    │               │                     │                     │
    │         scroll/paginate        map fields,           batch upsert
    │         batch_size=2000        rename keys,          with retry
    │                                coerce types
    └── Checkpoint: last_offset stored in Redis/file for resumability
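The Transform Schema stage above (map fields, rename keys, coerce types) can be sketched in plain Python. This is a minimal illustration, not the skill's actual transform: `FIELD_MAP`, `transform_payload`, and the field names in it are hypothetical, and a real migration would likely validate with Pydantic models as the table suggests.

```python
from typing import Any, Callable

# Hypothetical mapping: source payload key -> (target key, coercion function).
FIELD_MAP: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "doc_id": ("document_id", str),
    "page": ("page_number", int),
    "score": ("quality_score", float),
}


def transform_payload(payload: dict[str, Any]) -> dict[str, Any]:
    """Rename mapped keys and coerce their values; pass unmapped keys through."""
    out: dict[str, Any] = {}
    for key, value in payload.items():
        if key in FIELD_MAP:
            target_key, coerce = FIELD_MAP[key]
            # Keep missing values as None instead of coercing them.
            out[target_key] = None if value is None else coerce(value)
        else:
            out[key] = value
    return out
```

A coercion failure (for example `int("n/a")`) raises `ValueError` here; whether to fail the batch or route the record to a dead-letter list is a design choice left open.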
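# Direct migration: Qdrant → pgvector
import asyncio
import json

from qdrant_client import AsyncQdrantClient

# load_checkpoint, save_checkpoint, and pg_upsert_batch are project helpers
# defined elsewhere; they are not part of qdrant_client.


async def migrate_direct(source_url: str, pg_conn: str, collection: str):
    # The async client keeps scroll calls from blocking the event loop.
    qdrant = AsyncQdrantClient(url=source_url)
    offset = load_checkpoint(collection)  # Resume support; None starts from the beginning
    while True:
        records, next_offset = await qdrant.scroll(
            collection, offset=offset, limit=2000, with_vectors=True
        )
        if not records:
            break
        rows = [(r.id, r.vector, json.dumps(r.payload)) for r in records]
        await pg_upsert_batch(pg_conn, rows)  # INSERT ... ON CONFLICT
        if next_offset is None:
            # Qdrant returns None after the last page; scrolling again with
            # offset=None would restart from the first record and loop forever.
            break
        save_checkpoint(collection, next_offset)
        offset = next_offset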
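The migration loop relies on `load_checkpoint` and `save_checkpoint`, which the source does not define. One possible file-backed version is sketched below; the `directory` parameter, the `.checkpoints` default, and the JSON layout are assumptions made for illustration, and a Redis-backed variant would follow the same shape.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional, Union

Offset = Union[int, str]  # Qdrant point IDs are integers or UUID strings


def load_checkpoint(collection: str, directory: str = ".checkpoints") -> Optional[Offset]:
    """Return the last saved offset, or None if no checkpoint exists yet."""
    path = Path(directory) / f"{collection}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text())["offset"]


def save_checkpoint(collection: str, offset: Offset, directory: str = ".checkpoints") -> None:
    """Write the offset to a temp file, then rename it over the checkpoint."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump({"offset": offset}, handle)
    # os.replace is atomic on the same filesystem, so a crash mid-write
    # cannot leave a half-written checkpoint behind.
    os.replace(tmp_path, target_dir / f"{collection}.json")
```

Because the checkpoint is saved only after a batch is upserted, a crash can at worst replay one batch, which the `ON CONFLICT` upsert makes harmless.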
Source DB ──→ [Extract Text + Metadata] ──→ [New Embedder] ──→ [Target DB]
    │                   │                        │                 │
    │          pull original text           batch embed       upsert with
    │          from payload/metadata        new dimensions    new vectors
    │
    └── CRITICAL: original text must be stored in source metadata
        If not available, extract from document store separately
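The re-embedding flow above can be sketched as a generator that pulls text from each record's payload, embeds it in batches, and sets aside records whose text is missing. Everything named here is hypothetical: the record shape (`{"id", "payload"}`), the `text` payload key, and `embed`, which stands in for whichever provider client is used (OpenAI, Cohere, BGE, Nomic).

```python
from typing import Any, Callable, Iterable, Iterator, Sequence

# Stand-in for a provider call: takes texts, returns one vector per text.
EmbedFn = Callable[[Sequence[str]], list[list[float]]]
Row = tuple[Any, list[float], dict]


def _embed_batch(batch: list[dict], embed: EmbedFn, text_key: str) -> list[Row]:
    vectors = embed([record["payload"][text_key] for record in batch])
    if len(vectors) != len(batch):
        raise ValueError("embedder returned a different number of vectors than inputs")
    return [(r["id"], v, r["payload"]) for r, v in zip(batch, vectors)]


def reembed_batches(
    records: Iterable[dict],
    embed: EmbedFn,
    missing: list,
    text_key: str = "text",
    batch_size: int = 64,
) -> Iterator[list[Row]]:
    """Yield batches of (id, new_vector, payload); collect IDs without text in `missing`."""
    batch: list[dict] = []
    for record in records:
        if not record.get("payload", {}).get(text_key):
            # No original text in the source: must be recovered from the document store.
            missing.append(record["id"])
            continue
        batch.append(record)
        if len(batch) == batch_size:
            yield _embed_batch(batch, embed, text_key)
            batch = []
    if batch:
        yield _embed_batch(batch, embed, text_key)
```

The IDs collected in `missing` are the records whose text has to be fetched from the document store before they can be migrated.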
Phase 1: Dual-write (new records go to both DBs)
Phase 2: Backfill (migrate historical data to target)
Phase 3: Shadow read (query both, compare results, log diffs)
Phase 4: Cutover (switch reads to target, stop writes to source)
Phase 5: Decommission (archive source after 7-day bake period)
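For the shadow-read phase, one simple way to compare the two databases is top-k overlap: run the same query against both, then measure how many of the source's top-k IDs the target also returns. The sketch below assumes result lists of IDs ordered by rank; `overlap_at_k`, `shadow_compare`, and the 0.9 threshold are illustrative choices, not values from the source.

```python
from typing import Hashable, Sequence


def overlap_at_k(source_ids: Sequence[Hashable], target_ids: Sequence[Hashable], k: int) -> float:
    """Fraction of the source's top-k IDs that also appear in the target's top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    source_top = list(source_ids[:k])
    if not source_top:
        return 1.0  # nothing to find, so nothing is missing
    target_top = set(target_ids[:k])
    return sum(1 for item in source_top if item in target_top) / len(source_top)


def shadow_compare(
    query_id: str,
    source_ids: Sequence[Hashable],
    target_ids: Sequence[Hashable],
    k: int = 10,
    threshold: float = 0.9,
) -> dict:
    """Build a loggable diff record for one shadow-read query."""
    target_top = set(target_ids[:k])
    score = overlap_at_k(source_ids, target_ids, k)
    return {
        "query_id": query_id,
        "overlap": score,
        "ok": score >= threshold,
        "missing_in_target": [i for i in source_ids[:k] if i not in target_top],
    }
```

For a direct migration with unchanged vectors, overlap should sit close to 1.0. After re-embedding with a different model, rankings legitimately shift, so the threshold would need to be looser or replaced by a recall@k check against labeled queries.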