curiositech/vector-database-migration-tool

Name: vector-database-migration-tool
Author: curiositech

skills/vector-database-migration-tool/SKILL.md

npx skillsauth add curiositech/windags-skills vector-database-migration-tool

Vector Database Migration Tool

Migrate vector collections between Pinecone, Qdrant, Weaviate, and pgvector with schema mapping, optional re-embedding, and zero-downtime cutover strategies.

Activation Triggers

Activate on: "migrate vectors", "switch from Pinecone to Qdrant", "re-embed collection", "vector DB migration", "move embeddings to pgvector", "change embedding model", "vector schema migration"

NOT for: First-time document ingestion (rag-document-ingestion-pipeline), embedding model fine-tuning (ai-engineer), or vector search query optimization (ai-engineer)

Quick Start

Audit source — Export schema, index config, metadata fields, vector dimensions, and total record count from the source DB.
Map schema — Translate field names, data types, and index settings to the target DB's format. Handle dimension mismatches if re-embedding.
Choose strategy — Direct copy (same embedding model) or re-embed (new model). Re-embedding requires access to original text.
Run migration — Stream records in batches (1000-5000), transform, upsert to target. Checkpoint progress for resumability.
Validate and cutover — Compare record counts, run retrieval quality tests on both DBs, then switch traffic.
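
The audit and strategy steps above can be made explicit with a small pre-flight check before any data moves. This is a minimal sketch. It assumes the audit produces a dimension, a distance metric, and a record count for each collection, and that metric names have already been normalized to one vocabulary (each DB spells them differently). CollectionSpec and plan_strategy are hypothetical names, not part of any DB client.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CollectionSpec:
    """Hypothetical summary of one collection, as produced by the source audit."""
    name: str
    dimension: int
    metric: str  # normalized name, e.g. "cosine", "dot", "euclidean"
    record_count: int = 0


def plan_strategy(
    source: CollectionSpec, target: CollectionSpec, same_model: bool
) -> Tuple[str, List[str]]:
    """Return ("direct_copy" or "re_embed", warnings) for a source/target pair."""
    warnings: List[str] = []
    if source.metric != target.metric:
        warnings.append(
            f"metric differs ({source.metric} -> {target.metric}); "
            "rankings can change even for identical vectors"
        )
    if not same_model:
        # New model: the target dimension comes from the new embedder, not the source.
        return "re_embed", warnings
    if source.dimension != target.dimension:
        raise ValueError(
            f"dimension mismatch: source {source.dimension}, target {target.dimension}; "
            "fix the target index or plan a re-embed"
        )
    return "direct_copy", warnings


if __name__ == "__main__":
    src = CollectionSpec("docs", 1536, "cosine", record_count=1_000_000)
    print(plan_strategy(src, CollectionSpec("docs", 1536, "cosine"), same_model=True))
    # → ('direct_copy', [])
```

Raising on a same-model dimension mismatch, rather than warning, stops the run before the first batch write instead of partway through it.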

Core Capabilities

| Domain | Technologies | Notes |
|--------|--------------|-------|
| Source/Target DBs | Pinecone, Qdrant, Weaviate, pgvector, Milvus, Chroma | Any-to-any migration support |
| Re-embedding | OpenAI, Cohere, BGE, Nomic | When switching embedding models |
| Schema Mapping | Custom Python, Pydantic transforms | Field renaming, type coercion, metadata reshaping |
| Orchestration | Python asyncio, Apache Airflow, Prefect | Batched streaming with checkpoints |
| Validation | Recall@k comparison, cosine similarity checks | Before/after retrieval quality |
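
The Schema Mapping row covers field renaming, type coercion, and metadata reshaping. Below is a minimal, stdlib-only sketch of such a transform. make_payload_mapper and the field names in the usage lines are hypothetical. A Pydantic model could fill the same role and add validation.

```python
from typing import Any, Callable, Dict, Iterable

Payload = Dict[str, Any]


def make_payload_mapper(
    rename: Dict[str, str],
    coerce: Dict[str, Callable[[Any], Any]],
    drop: Iterable[str] = (),
) -> Callable[[Payload], Payload]:
    """Build a function that reshapes one source payload into the target schema.

    rename: source field name -> target field name
    coerce: target field name -> converter, applied after renaming
    drop:   source fields that should not be carried over
    """
    dropped = set(drop)

    def mapper(payload: Payload) -> Payload:
        out: Payload = {}
        for key, value in payload.items():
            if key not in dropped:
                out[rename.get(key, key)] = value
        # Coerce after renaming so converters are keyed by target field names.
        for field, convert in coerce.items():
            if out.get(field) is not None:
                out[field] = convert(out[field])
        return out

    return mapper


if __name__ == "__main__":
    to_target = make_payload_mapper(
        rename={"doc_text": "content", "ts": "created_at"},
        coerce={"page": int},
        drop=["_legacy_id"],
    )
    print(to_target({"doc_text": "hello", "ts": "2024-01-01", "page": "3", "_legacy_id": 9}))
    # → {'content': 'hello', 'created_at': '2024-01-01', 'page': 3}
```

Building the mapper once and applying it per record keeps the mapping declarative. The same rename and coerce dicts can also be checked into the repo as the documented schema mapping.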

Architecture Patterns

Pattern 1: Direct Copy (Same Embedding Model)

Source DB ──→ [Stream Batches] ──→ [Transform Schema] ──→ [Upsert Target]
   │              │                       │                      │
   │          scroll/paginate        map fields,            batch upsert
   │          batch_size=2000        rename keys,           with retry
   │                                 coerce types
   └── Checkpoint: last_offset stored in Redis/file for resumability

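# Direct migration: Qdrant → pgvector
import asyncio
import json

from qdrant_client import AsyncQdrantClient

# load_checkpoint, save_checkpoint and pg_upsert_batch are your own helpers
# (checkpoint in Redis or a file; upsert via INSERT ... ON CONFLICT).

async def migrate_direct(source_url: str, pg_conn: str, collection: str):
    qdrant = AsyncQdrantClient(url=source_url)
    offset = load_checkpoint(collection)  # None on a fresh run; resume support

    while True:
        records, next_offset = await qdrant.scroll(
            collection_name=collection,
            offset=offset,
            limit=2000,
            with_payload=True,
            with_vectors=True,  # assumes one unnamed vector per point
        )
        if not records:
            break

        rows = [(r.id, r.vector, json.dumps(r.payload)) for r in records]
        await pg_upsert_batch(pg_conn, rows)  # INSERT ... ON CONFLICT
        save_checkpoint(collection, next_offset)

        # Qdrant returns next_offset=None on the last page; passing None back
        # would restart the scroll from the beginning and loop forever.
        if next_offset is None:
            break
        offset = next_offset

# Entry point: asyncio.run(migrate_direct(source_url, pg_conn, collection))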
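
The diagram's "batch upsert with retry" step does not appear in the loop above. Here is a minimal sketch of a retry wrapper. It assumes the upsert is idempotent, so a replayed batch cannot duplicate rows. upsert_with_retry is a hypothetical name, and the backoff constants are illustrative rather than tuned for any particular DB's rate limits.

```python
import asyncio
import random
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


async def upsert_with_retry(
    upsert: Callable[[Sequence[T]], Awaitable[None]],
    batch: Sequence[T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
) -> int:
    """Run upsert(batch), retrying failures with capped exponential backoff and full jitter.

    Returns the number of attempts used and re-raises the last error once
    max_attempts is exhausted. Replaying a batch is only safe because the
    target write is an idempotent upsert keyed by id (e.g. INSERT ... ON CONFLICT).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            await upsert(batch)
            return attempt
        except Exception:  # narrow to your driver's transient errors in practice
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, delay))  # full jitter
    raise RuntimeError("unreachable")
```

In the direct-copy loop, the plain pg_upsert_batch(pg_conn, rows) call could be wrapped as upsert_with_retry(lambda b: pg_upsert_batch(pg_conn, b), rows). The checkpoint still advances only after a batch finally succeeds.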

Pattern 2: Re-embedding Migration

Source DB ──→ [Extract Text + Metadata] ──→ [New Embedder] ──→ [Target DB]
   │                    │                         │                  │
   │              pull original text          batch embed        upsert with
   │              from payload/metadata       new dimensions     new vectors
   │
   └── CRITICAL: original text must be stored in source metadata
       If not available, extract from document store separately
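
A minimal sketch of the extract-and-embed step follows. It assumes each source record is a dict with an id and a payload, and that the original text lives in a payload field named text. That field name is an assumption; use whatever field your ingestion wrote. The embed callable stands in for any batch embedding client (OpenAI, Cohere, a local BGE or Nomic model). reembed and _embed_batch are hypothetical names.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Sequence, Tuple

Record = Dict[str, Any]  # assumed shape: {"id": ..., "payload": {...}}
Row = Tuple[Any, List[float], Dict[str, Any]]
Embedder = Callable[[Sequence[str]], List[List[float]]]


def _embed_batch(batch: List[Record], embed: Embedder, text_field: str,
                 expected_dim: Optional[int]) -> List[Row]:
    vectors = embed([r["payload"][text_field] for r in batch])
    if len(vectors) != len(batch):
        raise ValueError(f"embedder returned {len(vectors)} vectors for {len(batch)} texts")
    rows: List[Row] = []
    for rec, vec in zip(batch, vectors):
        if expected_dim is not None and len(vec) != expected_dim:
            raise ValueError(f"record {rec['id']}: got {len(vec)} dims, expected {expected_dim}")
        rows.append((rec["id"], vec, rec["payload"]))
    return rows


def reembed(records: Iterable[Record], embed: Embedder, *, text_field: str = "text",
            batch_size: int = 1000, expected_dim: Optional[int] = None,
            missing: Optional[List[Any]] = None) -> Iterator[List[Row]]:
    """Yield batches of (id, new_vector, payload) rows ready for a batch upsert.

    Records without original text cannot be re-embedded; their ids are appended
    to `missing` so they can be recovered from the document store separately.
    """
    batch: List[Record] = []
    for rec in records:
        if not (rec.get("payload") or {}).get(text_field):
            if missing is not None:
                missing.append(rec["id"])
            continue
        batch.append(rec)
        if len(batch) >= batch_size:
            yield _embed_batch(batch, embed, text_field, expected_dim)
            batch = []
    if batch:
        yield _embed_batch(batch, embed, text_field, expected_dim)
```

Checking expected_dim on every batch catches a misconfigured model before mismatched vectors reach the target index.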

Pattern 3: Zero-Downtime Cutover

Phase 1: Dual-write (new records go to both DBs)
Phase 2: Backfill (migrate historical data to target)
Phase 3: Shadow read (query both, compare results, log diffs)
Phase 4: Cutover (switch reads to target, stop writes to source)
Phase 5: Decommission (archive source after 7-day bake period)
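
One wrapper that the application calls in place of either DB can hold the read and write routing for phases 1, 3, and 4. Below is a minimal synchronous sketch. It assumes both stores are adapted to a two-method interface: upsert, and search returning ranked ids. VectorStore, Phase, MigratingStore, and InMemoryStore are hypothetical names. Backfill (phase 2) runs as a separate job. A production version would typically issue the shadow query asynchronously so it adds no latency to the primary read.

```python
from enum import Enum
from typing import Any, Dict, List, Protocol, Sequence, Tuple

Row = Tuple[Any, List[float], Dict[str, Any]]


class VectorStore(Protocol):
    """Minimal interface assumed for illustration; wrap your real clients in it."""
    def upsert(self, rows: Sequence[Row]) -> None: ...
    def search(self, vector: List[float], k: int) -> List[Any]: ...  # ranked ids


class Phase(Enum):
    DUAL_WRITE = "dual_write"    # phases 1-2: write both, serve source
    SHADOW_READ = "shadow_read"  # phase 3: write both, serve source, compare target
    CUTOVER = "cutover"          # phase 4: read and write target only


class MigratingStore:
    def __init__(self, source: VectorStore, target: VectorStore, phase: Phase):
        self.source, self.target, self.phase = source, target, phase
        self.diffs: List[Dict[str, Any]] = []  # shadow-read mismatches to log

    def upsert(self, rows: Sequence[Row]) -> None:
        if self.phase is not Phase.CUTOVER:
            self.source.upsert(rows)
        self.target.upsert(rows)

    def search(self, vector: List[float], k: int = 10) -> List[Any]:
        if self.phase is Phase.CUTOVER:
            return self.target.search(vector, k)
        primary = self.source.search(vector, k)
        if self.phase is Phase.SHADOW_READ:
            try:
                shadow = self.target.search(vector, k)
            except Exception as exc:  # a failing shadow read must not break serving
                self.diffs.append({"error": repr(exc)})
            else:
                overlap = len(set(primary) & set(shadow)) / max(len(primary), 1)
                if overlap < 1.0:
                    self.diffs.append({"overlap": overlap, "source": primary, "target": shadow})
        return primary


class InMemoryStore:
    """Toy store for trying the wrapper locally (dot-product ranking)."""
    def __init__(self) -> None:
        self.vectors: Dict[Any, List[float]] = {}

    def upsert(self, rows: Sequence[Row]) -> None:
        for id_, vec, _payload in rows:
            self.vectors[id_] = vec

    def search(self, vector: List[float], k: int) -> List[Any]:
        def score(i: Any) -> float:
            return sum(a * b for a, b in zip(self.vectors[i], vector))
        return sorted(self.vectors, key=score, reverse=True)[:k]
```

With this wrapper, moving between phases can be a configuration change rather than a code change. The diffs list is what phase 3 would log and review before cutover.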

Anti-Patterns

Big-bang migration without checkpointing — A failure at record 950,000 of 1,000,000 means restarting from zero. Always checkpoint batch offsets.
Skipping retrieval validation — Record counts match but recall@10 dropped 15%. Always run the same test queries against both DBs and compare.
Re-embedding without original text — If source metadata lacks the original text, you cannot re-embed. Verify text availability before planning a re-embed migration.
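Ignoring dimension mismatches — Copying 1536-dim vectors into a 1024-dim index either fails at write time (pgvector, for example, rejects vectors whose length does not match the column's declared dimension) or, if a transform truncates vectors to fit, silently degrades retrieval. Validate that dimensions match, or plan for re-embedding.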
No rollback plan — Keep the source DB running and queryable until the target is validated in production for at least one week.
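
The first anti-pattern can be avoided by backing load_checkpoint and save_checkpoint (the helper names the direct-copy example calls) with a local file. This minimal sketch assumes a single migration worker per collection and JSON-serializable offsets; Qdrant returns point ids, which are integers or UUID strings. The directory name is hypothetical. A Redis key would serve the same purpose for multi-host runs.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any

DEFAULT_DIR = Path(".migration-checkpoints")  # hypothetical location


def _checkpoint_path(collection: str, directory: Path) -> Path:
    return directory / f"{collection}.json"


def save_checkpoint(collection: str, offset: Any, directory: Path = DEFAULT_DIR) -> None:
    """Persist the next scroll offset (offset must be JSON-serializable).

    Writing to a temp file and then renaming it over the old checkpoint means a
    crash mid-write leaves the previous checkpoint intact instead of a truncated file.
    """
    directory.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"offset": offset}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_name, _checkpoint_path(collection, directory))  # atomic on POSIX
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def load_checkpoint(collection: str, directory: Path = DEFAULT_DIR) -> Any:
    """Return the last saved offset, or None to start from the beginning."""
    try:
        return json.loads(_checkpoint_path(collection, directory).read_text())["offset"]
    except FileNotFoundError:
        return None
```

Because the checkpoint is written only after a batch upsert returns, a crash replays at most one batch, and the idempotent upsert absorbs it. When the loop finishes, the saved offset is None, which load_checkpoint also reads as "start from the beginning". Delete the file, or record completion separately, once the run is validated.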

Quality Checklist

[ ] Source audit completed: record count, dimensions, metadata schema, index type
[ ] Schema mapping documented and tested on sample batch
[ ] Migration is resumable via checkpointed offsets
[ ] Batch size tuned for target DB rate limits (typically 1000-5000)
[ ] Re-embedding path verified: original text accessible in source metadata
[ ] Record count matches between source and target after migration
[ ] Retrieval quality validated: recall@k on test queries within 2% of source
[ ] Dual-write or shadow-read phase used for zero-downtime cutover
[ ] Rollback plan documented: source DB retained for minimum 7 days post-cutover
[ ] Cost estimated: re-embedding API calls, target DB storage, migration compute time
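
A labeled query set is enough to script the recall@k check from the list above. This minimal sketch assumes you have already run the same test queries against both DBs, collected ranked ids per query, and hand-labeled the relevant ids for each query. compare_recall and recall_at_k are hypothetical names. The sketch reads "within 2%" as an absolute drop of 0.02; compare against src * 0.98 instead if a relative drop is meant. Without labels, the source's own top-k can stand in as the relevant set, which turns this into a top-k overlap test.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Share of the relevant ids that appear in the top-k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def compare_recall(
    source_results: Dict[str, List[Hashable]],
    target_results: Dict[str, List[Hashable]],
    relevant: Dict[str, Set[Hashable]],
    k: int = 10,
    max_drop: float = 0.02,
) -> Tuple[float, float, bool]:
    """Mean recall@k on both DBs and whether the target stays within max_drop.

    max_drop is treated as an absolute difference (0.02 = 2 percentage points).
    """
    queries = [q for q, rel in relevant.items() if rel]  # skip queries with no labels
    if not queries:
        raise ValueError("need at least one query with relevant ids")
    src = sum(recall_at_k(source_results[q], relevant[q], k) for q in queries) / len(queries)
    tgt = sum(recall_at_k(target_results[q], relevant[q], k) for q in queries) / len(queries)
    return src, tgt, (src - tgt) <= max_drop
```

Running this on the same query set during the shadow-read phase gives a pass/fail gate for cutover that is independent of record counts.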

Migrate vector data between Pinecone, Qdrant, Weaviate, pgvector with re-embedding and schema mapping. Activate on: vector DB migration, switch vector database, re-embed collection, migrate embeddings. NOT for: initial ingestion (rag-document-ingestion-pipeline), embedding model training (ai-engineer).

tools

Updated Apr 4, 2026

Install this skill globally with the npx skillsauth add command shown at the top. It works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy, container and dependency vulnerability scanner: Clean (95%)
Semgrep, static code analysis for vulnerabilities: Clean (95%)
mcp-scan (Snyk), Model Context Protocol security validation: Clean (95%)
Snyk (dep), open source security scanning: Skipped (50%)
Socket.dev, supply chain security analysis: Skipped (50%)
VirusTotal, multi-engine malware detection: Skipped (50%)
CrowdStrike, advanced threat intelligence: Skipped (50%)
OSV-Scanner, Open Source Vulnerability database check: Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 4, 2026, 2:54 PM (5.5 s, 1 file scanned)

SKILL.md

license:: Apache-2.0
name:: vector-database-migration-tool
description:: Migrate vector data between Pinecone, Qdrant, Weaviate, pgvector with re-embedding and schema mapping. Activate on: vector DB migration, switch vector database, re-embed collection, migrate embeddings. NOT for: initial ingestion (rag-document-ingestion-pipeline), embedding model training (ai-engineer).
allowed-tools:: Read,Write,Edit,Bash(python:*,pip:*,npm:*,npx:*)
category:: Data & Analytics
- skill:: data-pipeline-engineer
reason:: ETL orchestration for large-scale vector migrations

Related Skills

curiositech/revisiting-interview-data-analysing-turn (data-ai, updated Jul 19, 2026)
License: Apache-2.0. NOT for unrelated tasks outside this domain.

curiositech/redis-patterns-expert (development, updated Jul 19, 2026)
Use when designing caching strategies (cache-aside, write-through, write-behind), implementing distributed locks, building rate limiters, leaderboards, real-time streams (XADD/consumer groups), pub/sub, or tuning eviction policies. Triggers: thundering-herd on cache miss, dogpile on key expiry, Redlock vs SET-NX-PX choice, sliding-window rate limiter, hot-key on a single cluster slot, big-key blowup, MULTI/EXEC across slots, KEYS in production. NOT for Redis Cluster operations/admin (different domain), embedded KV (SQLite, leveldb), in-process LRU caches, or Memcached.

curiositech/react-server-components-boundary (tools, updated Jul 19, 2026)
Drawing the `'use client'` boundary correctly in React Server Components apps (Next.js App Router, RSC frameworks) — leaf-pushing, slot composition, serialization rules, and environment poisoning prevention. Grounded in react.dev and Next.js 16 docs.

curiositech/rate-limiting-strategy (development, updated Jul 19, 2026)
Use when designing rate limiting for an API, choosing between token bucket / sliding window / leaky bucket / fixed window, implementing it in Redis, deciding edge (Cloudflare/Upstash) vs origin enforcement, sizing per-user vs per-IP vs per-endpoint quotas, returning the right 429 response with Retry-After, or fixing the boundary-burst bug in fixed-window limiters. Triggers: 429 too many requests, INCR + EXPIRE, ZADD + ZREMRANGEBYSCORE + ZCARD, X-RateLimit-Remaining header, Cloudflare WAF rate limiting rules, Upstash @upstash/ratelimit, leaky bucket shaping vs policing, distributed rate limiter consistency. NOT for DDoS mitigation specifically (different scale), CAPTCHA / bot management, full WAF design, or per-user quota billing.

Install manually

# Clone the repo
git clone https://github.com/curiositech/windags-skills.git

# Copy into Claude Code skills folder (global)
cp -r windags-skills/skills/vector-database-migration-tool ~/.claude/skills/

Repository

curiositech/windags-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT