skills/vectorize/SKILL.md
Manage codebase and database vectorization for semantic search. Use when initializing, refreshing, or querying the vector index. Triggers on: vectorize init, vectorize refresh, vectorize search, semantic search, vector index, enable vectorization.
npx skillsauth add mdmagnuson-creator/yo-go vectorize
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Manage codebase and database vectorization for semantic search capabilities.
Vectorization enables agents to query project knowledge semantically instead of relying solely on grep/glob. This skill provides CLI commands to initialize, refresh, and query the vector index.
- OPENAI_API_KEY: for OpenAI embeddings (text-embedding-3-small)
- VOYAGE_API_KEY: for Voyage AI embeddings (voyage-code-3) ⭐ Recommended for code

vectorize init
Initialize vectorization for the current project.
# From project root
vectorize init
What it does:
- Adds a vectorization section to project.json
- Creates the .vectorindex/ directory (gitignored)

Output:
Initializing vectorization for my-project...
Detected stack: Next.js + TypeScript + Supabase
Found 1,247 source files
Configuration:
Embedding model: Voyage AI voyage-code-3
Contextual retrieval: enabled
Storage: local (.vectorindex/)
Database detected:
DATABASE_URL found in environment
Type: PostgreSQL (Supabase)
Include schema indexing? (y/n): y
Building index...
Chunking: 1,247 files → 8,453 chunks
Contextual: Adding descriptions (Claude Haiku)
Embedding: 8,453 chunks → vectors
[████████████████████] 100%
Installing git hooks...
post-commit hook installed
✅ Vectorization ready!
Index: 8,453 chunks (42MB)
Cost: $2.34 (one-time)
Next steps:
• Agents will automatically use semantic search
• Run 'vectorize search <query>' to test
• Run 'vectorize status' to check index health
vectorize refresh
Rebuild the vector index (full or incremental).
# Incremental refresh (only changed files)
vectorize refresh
# Full rebuild
vectorize refresh --full
When to use:
vectorize status
Show index statistics and health.
vectorize status
Output:
Vector Index Status: my-project
Index Location: .vectorindex/
Last Updated: 2026-02-28 10:30:45 (2 hours ago)
Index Age: OK (within 24h threshold)
Codebase:
Files indexed: 1,247
Chunks: 8,453
Languages: TypeScript (1,102), JavaScript (89), Markdown (56)
Database:
Schema: 23 tables, 187 columns
Config tables: pricing_tiers (10 rows), feature_flags (15 rows)
Storage:
Vector index: 38MB
BM25 index: 4MB
Total: 42MB
Configuration:
Embedding model: voyage (voyage-code-3)
Contextual retrieval: enabled
Hybrid weight: 0.7 (semantic)
Top-K: 20
vectorize search <query>
Test semantic search from the command line.
vectorize search "How does user authentication work?"
Output:
Found 8 relevant chunks for "How does user authentication work?"
1. src/auth/middleware.ts (lines 45-89) [score: 0.94]
┌─────────────────────────────────────────────────────────────────
│ // JWT verification middleware
│ export async function verifyAuth(req: Request) {
│ const token = req.headers.get('Authorization')?.replace('Bearer ', '');
│ if (!token) throw new AuthError('Missing token');
│
│ const payload = await verifyJWT(token, process.env.JWT_SECRET);
│ return { userId: payload.sub, role: payload.role };
│ }
└─────────────────────────────────────────────────────────────────
2. src/auth/providers/supabase.ts (lines 12-67) [score: 0.91]
┌─────────────────────────────────────────────────────────────────
│ // Supabase auth provider implementation
│ export const supabaseAuth = {
│ signIn: async (email: string, password: string) => {
│ const { data, error } = await supabase.auth.signInWithPassword({
│ email, password
│ });
│ ...
└─────────────────────────────────────────────────────────────────
3. docs/ARCHITECTURE.md (lines 156-180) [score: 0.87]
┌─────────────────────────────────────────────────────────────────
│ ## Authentication Design
│
│ We use Supabase Auth with JWT tokens. The flow:
│ 1. User signs in via Supabase
│ 2. Frontend stores access token
│ 3. API routes verify via middleware
│ ...
└─────────────────────────────────────────────────────────────────
[5 more results...]
vectorize config
Show current vectorization settings.
vectorize config
<project>/
├── .vectorindex/ # Gitignored
│ ├── codebase.lance/ # LanceDB table for code embeddings
│ ├── database.lance/ # LanceDB table for schema/config embeddings
│ ├── bm25/ # BM25 keyword index
│ ├── metadata.json # Index state, timestamps, chunk count
│ └── contexts/ # Cached contextual descriptions
├── docs/
│ └── project.json # Contains vectorization config
| Provider | Model | Best For | Env Var |
|----------|-------|----------|---------|
| Voyage AI | voyage-code-3 | Code retrieval ⭐ | VOYAGE_API_KEY |
| Voyage AI | voyage-3.5 | General purpose | VOYAGE_API_KEY |
| Voyage AI | voyage-3.5-lite | Low latency/cost | VOYAGE_API_KEY |
| OpenAI | text-embedding-3-small | General purpose | OPENAI_API_KEY |
| OpenAI | text-embedding-3-large | Higher quality | OPENAI_API_KEY |
| Ollama | Local models | Free, offline | None (local) |
Recommendation: Use voyage-code-3 for code search. It's specifically optimized for code retrieval and is recommended by Anthropic.
When using Voyage AI embeddings, the system automatically batches chunks to stay within API token limits:
| Limit | Value | Purpose |
|-------|-------|---------|
| Token limit | 50,000 per batch | Stay under Voyage 120k API limit (conservative) |
| Chunk limit | 100 per batch | API batch size limit |
How it works:
Benefits:
Output modes:
| Mode | Command | Output |
|------|---------|--------|
| Default | vectorize refresh | Progress bar + summary ("100 chunks in 3 batches") |
| Verbose | vectorize refresh --verbose | Per-batch breakdown |
| Quiet | vectorize refresh --quiet | Errors only (for CI/scripts) |
Default output:
Building index...
[████████████████████] 100%
Total: 8,453 chunks in 85 batches, 85 API calls
Verbose output:
Building index...
Batch 1: 98 chunks, ~49,500 tokens
Batch 2: 97 chunks, ~48,200 tokens
...
Batch 85: 12 chunks, ~5,100 tokens (final)
[████████████████████] 100%
Total: 8,453 chunks in 85 batches, 85 API calls
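The two batch limits described above (50,000 tokens and 100 chunks per batch) can be illustrated with a greedy packer. This is a minimal sketch rather than the tool's actual implementation: `Chunk`, `estimateTokens`, and `packBatches` are hypothetical names, and the 4-characters-per-token estimate is an assumption standing in for a real tokenizer.

```typescript
// Hypothetical sketch of greedy batching under the two documented limits.
interface Chunk {
  id: string;
  text: string;
}

const TOKEN_LIMIT = 50000; // tokens per batch (from the limits table)
const CHUNK_LIMIT = 100; // chunks per batch (from the limits table)

// Assumption: roughly 4 characters per token. A real tokenizer would differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function packBatches(chunks: Chunk[]): Chunk[][] {
  const batches: Chunk[][] = [];
  let current: Chunk[] = [];
  let currentTokens = 0;

  for (const chunk of chunks) {
    const tokens = estimateTokens(chunk.text);
    const wouldOverflow =
      current.length >= CHUNK_LIMIT || currentTokens + tokens > TOKEN_LIMIT;
    // Close the current batch before it would exceed either limit.
    if (current.length > 0 && wouldOverflow) {
      batches.push(current);
      current = [];
      currentTokens = 0;
    }
    current.push(chunk);
    currentTokens += tokens;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Usage: 250 small chunks are limited by the chunk count, not by tokens.
const demo: Chunk[] = Array.from({ length: 250 }, (_, i) => ({
  id: `chunk-${i}`,
  text: "x".repeat(400), // about 100 tokens each under the estimate above
}));
console.log(packBatches(demo).map((batch) => batch.length)); // batch sizes: 100, 100, 50
```

In this sketch a single chunk that exceeds the token limit on its own is still sent alone in its own batch; how the real tool handles that case is not stated in this document.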
{
  "vectorization": {
    "enabled": true,
    "storage": "local",
    "embeddingModel": "voyage-code-3",
    "contextualRetrieval": "auto",
    "codebase": {
      "include": ["src/**", "lib/**", "docs/**"],
      "exclude": ["node_modules/**", "dist/**", "*.test.ts"],
      "chunkStrategy": "ast"
    },
    "database": {
      "enabled": true,
      "connection": "env:DATABASE_URL",
      "type": "postgres",
      "schema": {
        "include": ["public.*"],
        "exclude": ["public.migrations"]
      },
      "configTables": [
        {
          "table": "public.pricing_tiers",
          "description": "Subscription pricing and feature limits",
          "sampleRows": 10
        }
      ]
    },
    "search": {
      "hybridWeight": 0.7,
      "topK": 20,
      "reranking": {
        "enabled": false,
        "model": "cross-encoder"
      }
    },
    "refresh": {
      "onGitChange": true,
      "onSessionStart": true,
      "maxAge": "24h"
    },
    "credentials": {
      "voyage": "env:VOYAGE_API_KEY",
      "openai": "env:OPENAI_API_KEY",
      "anthropic": "env:ANTHROPIC_API_KEY"
    }
  }
}
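The search.hybridWeight and search.topK settings suggest that semantic and BM25 keyword scores are blended before the top results are returned. The sketch below shows one plausible reading, assuming a linear weighted sum over scores already normalized to the 0-1 range. The formula, `ScoredChunk`, `hybridScore`, and `rankHybrid` are all assumptions for illustration, not the tool's confirmed behavior.

```typescript
// Hypothetical sketch: blend semantic and BM25 scores using hybridWeight.
// Assumption: both scores are already normalized to the 0-1 range.
interface ScoredChunk {
  filePath: string;
  semantic: number; // vector similarity, 0-1
  bm25: number; // keyword score, 0-1
}

function hybridScore(chunk: ScoredChunk, hybridWeight: number): number {
  return hybridWeight * chunk.semantic + (1 - hybridWeight) * chunk.bm25;
}

function rankHybrid(
  chunks: ScoredChunk[],
  hybridWeight = 0.7,
  topK = 20,
): ScoredChunk[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...chunks]
    .sort((a, b) => hybridScore(b, hybridWeight) - hybridScore(a, hybridWeight))
    .slice(0, topK);
}

const candidates: ScoredChunk[] = [
  { filePath: "src/auth/middleware.ts", semantic: 0.9, bm25: 0.2 },
  { filePath: "docs/ARCHITECTURE.md", semantic: 0.5, bm25: 1.0 },
];
console.log(rankHybrid(candidates).map((c) => c.filePath));
```

With the default weight of 0.7 the semantically closer file ranks first; lowering the weight toward 0 favors the keyword match instead.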
When vectorization is enabled, agents have access to a semantic_search tool:
// Tool signature
semantic_search({
  query: string,              // Natural language query
  filters?: {
    filePatterns?: string[],  // e.g., ["src/auth/**", "*.ts"]
    languages?: string[],     // e.g., ["typescript", "python"]
    contentType?: "code" | "schema" | "config" | "docs"
  },
  topK?: number               // Override default (20)
})

// Returns
{
  results: [
    {
      content: string,        // Chunk content
      filePath: string,       // e.g., "src/auth/middleware.ts"
      lineRange: [45, 89],    // Start and end lines
      language: string,       // e.g., "typescript"
      score: number,          // Relevance score (0-1)
      type: "code" | "schema" | "config" | "docs"
    }
  ],
  indexAge: string,           // e.g., "2 hours ago"
  queryTime: number           // Milliseconds
}
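To make the return shape concrete, the sketch below transcribes it into TypeScript types and post-processes a response. The `summarize` helper, its 0.8 score threshold, and the hand-written sample response are assumptions for illustration; only the field names come from the signature above.

```typescript
// Types transcribed from the documented return shape; the helper is hypothetical.
type ContentType = "code" | "schema" | "config" | "docs";

interface SearchResult {
  content: string;
  filePath: string;
  lineRange: [number, number]; // start and end lines
  language: string;
  score: number; // relevance, 0-1
  type: ContentType;
}

interface SearchResponse {
  results: SearchResult[];
  indexAge: string;
  queryTime: number; // milliseconds
}

// Keep results at or above minScore and render one summary line per hit.
function summarize(response: SearchResponse, minScore = 0.8): string[] {
  return response.results
    .filter((r) => r.score >= minScore)
    .map(
      (r, i) =>
        `${i + 1}. ${r.filePath} (lines ${r.lineRange[0]}-${r.lineRange[1]}) ` +
        `[score: ${r.score.toFixed(2)}]`,
    );
}

// Hand-written sample standing in for a real semantic_search response.
const sample: SearchResponse = {
  results: [
    {
      content: "export async function verifyAuth(req: Request) { /* ... */ }",
      filePath: "src/auth/middleware.ts",
      lineRange: [45, 89],
      language: "typescript",
      score: 0.94,
      type: "code",
    },
    {
      content: "## Release notes",
      filePath: "docs/CHANGELOG.md",
      lineRange: [1, 20],
      language: "markdown",
      score: 0.41,
      type: "docs",
    },
  ],
  indexAge: "2 hours ago",
  queryTime: 35,
};
console.log(summarize(sample).join("\n"));
```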
Agents automatically use semantic search when:
- vectorization.enabled: true in project.json
- An index exists in .vectorindex/
- The index is not stale (within maxAge)

Example agent prompt usage:
// @builder looking for authentication patterns
Before implementing the auth feature, let me search for existing patterns:
semantic_search("How is authentication implemented?")
→ Found middleware in src/auth/middleware.ts
→ Found provider in src/auth/providers/supabase.ts
→ Found architecture docs explaining the flow
Now I can implement consistent with existing patterns.
Uses Tree-sitter for language-aware chunking:
Chunks respect semantic boundaries. Large functions (>500 tokens) are split with overlap.
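Splitting with overlap can be sketched as a sliding window over a token sequence. This is an illustration only: the 500-token limit comes from the text above, while the 50-token overlap and the `splitWithOverlap` name are assumptions, and the sketch ignores the Tree-sitter boundaries that real chunking would respect.

```typescript
// Hypothetical sketch: split an oversized token sequence into overlapping windows.
// The 500-token limit is from the text; the 50-token overlap is an assumption.
function splitWithOverlap<T>(tokens: T[], maxTokens = 500, overlap = 50): T[][] {
  if (overlap >= maxTokens) {
    throw new Error("overlap must be smaller than maxTokens");
  }
  if (tokens.length <= maxTokens) return [tokens];

  const windows: T[][] = [];
  const step = maxTokens - overlap; // each window starts this far after the last
  for (let start = 0; start < tokens.length; start += step) {
    windows.push(tokens.slice(start, start + maxTokens));
    if (start + maxTokens >= tokens.length) break; // this window reached the end
  }
  return windows;
}

// Usage: a 1,200-token function becomes three windows that share 50 tokens each.
const functionTokens = Array.from({ length: 1200 }, (_, i) => i);
const windows = splitWithOverlap(functionTokens);
console.log(windows.map((w) => [w[0], w[w.length - 1]]));
```

Each window repeats the tail of the previous one, so a statement that straddles a split point still appears intact in at least one chunk.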
For unsupported languages or config files:
When enabled, each chunk is enriched with a brief contextual description before embedding.
How it works:
Example:
Original chunk:
export async function verifyAuth(req: Request) {
  const token = req.headers.get('Authorization')?.replace('Bearer ', '');
  if (!token) throw new AuthError('Missing token');
  return verifyJWT(token, process.env.JWT_SECRET);
}
With context:
[This function is the main authentication middleware in the auth module. It extracts
the JWT token from the Authorization header and verifies it using the JWT_SECRET
environment variable. It's used by all protected API routes.]
export async function verifyAuth(req: Request) {
  const token = req.headers.get('Authorization')?.replace('Bearer ', '');
  if (!token) throw new AuthError('Missing token');
  return verifyJWT(token, process.env.JWT_SECRET);
}
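The enrichment step shown in the example can be sketched as a small function that builds the text to embed. The bracketed prefix mirrors the "With context" example; the `withContext` name and the fallback to the bare chunk when no description is available are assumptions.

```typescript
// Hypothetical sketch: build the text that gets embedded for one chunk.
// Assumption: the description is wrapped in brackets and placed before the
// code, mirroring the "With context" example.
function withContext(chunk: string, description: string | null): string {
  const trimmed = description?.trim();
  if (!trimmed) return chunk; // no description available: embed the chunk as-is
  return `[${trimmed}]\n${chunk}`;
}

const enriched = withContext(
  "export async function verifyAuth(req: Request) { /* ... */ }",
  "This function is the main authentication middleware in the auth module.",
);
console.log(enriched);
```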
Benefits:
Cost:
Extracts and indexes:
Example indexed content:
Table: public.users
Description: Application users and their profiles
Columns:
- id: uuid (primary key, default: gen_random_uuid())
- email: text (unique, not null)
- password_hash: text (not null)
- full_name: text
- role: text (default: 'user', check: role in ('user', 'admin', 'moderator'))
- created_at: timestamptz (default: now())
- updated_at: timestamptz
Foreign keys:
- organization_id → organizations(id)
Indexes:
- users_email_idx on (email)
- users_org_idx on (organization_id)
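One way to produce indexed text in the layout shown above is to render a structured schema description into lines. The sketch below is illustrative: `Column`, `TableSchema`, and `renderSchema` are hypothetical names, and how the real tool reads the schema from the database is not covered here.

```typescript
// Hypothetical sketch: render one table's schema as text for embedding.
// The field names are illustrative, not the tool's real data model.
interface Column {
  name: string;
  type: string;
  notes?: string[]; // constraints and defaults, e.g. "primary key"
}

interface TableSchema {
  table: string;
  description: string;
  columns: Column[];
  foreignKeys: string[];
  indexes: string[];
}

function renderSchema(schema: TableSchema): string {
  const lines: string[] = [
    `Table: ${schema.table}`,
    `Description: ${schema.description}`,
    "Columns:",
  ];
  for (const col of schema.columns) {
    const notes =
      col.notes && col.notes.length > 0 ? ` (${col.notes.join(", ")})` : "";
    lines.push(`- ${col.name}: ${col.type}${notes}`);
  }
  // Optional sections are emitted only when they have entries.
  if (schema.foreignKeys.length > 0) {
    lines.push("Foreign keys:", ...schema.foreignKeys.map((fk) => `- ${fk}`));
  }
  if (schema.indexes.length > 0) {
    lines.push("Indexes:", ...schema.indexes.map((idx) => `- ${idx}`));
  }
  return lines.join("\n");
}

const usersTable: TableSchema = {
  table: "public.users",
  description: "Application users and their profiles",
  columns: [
    { name: "id", type: "uuid", notes: ["primary key", "default: gen_random_uuid()"] },
    { name: "full_name", type: "text" },
  ],
  foreignKeys: ["organization_id → organizations(id)"],
  indexes: ["users_email_idx on (email)"],
};
console.log(renderSchema(usersTable));
```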
For designated config tables, extracts sample rows:
Table: public.pricing_tiers
Description: Subscription pricing and feature limits
Sample rows:
| name | price_monthly | price_yearly | max_users | features |
|------------|---------------|--------------|-----------|---------------------|
| Free | 0 | 0 | 1 | ["basic"] |
| Pro | 29 | 290 | 5 | ["basic", "api"] |
| Enterprise | 99 | 990 | unlimited | ["basic", "api", …] |
Installed automatically by vectorize init:
#!/bin/sh
# .git/hooks/post-commit

# Get the files changed by the commit that was just created
CHANGED_FILES=$(git diff-tree --no-commit-id --name-only -r HEAD)

# Run an incremental refresh only if an index exists and files changed
if [ -d ".vectorindex" ] && [ -n "$CHANGED_FILES" ]; then
  npx vectorize refresh --incremental --files "$CHANGED_FILES"
fi
When an agent session starts:
1. Check that .vectorindex/metadata.json exists
2. Compare the lastUpdated timestamp with the current time
3. If older than maxAge (default 24h), prompt for refresh
4. Check git log HEAD to detect missed commits

| Codebase Size | Files | Chunks | Embedding Cost | Contextual Cost | Total |
|---------------|-------|--------|----------------|-----------------|-------|
| Small | 500 | 3k | ~$0.01 | ~$1.50 | ~$1.51 |
| Medium | 2k | 12k | ~$0.02 | ~$6.00 | ~$6.02 |
| Large | 10k | 60k | ~$0.10 | ~$30.00 | ~$30.10 |
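The session-start freshness check (comparing lastUpdated against maxAge) can be sketched as follows. The sketch assumes that metadata.json stores lastUpdated as an ISO 8601 string and that maxAge uses a number-plus-unit form such as "24h"; neither format is confirmed by this document, and `parseMaxAge` and `isStale` are hypothetical names.

```typescript
// Hypothetical sketch of the index freshness check.
// Assumptions: lastUpdated is an ISO 8601 string; maxAge looks like "30m", "24h", or "7d".
function parseMaxAge(maxAge: string): number {
  const match = /^(\d+)([mhd])$/.exec(maxAge);
  if (!match) throw new Error(`Unsupported maxAge: ${maxAge}`);
  const amount = Number(match[1]);
  const unit = match[2];
  const unitMs = unit === "m" ? 60000 : unit === "h" ? 3600000 : 86400000;
  return amount * unitMs; // milliseconds
}

function isStale(lastUpdated: string, maxAge: string, now: Date = new Date()): boolean {
  const updated = Date.parse(lastUpdated);
  if (Number.isNaN(updated)) return true; // unreadable timestamp: treat as stale
  return now.getTime() - updated > parseMaxAge(maxAge);
}

// Usage with a fixed clock so the result does not depend on when it runs.
const checkTime = new Date("2026-02-28T12:30:45Z");
console.log(isStale("2026-02-28T10:30:45Z", "24h", checkTime)); // false (2 hours old)
console.log(isStale("2026-02-26T10:30:45Z", "24h", checkTime)); // true (50 hours old)
```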
Get an API key from Voyage AI and set it:
export VOYAGE_API_KEY=pa-...
Or add to your shell profile (~/.zshrc, ~/.bashrc).
Set the environment variable:
export OPENAI_API_KEY=sk-...
Or add to your shell profile (~/.zshrc, ~/.bashrc).
Run refresh:
vectorize refresh
- Check codebase.include patterns
- Use vectorize search to test different queries
- Check vectorize status for index health
- Use embeddingModel: "voyage-3.5-lite" for lower cost
- Set contextualRetrieval: "never"
- Use embeddingModel: "ollama" (free)
- Add docs/** to include patterns
- Exclude dist/**, build/**, .next/**
- Exclude *.test.ts to reduce noise
- onGitChange hook
- Run vectorize init --dry-run to see estimates