.claude/skills/database-architect/SKILL.md
Database design and optimization specialist. Schema design, query optimization, indexing strategies, data modeling, and migration planning for relational and NoSQL databases.
npx skillsauth add oimiragieo/agent-studio database-architectInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Gather requirements:
For Relational Databases:
For NoSQL Databases:
Index strategy based on query patterns:
-- Example: Users table with common queries
CREATE INDEX idx_users_email ON users(email); -- Exact match
CREATE INDEX idx_users_name ON users(last_name, first_name); -- Range/sort
CREATE INDEX idx_users_created ON users(created_at DESC); -- Ordering
Index Guidelines:
Create versioned migrations:
migrations/
001_create_users.sql
002_add_email_index.sql
003_create_orders.sql
Migration Best Practices:
Analyze and improve slow queries:
Leverage PostgreSQL 17 capabilities where applicable:
Performance improvements:
VACUUM memory management — up to 20x lower memory footprint; vacuum now runs faster on busy systemsBRIN indexes support parallel buildsIN clause queriesSQL/JSON enhancements (PG 17):
JSON_TABLE() — converts JSON data into relational table representationJSON(), JSON_SCALAR(), JSON_ARRAY(), JSON_OBJECT())jsonpath for expressive path-based queries over JSONB columnsIncremental backups:
pg_basebackup supports incremental backup; combine with pg_upgrade for zero-data-loss major version upgradesLogical replication improvements:
pg_createsubscriber creates logical replicas from physical standbyspg_upgrade now preserves logical replication slots across major version upgradesSecurity:
MAINTAIN privilege — grants targeted maintenance rights without full superuser accesssslnegotiation=direct client option for direct TLS handshake (avoids round-trip)COPY improvements:
COPY ... ON_ERROR ignore — continues import on row-level errors instead of abortingStore and query vector embeddings alongside relational data to avoid a separate vector database:
-- Install extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Table with embedding column (1536 dims for OpenAI text-embedding-3-small)
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
content TEXT NOT NULL,
embedding vector(1536),
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- IVFFlat index for approximate nearest neighbor (ANN) search
-- lists = sqrt(row_count) is a good starting value
CREATE INDEX idx_documents_embedding ON documents
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
-- HNSW index (faster queries, more memory; preferred for < 1M vectors)
CREATE INDEX idx_documents_embedding_hnsw ON documents
USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
-- Similarity search (cosine distance)
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
When to use pgvector vs. dedicated vector DB:
Use declarative partitioning for tables expected to exceed available RAM:
-- Range partitioning by date (common for time-series / logs)
CREATE TABLE events (
id BIGSERIAL,
created_at TIMESTAMPTZ NOT NULL,
event_type TEXT NOT NULL,
payload JSONB
) PARTITION BY RANGE (created_at);
-- Monthly partitions
CREATE TABLE events_2025_01 PARTITION OF events
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE events_2025_02 PARTITION OF events
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
-- Hash partitioning for even distribution (e.g., multi-tenant)
CREATE TABLE orders (
id UUID NOT NULL,
tenant_id UUID NOT NULL,
total DECIMAL(12,2)
) PARTITION BY HASH (tenant_id);
CREATE TABLE orders_p0 PARTITION OF orders FOR VALUES WITH (modulus 4, remainder 0);
CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (modulus 4, remainder 1);
CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES WITH (modulus 4, remainder 2);
CREATE TABLE orders_p3 PARTITION OF orders FOR VALUES WITH (modulus 4, remainder 3);
Partition pruning: PostgreSQL automatically skips irrelevant partitions when the partition key appears in WHERE. Always include the partition key in queries.
Index on partitioned tables: Indexes created on the parent table are automatically created on all child partitions.
-- Generated columns promote hot JSONB fields to indexed native columns
CREATE TABLE customers (
id BIGSERIAL PRIMARY KEY,
data JSONB NOT NULL,
-- Promote frequently filtered fields to B-tree indexed generated columns
country TEXT GENERATED ALWAYS AS (data->>'country') STORED,
signup_date DATE GENERATED ALWAYS AS ((data->>'signup_date')::DATE) STORED
);
CREATE INDEX idx_customers_country ON customers (country);
CREATE INDEX idx_customers_signup ON customers (signup_date);
-- GIN index for containment / key-existence queries
CREATE INDEX idx_customers_data_gin ON customers USING GIN (data);
-- Partial GIN index for large tables (index only active records)
CREATE INDEX idx_customers_data_active ON customers
USING GIN (data) WHERE (data->>'status') = 'active';
-- jsonpath query example (PG 17)
SELECT * FROM customers
WHERE data @? '$.tags[*] ? (@ == "premium")';
Use a connection pooler in front of PostgreSQL for all production deployments:
PgBouncer (lightweight, battle-tested):
# pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction ; transaction pooling for stateless apps
max_client_conn = 1000
default_pool_size = 25
server_pool_size = 5
Supavisor (cloud-native, multi-tenant, Elixir-based):
Pooling modes:
| Mode | Use Case | Notes |
| ----------- | ----------------------------------------- | ---------------------------------------------------- |
| Session | Long-running connections, LISTEN/NOTIFY | 1 client = 1 server connection |
| Transaction | Stateless APIs (recommended default) | Most efficient; breaks SET / prepared statements |
| Statement | Rarely needed | Each statement can use a different server connection |
</execution_process>
<best_practices>
</best_practices> </instructions>
<examples> <usage_example> **Schema Design Request**:Design a schema for an e-commerce orders system
Example Response:
-- Users table
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- Orders table
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES users(id),
status VARCHAR(20) DEFAULT 'pending',
total_amount DECIMAL(10,2),
created_at TIMESTAMP DEFAULT NOW()
);
-- Order items (1:N relationship)
CREATE TABLE order_items (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
order_id UUID REFERENCES orders(id) ON DELETE CASCADE,
product_id UUID NOT NULL,
quantity INTEGER NOT NULL,
unit_price DECIMAL(10,2)
);
-- Indexes for common queries
CREATE INDEX idx_orders_user ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_order_items_order ON order_items(order_id);
</usage_example> </examples>
This skill has a corresponding workflow for complex multi-agent scenarios:
.claude/workflows/database-architect-skill-workflow.md| Anti-Pattern | Why It Fails | Correct Approach | | -------------------------------------------- | ---------------------------------------------------------- | -------------------------------------------------------------------------- | | Manual DDL directly on production | No rollback path; breaks migration history | Always use versioned migrations with DOWN scripts | | Premature denormalization | Adds complexity before profiling; often no measurable gain | Normalize first, denormalize only after EXPLAIN ANALYZE reveals bottleneck | | Indexing every column | Slows writes; wastes storage; misleads query planner | Index only columns that appear in WHERE, JOIN, and ORDER BY clauses | | Adding NOT NULL column without default | Locks entire table during migration on large datasets | Add nullable column, backfill in batches, then add NOT NULL constraint | | Direct connections from serverless functions | Connection limit exhausted under load spike | Use PgBouncer or Supavisor in transaction-pooling mode |
Before starting:
cat .claude/context/memory/learnings.md
After completing:
.claude/context/memory/learnings.md.claude/context/memory/issues.md.claude/context/memory/decisions.mdASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.
tools
Comprehensive biosignal processing toolkit for analyzing physiological data including ECG, EEG, EDA, RSP, PPG, EMG, and EOG signals. Use this skill when processing cardiovascular signals, brain activity, electrodermal responses, respiratory patterns, muscle activity, or eye movements. Applicable for heart rate variability analysis, event-related potentials, complexity measures, autonomic nervous system assessment, psychophysiology research, and multi-modal physiological signal integration.
tools
Comprehensive toolkit for creating, analyzing, and visualizing complex networks and graphs in Python. Use when working with network/graph data structures, analyzing relationships between entities, computing graph algorithms (shortest paths, centrality, clustering), detecting communities, generating synthetic networks, or visualizing network topologies. Applicable to social networks, biological networks, transportation systems, citation networks, and any domain involving pairwise relationships.
data-ai
Molecular featurization for ML (100+ featurizers). ECFP, MACCS, descriptors, pretrained models (ChemBERTa), convert SMILES to features, for QSAR and molecular ML.
development
Run Python code in the cloud with serverless containers, GPUs, and autoscaling. Use when deploying ML models, running batch processing jobs, scheduling compute-intensive tasks, or serving APIs that require GPU acceleration or dynamic scaling.