skills/knowledge/database-design/SKILL.md
Database design patterns and data modeling for relational and NoSQL databases. Use when the user asks to design a database schema, normalize or denormalize tables, create indexing strategies, plan schema migrations, model temporal data, implement audit trails, set up table partitioning, or optimize data access patterns. Covers entity relationships, naming conventions, constraint design, migration safety, and performance-oriented schema decisions.
npx skillsauth add krzysztofsurdy/code-virtuoso database-designInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Good database design determines the long-term maintainability, performance, and correctness of any data-driven application. Schema decisions made early are expensive to reverse later. Every table, column, index, and constraint should exist for a reason backed by access patterns and business rules.
Design tables around how the application reads and writes data, not around how entities look in a domain model. Two questions drive every schema decision:
| Relationship | Implementation | When to Use | |---|---|---| | One-to-one | Foreign key with UNIQUE constraint on the child table | Splitting rarely-accessed columns into a separate table, or enforcing exactly-one semantics | | One-to-many | Foreign key on the child table referencing the parent | Orders to order items, users to addresses | | Many-to-many | Join table with composite primary key | Tags to articles, students to courses | | Many-to-many with attributes | Join table with its own columns beyond the two foreign keys | Enrollment with grade, membership with role | | Self-referential | Foreign key referencing the same table | Org charts, category trees, threaded comments |
Consistent naming prevents confusion across teams and tools:
order_items, user_addresses)created_at, total_amount)<referenced_table_singular>_id (user_id, order_id)idx_<table>_<columns> (idx_orders_user_id_created_at)chk_<table>_<rule>, uq_<table>_<columns>, fk_<table>_<referenced>| Form | Rule | Violation Example |
|---|---|---|
| 1NF | Every column holds atomic values; no repeating groups | Storing comma-separated tags in a single column |
| 2NF | Every non-key column depends on the entire primary key | In a composite-key table, a column depending on only part of the key |
| 3NF | No non-key column depends on another non-key column | Storing both city and zip_code when zip determines city |
| BCNF | Every determinant is a candidate key | A scheduling table where room determines building but room is not a key |
Normalization prevents anomalies but adds JOINs. Denormalize selectively when:
order_total stored on the order row)Rules for safe denormalization:
| Type | Strengths | Fits When | |---|---|---| | Relational (PostgreSQL, MySQL) | ACID transactions, complex queries, mature tooling, JOINs | Structured data with relationships, transactional workloads, most CRUD applications | | Document (MongoDB, DynamoDB) | Flexible schema, nested data, horizontal scaling | Aggregates accessed as a unit, rapidly evolving schemas, per-tenant isolation | | Key-value (Redis, Memcached) | Sub-millisecond reads, simple data model | Session storage, caching, counters, rate limiting | | Column-family (Cassandra, ScyllaDB) | High write throughput, wide rows, linear scaling | Time-series, IoT telemetry, append-heavy workloads | | Graph (Neo4j, Neptune) | Traversal queries, relationship-centric data | Social networks, recommendation engines, fraud detection | | Time-series (TimescaleDB, InfluxDB) | Optimized for time-stamped data, automatic partitioning | Metrics, monitoring, financial tick data |
Polyglot persistence - using different databases for different parts of the same system - is valid when access patterns genuinely differ. It is not valid as a way to avoid learning one database well.
Indexes accelerate reads at the cost of slower writes and additional storage. Every index must justify its existence through query patterns.
| Type | Structure | Best For | |---|---|---| | B-tree | Balanced tree, sorted data | Equality and range queries, ORDER BY, most general-purpose indexing | | Hash | Hash table | Exact equality lookups only; no range support | | GiST | Generalized search tree | Spatial data, geometric queries, range types, nearest-neighbor | | GIN | Generalized inverted index | Full-text search, JSONB containment, array membership | | BRIN | Block range index | Large tables with naturally ordered data (timestamps, sequential IDs) |
The order of columns in a composite index matters. The leftmost prefix rule means a composite index on (a, b, c) supports queries filtering on (a), (a, b), or (a, b, c), but not (b, c) alone.
Column ordering guidelines:
=>, <, BETWEENINCLUDE (PostgreSQL) or just add columns to the index key.WHERE status = 'pending').See Indexing Strategies Reference for detailed index types, EXPLAIN analysis, and anti-patterns.
Schema changes are inevitable. The question is whether they break running applications.
VARCHAR(50) to VARCHAR(100))For breaking changes in production with zero downtime:
See Migration Patterns Reference for zero-downtime strategies, rollback techniques, and multi-tool examples.
| Strategy | How It Works | Use Case | |---|---|---| | Range | Rows split by value ranges (e.g., by month) | Time-series data, log tables, archival | | List | Rows split by discrete values (e.g., by region) | Multi-tenant data, geographic segmentation | | Hash | Rows distributed by hash of a column | Even distribution when no natural range exists |
Sharding distributes data across separate database instances. Use it only after single-instance optimizations (indexing, caching, read replicas) are exhausted.
Shard key selection criteria:
| Mistake | Consequence | Fix |
|---|---|---|
| No foreign key constraints | Orphaned rows, inconsistent data | Always define foreign keys unless there is a documented reason not to |
| Over-indexing | Slow writes, wasted storage | Index only columns used in WHERE, JOIN, ORDER BY of actual queries |
| Storing computed values without a refresh strategy | Stale data, silent bugs | Define update triggers, events, or batch jobs alongside any denormalization |
| Using ENUM types for values that change | Schema migration for every new value | Use a lookup table with a foreign key instead |
| Storing money as floating-point | Rounding errors | Use DECIMAL/NUMERIC or store as integer cents |
| Missing created_at / updated_at timestamps | No auditability, difficult debugging | Add timestamp columns to every table by default |
| Generic type + type_id polymorphism everywhere | No referential integrity, complex queries | Evaluate STI, CTI, or separate tables first |
| Reference | Contents | |---|---| | Modeling Patterns | Polymorphic associations (STI/CTI/TPT), soft deletes, audit trails, temporal data, self-referential trees, JSON columns | | Indexing Strategies | B-tree/hash/GiST/GIN details, composite index design, covering and partial indexes, EXPLAIN analysis, anti-patterns | | Migration Patterns | Version-based vs state-based migrations, expand-contract, data migrations, rollback strategies, multi-tool examples |
| Situation | Recommended Skill |
|---|---|
| Optimizing query performance and caching | Install knowledge-virtuoso from krzysztofsurdy/code-virtuoso for performance optimization patterns |
| Designing domain models and aggregates | Install knowledge-virtuoso from krzysztofsurdy/code-virtuoso for clean architecture and DDD guidance |
| Building APIs that expose database-backed resources | Install knowledge-virtuoso from krzysztofsurdy/code-virtuoso for API design principles |
| Testing database interactions | Install knowledge-virtuoso from krzysztofsurdy/code-virtuoso for testing strategies |
development
Spawn and coordinate a pre-composed agent team from a team definition file. Reads team files from teams/, resolves agents and skills, picks the best spawning mode (peer or sequential), and runs the workflow. Use when the user asks to run a team, dispatch a development team, start a feature delivery, or coordinate multiple agents for a multi-phase task.
development
Pre-composed agent team library. Use when the user asks which teams are available, what a team does, when to pick one team over another, or to browse multi-agent compositions. Catalogs ready-to-run teams (development team, review squad, war room) with their purpose, agent roster, workflow type, and when to use each. The actual dispatching is handled by the dispatching-agent-teams skill.
tools
Ecosystem discovery advisor. Use when the user asks 'what skill should I use', 'what agent should I delegate to', 'which team fits this task', or when onboarding to available skills, agents, and teams. Scans ALL installed skills at runtime -- not limited to any single plugin or vendor. Triggers: 'which skill', 'which agent', 'what do I use for', 'orient me', 'what tools do I have'.
tools
Interactive tool to scaffold a complete Claude Code plugin -- plugin.json manifest, skills, agents, hooks, MCP servers, LSP servers, and an optional marketplace.json catalog entry. Use when the user asks to create a plugin, build a Claude Code plugin, scaffold a plugin marketplace, convert an existing .claude/ configuration into a plugin, or package skills and agents for distribution. Runs a guided questionnaire, writes all required files to disk, and prints test instructions.