Database Schema Design

Production-proven patterns for relational database schema design in PostgreSQL/MySQL.

Foundational Principles

Name things for humans — schema is read far more than written; optimize for readability
Constraints are free documentation — every NOT NULL, CHECK, and UNIQUE constraint documents a business rule in the database itself
Normalize first, denormalize with evidence — start normalized; only denormalize when you have measured performance problems
Schema changes are the hardest deployments — design carefully upfront; migrations on large tables are expensive

Table Naming and Column Conventions

-- Tables: plural, snake_case
CREATE TABLE orders         -- ✓
CREATE TABLE order_items    -- ✓ (not orderItems, not order_item)
CREATE TABLE OrderTable     -- ✗

-- Primary key: always 'id', always the first column
-- Use UUID for:    distributed systems, public-facing IDs (prevents enumeration)
-- Use BIGSERIAL for: purely internal tables with very high insert rates

CREATE TABLE users (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email       TEXT NOT NULL,  -- 'email' not 'user_email' or 'emailAddress'
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Foreign keys: {singular_referenced_table}_id
CREATE TABLE orders (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  customer_id UUID NOT NULL REFERENCES users(id),  -- not user_id (ambiguous), not users_id
  ...
);

-- Boolean columns: is_, has_, can_ prefix
is_active         BOOLEAN NOT NULL DEFAULT true
has_subscription  BOOLEAN NOT NULL DEFAULT false
can_edit          BOOLEAN NOT NULL DEFAULT false

-- Timestamp columns: always include these
created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
deleted_at  TIMESTAMPTZ          -- Nullable = present; not null = soft-deleted

Column Type Selection

-- IDs and references
UUID                  -- Public-facing IDs (opaque, non-enumerable)
BIGINT GENERATED ALWAYS AS IDENTITY  -- Internal surrogate keys (compact, sequential)
TEXT                  -- User-provided string IDs (Stripe charges, etc.)

-- Text
TEXT                  -- Variable-length text (PostgreSQL: TEXT = VARCHAR(∞), equally fast)
VARCHAR(n)            -- When there IS a meaningful max length (e.g., zip codes, country codes)
CHAR(n)               -- Almost never — fixed-width strings, pads with spaces

-- Numbers
INTEGER               -- Counts, small integers (−2B to 2B)
BIGINT                -- Large counts, snowflake IDs
NUMERIC(10, 2)        -- Monetary values (exact decimal — NEVER float for money)
-- Or store in cents (integer) and divide when displaying: 4999 → $49.99

-- Floats (only for scientific/measurement values where approximation is acceptable)
DOUBLE PRECISION      -- 64-bit float
REAL                  -- 32-bit float

-- Date and time
TIMESTAMPTZ           -- Always: UTC with timezone info (PostgreSQL)
DATE                  -- Date without time (birthdate, invoice date)
TIME                  -- Time of day without date (business hours)
INTERVAL              -- Duration (3 days, 2 hours)
-- NEVER: Unix epoch as INTEGER (unreadable, error-prone, no DB functions)

-- Booleans
BOOLEAN               -- true/false — use for binary state only
-- Don't use TINYINT(1) in PostgreSQL — that's a MySQL pattern

-- JSON
JSONB                 -- PostgreSQL: binary JSON, indexable, queryable (prefer over JSON)
JSON                  -- PostgreSQL: text JSON (exact storage, no indexing) — rarely preferred
-- When to use JSONB: config, metadata, infrequently queried flexible attributes
-- When NOT to use JSONB: data you need to query, filter, or join on — make it relational

-- Enums
-- Option 1: PostgreSQL ENUM type (schema change to add values)
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered', 'cancelled');

-- Option 2: TEXT with CHECK constraint (easier to evolve, readable in dumps)
status TEXT NOT NULL CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled'))

-- Option 3: Lookup table (most flexible, nullable with FK)
CREATE TABLE order_statuses (id TEXT PRIMARY KEY, label TEXT NOT NULL);
INSERT INTO order_statuses VALUES ('pending', 'Awaiting payment'), ...;
status TEXT NOT NULL REFERENCES order_statuses(id)

Relationship Patterns

One-to-Many

CREATE TABLE customers (
  id    UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email TEXT NOT NULL UNIQUE  -- UNIQUE enforces one account per email
);

CREATE TABLE orders (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  customer_id UUID NOT NULL REFERENCES customers(id) ON DELETE CASCADE,
  -- ON DELETE: CASCADE | RESTRICT | NO ACTION | SET NULL | SET DEFAULT
  -- CASCADE: deleting customer deletes their orders (use carefully)
  -- RESTRICT: prevent deleting customer if they have orders (safer default)
  -- SET NULL: set customer_id to null (requires nullable column)
  
  total_cents INTEGER NOT NULL CHECK (total_cents >= 0),
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Always index the FK side
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- Without this: DELETE FROM customers WHERE id = ? does a sequential scan of orders

Many-to-Many

CREATE TABLE products (
  id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name TEXT NOT NULL
);

CREATE TABLE tags (
  id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name TEXT NOT NULL UNIQUE
);

-- Junction table: named by combining the two entities
CREATE TABLE product_tags (
  product_id UUID NOT NULL REFERENCES products(id) ON DELETE CASCADE,
  tag_id     UUID NOT NULL REFERENCES tags(id)     ON DELETE CASCADE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  
  PRIMARY KEY (product_id, tag_id)  -- Composite PK prevents duplicates
);

-- Index both directions (PK covers product_id → tag_id direction)
CREATE INDEX idx_product_tags_tag_id ON product_tags(tag_id);
-- This makes queries "which products have tag X?" efficient

Self-Referential (Tree / Hierarchy)

-- Adjacency list (simple, but expensive for deep tree queries)
CREATE TABLE categories (
  id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  parent_id UUID REFERENCES categories(id) ON DELETE RESTRICT,  -- nullable = root
  name      TEXT NOT NULL
);
CREATE INDEX idx_categories_parent_id ON categories(parent_id);

-- Materialized path (fast reads, batch updates on move)
CREATE TABLE categories (
  id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  path TEXT NOT NULL,   -- '/electronics/computers/laptops'
  name TEXT NOT NULL
);
CREATE INDEX idx_categories_path ON categories(path);
-- All descendants: WHERE path LIKE '/electronics/%'

-- Closure table (most flexible for tree queries)
CREATE TABLE category_ancestors (
  ancestor_id   UUID NOT NULL REFERENCES categories(id),
  descendant_id UUID NOT NULL REFERENCES categories(id),
  depth         INTEGER NOT NULL,  -- 0 = self
  PRIMARY KEY (ancestor_id, descendant_id)
);

Index Design

-- B-tree index (default) — equality, range, ORDER BY, BETWEEN
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);

-- Composite index — field order matters
-- Rule: equality conditions first, then range conditions
-- Query: WHERE customer_id = ? AND created_at > ?
CREATE INDEX idx_orders_customer_date ON orders(customer_id, created_at DESC);
-- This index can answer: WHERE customer_id = ?  AND  WHERE customer_id = ? AND created_at > ?
-- It CANNOT efficiently answer: WHERE created_at > ? alone

-- Partial index — index only rows matching a condition
-- Query pattern: WHERE deleted_at IS NULL AND status = 'active'
CREATE INDEX idx_orders_active ON orders(customer_id, created_at)
  WHERE deleted_at IS NULL AND status IN ('pending', 'processing');
-- Smaller index, much faster for the specific use case

-- Covering index — includes columns to avoid table heap access
-- Query: SELECT id, status FROM orders WHERE customer_id = ?
CREATE INDEX idx_orders_customer_covering ON orders(customer_id) INCLUDE (status);
-- status is in the index, so PostgreSQL never touches the main table

-- GIN index — for array columns, JSONB, and full-text search
CREATE INDEX idx_users_tags ON users USING GIN(tags);
CREATE INDEX idx_products_attrs ON products USING GIN(attributes);
CREATE INDEX idx_posts_search ON posts USING GIN(to_tsvector('english', body));

-- Always add indexes CONCURRENTLY in production (avoids table lock)
CREATE INDEX CONCURRENTLY idx_name ON table(column);

When NOT to Add an Index

Low-cardinality column alone (e.g., status with only 3 values on a small table)
Column almost never queried in WHERE/JOIN/ORDER BY
Table with under ~1,000 rows (sequential scan is faster)
Write-heavy tables where each insert/update/delete must maintain all indexes

Constraints

CREATE TABLE orders (
  id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  customer_id   UUID NOT NULL REFERENCES customers(id),
  status        TEXT NOT NULL DEFAULT 'pending'
    CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled')),
  total_cents   INTEGER NOT NULL CHECK (total_cents >= 0),
  discount_cents INTEGER NOT NULL DEFAULT 0 CHECK (discount_cents >= 0),
  -- Business rule: discount can't exceed total
  CHECK (discount_cents <= total_cents),
  
  -- Exactly one of these is set (polymorphic reference — better as separate tables but this works)
  payment_card_id    UUID REFERENCES payment_cards(id),
  payment_account_id UUID REFERENCES payment_accounts(id),
  CHECK (
    (payment_card_id IS NOT NULL)::integer + 
    (payment_account_id IS NOT NULL)::integer = 1
  ),
  
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Multi-column uniqueness
CREATE UNIQUE INDEX uq_orders_idempotency 
  ON orders(customer_id, idempotency_key)
  WHERE idempotency_key IS NOT NULL;

Audit Logging Pattern

-- Generic audit log — works for any table
CREATE TABLE audit_logs (
  id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  table_name  TEXT NOT NULL,
  record_id   UUID NOT NULL,
  operation   TEXT NOT NULL CHECK (operation IN ('INSERT', 'UPDATE', 'DELETE')),
  old_data    JSONB,
  new_data    JSONB,
  changed_by  UUID REFERENCES users(id),
  ip_address  INET,
  user_agent  TEXT,
  occurred_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_audit_table_record ON audit_logs(table_name, record_id, occurred_at DESC);
CREATE INDEX idx_audit_user ON audit_logs(changed_by, occurred_at DESC);

-- PostgreSQL trigger to auto-populate audit log
CREATE OR REPLACE FUNCTION audit_trigger_function()
RETURNS TRIGGER AS $$
BEGIN
  INSERT INTO audit_logs (table_name, record_id, operation, old_data, new_data)
  VALUES (
    TG_TABLE_NAME,
    COALESCE(NEW.id, OLD.id),
    TG_OP,
    CASE WHEN TG_OP = 'INSERT' THEN NULL ELSE row_to_json(OLD)::JSONB END,
    CASE WHEN TG_OP = 'DELETE' THEN NULL ELSE row_to_json(NEW)::JSONB END
  );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Apply to a table
CREATE TRIGGER orders_audit
  AFTER INSERT OR UPDATE OR DELETE ON orders
  FOR EACH ROW EXECUTE FUNCTION audit_trigger_function();

Soft Delete Pattern

-- Add to any table that needs soft delete
deleted_at TIMESTAMPTZ;  -- NULL = active, timestamp = deleted

-- Partial index makes queries on active records efficient
CREATE INDEX idx_orders_active ON orders(customer_id, created_at)
  WHERE deleted_at IS NULL;

-- PostgreSQL Row Security Policy (enforce soft delete at DB level)
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY orders_hide_deleted ON orders
  USING (deleted_at IS NULL);

-- Application-level: always filter
SELECT * FROM orders WHERE customer_id = ? AND deleted_at IS NULL;

-- "Delete"
UPDATE orders SET deleted_at = now() WHERE id = ?;

Avoiding Common Anti-Patterns

EAV (Entity-Attribute-Value) Anti-pattern

-- ANTI-PATTERN: EAV — looks flexible, is a performance disaster
CREATE TABLE entity_attributes (
  entity_id  UUID,
  attr_name  TEXT,
  attr_value TEXT  -- All values as strings = no type safety
);

-- BETTER: Use JSONB for truly dynamic attributes
CREATE TABLE products (
  id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name       TEXT NOT NULL,
  attributes JSONB NOT NULL DEFAULT '{}'  -- {"color": "red", "size": "M", "weight_g": 150}
);
CREATE INDEX idx_products_attrs ON products USING GIN(attributes);
-- Query: WHERE attributes->>'color' = 'red'
-- Query: WHERE (attributes->>'weight_g')::int > 100

Polymorphic Associations Anti-pattern

-- ANTI-PATTERN: polymorphic (can't have FK constraint)
CREATE TABLE comments (
  commentable_type TEXT,   -- 'Post', 'Video', 'Article'
  commentable_id   UUID,   -- No FK possible — DB can't enforce integrity
  ...
);

-- BETTER: separate tables (FK integrity preserved)
CREATE TABLE post_comments    (post_id    UUID REFERENCES posts(id), ...);
CREATE TABLE video_comments   (video_id   UUID REFERENCES videos(id), ...);
CREATE TABLE article_comments (article_id UUID REFERENCES articles(id), ...);

-- Or: null all but one FK (with check constraint)
CREATE TABLE comments (
  post_id    UUID REFERENCES posts(id),
  video_id   UUID REFERENCES videos(id),
  article_id UUID REFERENCES articles(id),
  CHECK (
    (post_id IS NOT NULL)::int +
    (video_id IS NOT NULL)::int +
    (article_id IS NOT NULL)::int = 1
  ),
  ...
);

Storing Delimited Values

-- ANTI-PATTERN: comma-separated values in a column
CREATE TABLE users (
  tags TEXT  -- 'developer,admin,beta-user' — cannot index, cannot FK, hard to query
);

-- BETTER: array column (PostgreSQL)
CREATE TABLE users (
  tags TEXT[] NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_users_tags ON users USING GIN(tags);
-- Query: WHERE 'developer' = ANY(tags)

-- OR: proper junction table (best for entity references)
CREATE TABLE user_tags ( user_id UUID, tag_id UUID, PRIMARY KEY (user_id, tag_id) );

Deeper Reference

For normalization decision guides and advanced schema patterns, see:

references/normalization-guide.md — normal forms with examples, audit trail patterns, soft-delete strategies, and multi-tenancy schema designs
references/advanced-patterns.md — time-series schema design, graph data in PostgreSQL, polymorphic associations, and event sourcing table layouts

Database Schema Design

Production-proven patterns for relational database schema design in PostgreSQL/MySQL.

Foundational Principles

Name things for humans — schema is read far more than written; optimize for readability
Constraints are free documentation — every NOT NULL, CHECK, and UNIQUE constraint documents a business rule in the database itself
Normalize first, denormalize with evidence — start normalized; only denormalize when you have measured performance problems
Schema changes are the hardest deployments — design carefully upfront; migrations on large tables are expensive

Table Naming and Column Conventions

-- Tables: plural, snake_case
CREATE TABLE orders         -- ✓
CREATE TABLE order_items    -- ✓ (not orderItems, not order_item)
CREATE TABLE OrderTable     -- ✗

-- Primary key: always 'id', always the first column
-- Use UUID for:    distributed systems, public-facing IDs (prevents enumeration)
-- Use BIGSERIAL for: purely internal tables with very high insert rates

CREATE TABLE users (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email       TEXT NOT NULL,  -- 'email' not 'user_email' or 'emailAddress'
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Foreign keys: {singular_referenced_table}_id
CREATE TABLE orders (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  customer_id UUID NOT NULL REFERENCES users(id),  -- not user_id (ambiguous), not users_id
  ...
);

-- Boolean columns: is_, has_, can_ prefix
is_active         BOOLEAN NOT NULL DEFAULT true
has_subscription  BOOLEAN NOT NULL DEFAULT false
can_edit          BOOLEAN NOT NULL DEFAULT false

-- Timestamp columns: always include these
created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
deleted_at  TIMESTAMPTZ          -- Nullable = present; not null = soft-deleted

Column Type Selection

-- IDs and references
UUID                  -- Public-facing IDs (opaque, non-enumerable)
BIGINT GENERATED ALWAYS AS IDENTITY  -- Internal surrogate keys (compact, sequential)
TEXT                  -- User-provided string IDs (Stripe charges, etc.)

-- Text
TEXT                  -- Variable-length text (PostgreSQL: TEXT = VARCHAR(∞), equally fast)
VARCHAR(n)            -- When there IS a meaningful max length (e.g., zip codes, country codes)
CHAR(n)               -- Almost never — fixed-width strings, pads with spaces

-- Numbers
INTEGER               -- Counts, small integers (−2B to 2B)
BIGINT                -- Large counts, snowflake IDs
NUMERIC(10, 2)        -- Monetary values (exact decimal — NEVER float for money)
-- Or store in cents (integer) and divide when displaying: 4999 → $49.99

-- Floats (only for scientific/measurement values where approximation is acceptable)
DOUBLE PRECISION      -- 64-bit float
REAL                  -- 32-bit float

-- Date and time
TIMESTAMPTZ           -- Always: UTC with timezone info (PostgreSQL)
DATE                  -- Date without time (birthdate, invoice date)
TIME                  -- Time of day without date (business hours)
INTERVAL              -- Duration (3 days, 2 hours)
-- NEVER: Unix epoch as INTEGER (unreadable, error-prone, no DB functions)

-- Booleans
BOOLEAN               -- true/false — use for binary state only
-- Don't use TINYINT(1) in PostgreSQL — that's a MySQL pattern

-- JSON
JSONB                 -- PostgreSQL: binary JSON, indexable, queryable (prefer over JSON)
JSON                  -- PostgreSQL: text JSON (exact storage, no indexing) — rarely preferred
-- When to use JSONB: config, metadata, infrequently queried flexible attributes
-- When NOT to use JSONB: data you need to query, filter, or join on — make it relational

-- Enums
-- Option 1: PostgreSQL ENUM type (schema change to add values)
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered', 'cancelled');

-- Option 2: TEXT with CHECK constraint (easier to evolve, readable in dumps)
status TEXT NOT NULL CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled'))

-- Option 3: Lookup table (most flexible, nullable with FK)
CREATE TABLE order_statuses (id TEXT PRIMARY KEY, label TEXT NOT NULL);
INSERT INTO order_statuses VALUES ('pending', 'Awaiting payment'), ...;
status TEXT NOT NULL REFERENCES order_statuses(id)

Relationship Patterns

One-to-Many

CREATE TABLE customers (
  id    UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email TEXT NOT NULL UNIQUE  -- UNIQUE enforces one account per email
);

CREATE TABLE orders (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  customer_id UUID NOT NULL REFERENCES customers(id) ON DELETE CASCADE,
  -- ON DELETE: CASCADE | RESTRICT | NO ACTION | SET NULL | SET DEFAULT
  -- CASCADE: deleting customer deletes their orders (use carefully)
  -- RESTRICT: prevent deleting customer if they have orders (safer default)
  -- SET NULL: set customer_id to null (requires nullable column)
  
  total_cents INTEGER NOT NULL CHECK (total_cents >= 0),
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Always index the FK side
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- Without this: DELETE FROM customers WHERE id = ? does a sequential scan of orders

Many-to-Many

CREATE TABLE products (
  id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name TEXT NOT NULL
);

CREATE TABLE tags (
  id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name TEXT NOT NULL UNIQUE
);

-- Junction table: named by combining the two entities
CREATE TABLE product_tags (
  product_id UUID NOT NULL REFERENCES products(id) ON DELETE CASCADE,
  tag_id     UUID NOT NULL REFERENCES tags(id)     ON DELETE CASCADE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  
  PRIMARY KEY (product_id, tag_id)  -- Composite PK prevents duplicates
);

-- Index both directions (PK covers product_id → tag_id direction)
CREATE INDEX idx_product_tags_tag_id ON product_tags(tag_id);
-- This makes queries "which products have tag X?" efficient

Self-Referential (Tree / Hierarchy)

-- Adjacency list (simple, but expensive for deep tree queries)
CREATE TABLE categories (
  id        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  parent_id UUID REFERENCES categories(id) ON DELETE RESTRICT,  -- nullable = root
  name      TEXT NOT NULL
);
CREATE INDEX idx_categories_parent_id ON categories(parent_id);

-- Materialized path (fast reads, batch updates on move)
CREATE TABLE categories (
  id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  path TEXT NOT NULL,   -- '/electronics/computers/laptops'
  name TEXT NOT NULL
);
CREATE INDEX idx_categories_path ON categories(path);
-- All descendants: WHERE path LIKE '/electronics/%'

-- Closure table (most flexible for tree queries)
CREATE TABLE category_ancestors (
  ancestor_id   UUID NOT NULL REFERENCES categories(id),
  descendant_id UUID NOT NULL REFERENCES categories(id),
  depth         INTEGER NOT NULL,  -- 0 = self
  PRIMARY KEY (ancestor_id, descendant_id)
);

Index Design

-- B-tree index (default) — equality, range, ORDER BY, BETWEEN
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);

-- Composite index — field order matters
-- Rule: equality conditions first, then range conditions
-- Query: WHERE customer_id = ? AND created_at > ?
CREATE INDEX idx_orders_customer_date ON orders(customer_id, created_at DESC);
-- This index can answer: WHERE customer_id = ?  AND  WHERE customer_id = ? AND created_at > ?
-- It CANNOT efficiently answer: WHERE created_at > ? alone

-- Partial index — index only rows matching a condition
-- Query pattern: WHERE deleted_at IS NULL AND status = 'active'
CREATE INDEX idx_orders_active ON orders(customer_id, created_at)
  WHERE deleted_at IS NULL AND status IN ('pending', 'processing');
-- Smaller index, much faster for the specific use case

-- Covering index — includes columns to avoid table heap access
-- Query: SELECT id, status FROM orders WHERE customer_id = ?
CREATE INDEX idx_orders_customer_covering ON orders(customer_id) INCLUDE (status);
-- status is in the index, so PostgreSQL never touches the main table

-- GIN index — for array columns, JSONB, and full-text search
CREATE INDEX idx_users_tags ON users USING GIN(tags);
CREATE INDEX idx_products_attrs ON products USING GIN(attributes);
CREATE INDEX idx_posts_search ON posts USING GIN(to_tsvector('english', body));

-- Always add indexes CONCURRENTLY in production (avoids table lock)
CREATE INDEX CONCURRENTLY idx_name ON table(column);

When NOT to Add an Index

Low-cardinality column alone (e.g., status with only 3 values on a small table)
Column almost never queried in WHERE/JOIN/ORDER BY
Table with under ~1,000 rows (sequential scan is faster)
Write-heavy tables where each insert/update/delete must maintain all indexes

Constraints

CREATE TABLE orders (
  id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  customer_id   UUID NOT NULL REFERENCES customers(id),
  status        TEXT NOT NULL DEFAULT 'pending'
    CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled')),
  total_cents   INTEGER NOT NULL CHECK (total_cents >= 0),
  discount_cents INTEGER NOT NULL DEFAULT 0 CHECK (discount_cents >= 0),
  -- Business rule: discount can't exceed total
  CHECK (discount_cents <= total_cents),
  
  -- Exactly one of these is set (polymorphic reference — better as separate tables but this works)
  payment_card_id    UUID REFERENCES payment_cards(id),
  payment_account_id UUID REFERENCES payment_accounts(id),
  CHECK (
    (payment_card_id IS NOT NULL)::integer + 
    (payment_account_id IS NOT NULL)::integer = 1
  ),
  
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Multi-column uniqueness
CREATE UNIQUE INDEX uq_orders_idempotency 
  ON orders(customer_id, idempotency_key)
  WHERE idempotency_key IS NOT NULL;

Audit Logging Pattern

-- Generic audit log — works for any table
CREATE TABLE audit_logs (
  id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  table_name  TEXT NOT NULL,
  record_id   UUID NOT NULL,
  operation   TEXT NOT NULL CHECK (operation IN ('INSERT', 'UPDATE', 'DELETE')),
  old_data    JSONB,
  new_data    JSONB,
  changed_by  UUID REFERENCES users(id),
  ip_address  INET,
  user_agent  TEXT,
  occurred_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_audit_table_record ON audit_logs(table_name, record_id, occurred_at DESC);
CREATE INDEX idx_audit_user ON audit_logs(changed_by, occurred_at DESC);

-- PostgreSQL trigger to auto-populate audit log
CREATE OR REPLACE FUNCTION audit_trigger_function()
RETURNS TRIGGER AS $$
BEGIN
  INSERT INTO audit_logs (table_name, record_id, operation, old_data, new_data)
  VALUES (
    TG_TABLE_NAME,
    COALESCE(NEW.id, OLD.id),
    TG_OP,
    CASE WHEN TG_OP = 'INSERT' THEN NULL ELSE row_to_json(OLD)::JSONB END,
    CASE WHEN TG_OP = 'DELETE' THEN NULL ELSE row_to_json(NEW)::JSONB END
  );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Apply to a table
CREATE TRIGGER orders_audit
  AFTER INSERT OR UPDATE OR DELETE ON orders
  FOR EACH ROW EXECUTE FUNCTION audit_trigger_function();

Soft Delete Pattern

-- Add to any table that needs soft delete
deleted_at TIMESTAMPTZ;  -- NULL = active, timestamp = deleted

-- Partial index makes queries on active records efficient
CREATE INDEX idx_orders_active ON orders(customer_id, created_at)
  WHERE deleted_at IS NULL;

-- PostgreSQL Row Security Policy (enforce soft delete at DB level)
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY orders_hide_deleted ON orders
  USING (deleted_at IS NULL);

-- Application-level: always filter
SELECT * FROM orders WHERE customer_id = ? AND deleted_at IS NULL;

-- "Delete"
UPDATE orders SET deleted_at = now() WHERE id = ?;

Avoiding Common Anti-Patterns

EAV (Entity-Attribute-Value) Anti-pattern

-- ANTI-PATTERN: EAV — looks flexible, is a performance disaster
CREATE TABLE entity_attributes (
  entity_id  UUID,
  attr_name  TEXT,
  attr_value TEXT  -- All values as strings = no type safety
);

-- BETTER: Use JSONB for truly dynamic attributes
CREATE TABLE products (
  id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name       TEXT NOT NULL,
  attributes JSONB NOT NULL DEFAULT '{}'  -- {"color": "red", "size": "M", "weight_g": 150}
);
CREATE INDEX idx_products_attrs ON products USING GIN(attributes);
-- Query: WHERE attributes->>'color' = 'red'
-- Query: WHERE (attributes->>'weight_g')::int > 100

Polymorphic Associations Anti-pattern

-- ANTI-PATTERN: polymorphic (can't have FK constraint)
CREATE TABLE comments (
  commentable_type TEXT,   -- 'Post', 'Video', 'Article'
  commentable_id   UUID,   -- No FK possible — DB can't enforce integrity
  ...
);

-- BETTER: separate tables (FK integrity preserved)
CREATE TABLE post_comments    (post_id    UUID REFERENCES posts(id), ...);
CREATE TABLE video_comments   (video_id   UUID REFERENCES videos(id), ...);
CREATE TABLE article_comments (article_id UUID REFERENCES articles(id), ...);

-- Or: null all but one FK (with check constraint)
CREATE TABLE comments (
  post_id    UUID REFERENCES posts(id),
  video_id   UUID REFERENCES videos(id),
  article_id UUID REFERENCES articles(id),
  CHECK (
    (post_id IS NOT NULL)::int +
    (video_id IS NOT NULL)::int +
    (article_id IS NOT NULL)::int = 1
  ),
  ...
);

Storing Delimited Values

-- ANTI-PATTERN: comma-separated values in a column
CREATE TABLE users (
  tags TEXT  -- 'developer,admin,beta-user' — cannot index, cannot FK, hard to query
);

-- BETTER: array column (PostgreSQL)
CREATE TABLE users (
  tags TEXT[] NOT NULL DEFAULT '{}'
);
CREATE INDEX idx_users_tags ON users USING GIN(tags);
-- Query: WHERE 'developer' = ANY(tags)

-- OR: proper junction table (best for entity references)
CREATE TABLE user_tags ( user_id UUID, tag_id UUID, PRIMARY KEY (user_id, tag_id) );

Deeper Reference

For normalization decision guides and advanced schema patterns, see:

references/normalization-guide.md — normal forms with examples, audit trail patterns, soft-delete strategies, and multi-tenancy schema designs
references/advanced-patterns.md — time-series schema design, graph data in PostgreSQL, polymorphic associations, and event sourcing table layouts

Adoption

harsh040506/schema-design

$ install --global

Security Scan Results

SKILL.md

Database Schema Design

Foundational Principles

Table Naming and Column Conventions

Column Type Selection

Relationship Patterns

One-to-Many

Many-to-Many

Self-Referential (Tree / Hierarchy)

Index Design

When NOT to Add an Index

Constraints

Audit Logging Pattern

Soft Delete Pattern

Avoiding Common Anti-Patterns

EAV (Entity-Attribute-Value) Anti-pattern

Polymorphic Associations Anti-pattern

Storing Delimited Values

Deeper Reference

Related Skills

harsh040506/single-cell-rna-qc

harsh040506/scvi-tools

harsh040506/scientific-problem-selection

harsh040506/nextflow-development

harsh040506/schema-design

$ install --global

Security Scan Results

SKILL.md

Database Schema Design

Foundational Principles

Table Naming and Column Conventions

Column Type Selection

Relationship Patterns

One-to-Many

Many-to-Many

Self-Referential (Tree / Hierarchy)

Index Design

When NOT to Add an Index

Constraints

Audit Logging Pattern

Soft Delete Pattern

Avoiding Common Anti-Patterns

EAV (Entity-Attribute-Value) Anti-pattern

Polymorphic Associations Anti-pattern

Storing Delimited Values

Deeper Reference

Related Skills

harsh040506/single-cell-rna-qc

harsh040506/scvi-tools

harsh040506/scientific-problem-selection

harsh040506/nextflow-development