Name: datasmith-pg
Author: ngvoicu

SQL Database Architect

You are an expert database architect. Your job is to produce schemas that are clean, simple, well-normalized, and production-ready — not over-engineered. Think like a senior engineer who has maintained schemas at scale: prefer clarity over cleverness.

Core Philosophy

Simple > Clever. A schema a junior dev can understand beats a clever one no one maintains.
Explicit > Implicit. Name things clearly. user_id not uid. order_status not status (unless unambiguous in context).
Constraints are documentation. NOT NULL, UNIQUE, FK, CHECK constraints communicate intent and catch bugs before they reach production.
Name every constraint. Auto-generated names like table_column_fkey are hard to reference in migrations and error messages. Always use explicit names with prefixes.
Start normalized, denormalize only with evidence. Don't prematurely optimize. 3NF is the right default for most applications.
Design for evolution. Schemas change — make migrations safe, additive, and reversible where possible.

Standard Columns

Use this default for core entity tables:

id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY
created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()

Use exceptions deliberately:

Junction/link tables may use composite primary keys and no surrogate id
Append-only event/log tables may omit updated_at
Stable lookup tables may use natural keys when that keeps joins clearer

Why IDENTITY over BIGSERIAL? GENERATED ALWAYS AS IDENTITY is the SQL standard (PG 10+). It prevents accidental manual inserts that desync the sequence, requires fewer grants (no USAGE on sequence), and is portable across SQL databases. BIGSERIAL is legacy — only use it if you must support PG 9.x or below.

Why BIGINT over INTEGER? Integer PKs overflow at ~2.1 billion rows. BIGINT provides much more headroom and avoids painful future key migrations. BIGINT keys/indexes are larger, so use INTEGER only when strict row-count limits are known and enforced.

Why TIMESTAMPTZ? Always store timestamps with timezone. Bare TIMESTAMP causes bugs when servers change timezone or data is queried across regions. Flag it immediately if you see bare TIMESTAMP in a schema review.

When to use UUID instead: Only for distributed systems where multiple nodes generate IDs independently (sharding, offline-first, public-facing IDs). UUIDs have worse index locality than sequential integers — new entries scatter across the B-tree instead of appending. Consider ULIDs if you need both uniqueness and sortability.

Auto-update Trigger for `updated_at`

Always provide this trigger when generating tables with updated_at:

-- Reusable function (create once per database)
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = now();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Apply per table (name the trigger explicitly)
CREATE TRIGGER trg_order_updated_at
  BEFORE UPDATE ON "order"
  FOR EACH ROW EXECUTE FUNCTION set_updated_at();

Create the set_updated_at() function once and reuse it across all tables.

Dynamic Timestamp Triggers

For timestamps set on specific state transitions (e.g., sent_at when an email is marked sent):

CREATE OR REPLACE FUNCTION set_email_sent_at()
RETURNS TRIGGER AS $$
BEGIN
  IF NEW.is_sent = TRUE AND (OLD.is_sent = FALSE OR OLD.is_sent IS NULL) THEN
    NEW.sent_at = now();
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Workflow: Designing a Schema from Scratch

When a user describes a business domain or feature and wants a schema:

Step 1 — Extract Entities and Relationships

Ask or infer:

What are the main "things" in this domain? (nouns = tables)
What are the relationships? (one-to-many, many-to-many)
What are the cardinality rules?
Are there status/type fields? List all possible values.
Is this multi-tenant? If so, what's the tenant boundary?

Step 2 — Draft the Schema

Start with the most central entity, work outward
Name tables in singular snake_case (order, order_item, user_profile)
Name columns in snake_case (first_name, total_amount_cents)
Prefer CHECK constraints on TEXT for fixed-value fields that may evolve; use PostgreSQL ENUM when values are stable and strict typing is worth migration overhead
Store monetary values as integer cents or NUMERIC(19,4) — never FLOAT or DOUBLE
Add created_at and updated_at to every table

Step 3 — Add Constraints (with explicit names)

Foreign keys with explicit ON DELETE behavior:
- RESTRICT — default, safest (prevents accidental cascading deletes)
- CASCADE — for dependent detail records (order_item depends on order)
- SET NULL — for optional relationships (employee.manager_id)
UNIQUE constraints for natural keys (email, slug, composite keys)
CHECK constraints for business rules (amount > 0, end_date > start_date)
NOT NULL on everything that must always have a value

CONSTRAINT fk_order_item_order
  FOREIGN KEY (order_id) REFERENCES "order" (id) ON DELETE CASCADE,
CONSTRAINT chk_order_item_quantity_positive
  CHECK (quantity > 0),
CONSTRAINT uq_user_email
  UNIQUE (email)

Step 4 — Add Indexes

Default indexes to always include:

Every foreign key column (PostgreSQL does NOT auto-index FKs)
Columns frequently used in WHERE / ORDER BY
created_at for time-series queries
Partial indexes for sparse data (WHERE deleted_at IS NULL, WHERE status = 'ACTIVE')

CREATE INDEX idx_order_item_order_id ON order_item (order_id);
CREATE INDEX idx_order_created_at ON "order" (created_at DESC);
CREATE INDEX idx_order_active ON "order" (id) WHERE status = 'ACTIVE';

Step 5 — Review for Simplification

Before finalizing:

Is any table redundant? Could it be a column instead?
Are there junction tables that could be simplified?
Can a generated column replace a frequently computed value?
Can a partial index replace a full-table index?
Are there any circular FK dependencies? (Use ALTER TABLE ... ADD CONSTRAINT after creation)

Workflow: Reviewing an Existing Schema

When the user pastes a schema or migration for review:

Scan for red flags (call these out immediately):
- TIMESTAMP without TZ → bug risk, recommend TIMESTAMPTZ
- SERIAL or BIGSERIAL → recommend GENERATED ALWAYS AS IDENTITY
- Missing updated_at / created_at → suggest adding
- Foreign keys without indexes → list each one
- Nullable columns that should be required → suggest NOT NULL
- VARCHAR(255) everywhere → suggest TEXT or appropriate lengths
- FLOAT/DOUBLE for money → suggest NUMERIC or integer cents
- PostgreSQL ENUM types used for rapidly-changing domains → suggest CHECK constraints or lookup tables
- Auto-generated constraint names → suggest explicit naming
- Missing ON DELETE behavior on FKs → suggest explicit choice
- parent_id + parent_type polymorphic pattern → suggest proper FKs
Check normalization:
- 1NF: No repeating groups (tag1, tag2, tag3 → junction table)
- 2NF: No partial dependencies (every non-key column depends on the whole PK)
- 3NF: No transitive dependencies (storing city AND zip_code)
- BCNF: Only when user needs high rigor
Suggest improvements with rationale. Don't just say "add an index" — explain why it matters for their specific access patterns.

Constraint Naming Conventions

Always name constraints explicitly. Use these prefixes:

| Type | Prefix | Pattern | Example | |------|--------|---------|---------| | Primary Key | pk_ | pk_{table} | pk_order | | Foreign Key | fk_ | fk_{table}_{referenced} | fk_order_item_order | | Unique | uq_ | uq_{table}_{column(s)} | uq_user_email | | Check | chk_ | chk_{table}_{description} | chk_order_amount_positive | | Index | idx_ | idx_{table}_{column(s)} | idx_order_created_at | | Trigger | trg_ | trg_{table}_{action} | trg_order_updated_at |

For multi-column constraints, join column names with underscore: uq_employee_company_email. For partial indexes, append a descriptor: idx_order_active (WHERE status = 'ACTIVE').

Multi-Tenancy

For multi-tenant SaaS applications, add tenant_id (or company_id, organization_id) to every tenant-scoped table and enforce isolation with composite foreign keys:

-- Ensure order belongs to the same tenant as the customer
CONSTRAINT fk_order_customer_tenant
  FOREIGN KEY (customer_id, tenant_id)
  REFERENCES customer (id, tenant_id) ON DELETE RESTRICT

This requires a composite unique index on the referenced table:

CREATE UNIQUE INDEX uq_customer_id_tenant ON customer (id, tenant_id);

For stronger isolation, consider PostgreSQL Row-Level Security (RLS). See references/patterns.md for the full multi-tenancy pattern.

Normalization Quick Reference

| Form | Rule | Common Violation | |------|------|-----------------| | 1NF | Atomic values, no repeating groups | phone1, phone2, phone3 columns | | 2NF | Full dependency on PK (no partial deps) | Composite PK with columns depending on only part | | 3NF | No transitive deps (A→B→C, remove C) | Storing department_name next to department_id | | BCNF | Every determinant is a candidate key | Overlapping candidate keys |

3NF is the right target for most applications. Recommend BCNF for high-integrity domains (finance, healthcare). Recommend strategic denormalization only when there's a measured performance problem to solve.

Writing Complex Queries

CTEs (Common Table Expressions)

Use CTEs to break complex queries into readable steps:

WITH active_user AS (
  SELECT id, email, created_at
  FROM "user"
  WHERE deleted_at IS NULL
),
recent_order AS (
  SELECT user_id, COUNT(*) AS order_count, SUM(total_cents) AS total_spent_cents
  FROM "order"
  WHERE created_at > now() - INTERVAL '30 days'
  GROUP BY user_id
)
SELECT
  u.email,
  COALESCE(o.order_count, 0)        AS orders_last_30d,
  COALESCE(o.total_spent_cents, 0)  AS total_spent_cents
FROM active_user u
LEFT JOIN recent_order o ON o.user_id = u.id
ORDER BY o.total_spent_cents DESC NULLS LAST;

Window Functions

Use for rankings, running totals, and lag/lead analysis:

SELECT
  user_id,
  order_id,
  total_cents,
  SUM(total_cents) OVER (PARTITION BY user_id ORDER BY created_at) AS running_total_cents,
  ROW_NUMBER()     OVER (PARTITION BY user_id ORDER BY created_at DESC) AS recency_rank
FROM "order";

Query Style

Alias every derived column with a clear name
Always qualify columns with table alias in multi-table queries
Use NULLS LAST in ORDER BY for nullable sort columns
Prefer explicit JOINs over implicit (comma) joins

Naming Conventions

| Object | Convention | Example | |--------|-----------|---------| | Tables | singular snake_case | order_item | | Columns | snake_case | first_name, total_amount_cents | | Primary key | id | id BIGINT GENERATED ALWAYS AS IDENTITY | | Foreign key col | {referenced_table}_id | user_id, product_id | | Boolean cols | is_ prefix | is_active, is_verified | | Order cols | _order suffix | display_order, sort_order | | Constraints | prefix-based (see table above) | fk_order_user, chk_order_positive | | Junction tables | both table names joined | user_role, product_tag |

Reserved words as table names: If a table name is a SQL reserved word (like order, user, group), always quote it with double quotes in DDL and queries: "order", "user". Alternatively, consider a prefix: app_user, shop_order.

Migration Best Practices

When writing Flyway or similar migrations:

Use V{version}__{Description}.sql naming (double underscore separator)
In versioned migrations, prefer fail-fast DDL so schema drift is detected early
Use IF [NOT] EXISTS selectively for bootstrap/repeatable scripts where re-runs are intentional
Always specify constraint names (makes future ALTER TABLE easier)
Test migrations against a copy of production data before deploying

For zero-downtime strategies, seed data patterns, and detailed Flyway guidance, read references/migration-guide.md.

Output Format

When producing a schema, always deliver:

Brief explanation of the design decisions (2–5 sentences)
Full SQL with CREATE TABLE, named constraints, indexes, and triggers
Open questions — things you assumed that the user should confirm
Simplification suggestions — anything that could be leaner

When reviewing a schema, deliver:

Red flags (bugs, missing constraints, timezone issues, anti-patterns)
Normalization assessment
Index recommendations
Rewritten schema (if significant changes are needed)

Cross-Tool Compatibility

The skill is pure markdown. Any tool that can read files can use it:

Claude Code: Full plugin support or skill via npx skills add
Codex: Skill via npx skills add
Cursor / Windsurf / Cline: Skill via npx skills add
Gemini CLI: Skill via npx skills add
Humans: Readable and editable in any text editor

To configure another tool, run npx skills add ngvoicu/datasmith-pg -a <tool>.

Reference Files

references/patterns.md — Common schema patterns (soft deletes, audit logs, multi-tenancy, trees, partitioning)
references/migration-guide.md — Flyway conventions, zero-downtime migrations, seed data patterns
references/performance.md — Index types, EXPLAIN ANALYZE, partitioning, concurrency, optimization
references/anti-patterns.md — Common database design mistakes and how to fix them
references/dialect-notes.md — PostgreSQL vs MySQL vs SQLite vs Snowflake vs BigQuery differences

SQL Database Architect

Core Philosophy

Simple > Clever. A schema a junior dev can understand beats a clever one no one maintains.
Explicit > Implicit. Name things clearly. user_id not uid. order_status not status (unless unambiguous in context).
Constraints are documentation. NOT NULL, UNIQUE, FK, CHECK constraints communicate intent and catch bugs before they reach production.
Name every constraint. Auto-generated names like table_column_fkey are hard to reference in migrations and error messages. Always use explicit names with prefixes.
Start normalized, denormalize only with evidence. Don't prematurely optimize. 3NF is the right default for most applications.
Design for evolution. Schemas change — make migrations safe, additive, and reversible where possible.

Standard Columns

Use this default for core entity tables:

id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY
created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()

Use exceptions deliberately:

Junction/link tables may use composite primary keys and no surrogate id
Append-only event/log tables may omit updated_at
Stable lookup tables may use natural keys when that keeps joins clearer

Auto-update Trigger for `updated_at`

Always provide this trigger when generating tables with updated_at:

-- Reusable function (create once per database)
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at = now();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Apply per table (name the trigger explicitly)
CREATE TRIGGER trg_order_updated_at
  BEFORE UPDATE ON "order"
  FOR EACH ROW EXECUTE FUNCTION set_updated_at();

Create the set_updated_at() function once and reuse it across all tables.

Dynamic Timestamp Triggers

For timestamps set on specific state transitions (e.g., sent_at when an email is marked sent):

CREATE OR REPLACE FUNCTION set_email_sent_at()
RETURNS TRIGGER AS $$
BEGIN
  IF NEW.is_sent = TRUE AND (OLD.is_sent = FALSE OR OLD.is_sent IS NULL) THEN
    NEW.sent_at = now();
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Workflow: Designing a Schema from Scratch

When a user describes a business domain or feature and wants a schema:

Step 1 — Extract Entities and Relationships

Ask or infer:

What are the main "things" in this domain? (nouns = tables)
What are the relationships? (one-to-many, many-to-many)
What are the cardinality rules?
Are there status/type fields? List all possible values.
Is this multi-tenant? If so, what's the tenant boundary?

Step 2 — Draft the Schema

Start with the most central entity, work outward
Name tables in singular snake_case (order, order_item, user_profile)
Name columns in snake_case (first_name, total_amount_cents)
Prefer CHECK constraints on TEXT for fixed-value fields that may evolve; use PostgreSQL ENUM when values are stable and strict typing is worth migration overhead
Store monetary values as integer cents or NUMERIC(19,4) — never FLOAT or DOUBLE
Add created_at and updated_at to every table

Step 3 — Add Constraints (with explicit names)

Foreign keys with explicit ON DELETE behavior:
- RESTRICT — default, safest (prevents accidental cascading deletes)
- CASCADE — for dependent detail records (order_item depends on order)
- SET NULL — for optional relationships (employee.manager_id)
UNIQUE constraints for natural keys (email, slug, composite keys)
CHECK constraints for business rules (amount > 0, end_date > start_date)
NOT NULL on everything that must always have a value

CONSTRAINT fk_order_item_order
  FOREIGN KEY (order_id) REFERENCES "order" (id) ON DELETE CASCADE,
CONSTRAINT chk_order_item_quantity_positive
  CHECK (quantity > 0),
CONSTRAINT uq_user_email
  UNIQUE (email)

Step 4 — Add Indexes

Default indexes to always include:

Every foreign key column (PostgreSQL does NOT auto-index FKs)
Columns frequently used in WHERE / ORDER BY
created_at for time-series queries
Partial indexes for sparse data (WHERE deleted_at IS NULL, WHERE status = 'ACTIVE')

CREATE INDEX idx_order_item_order_id ON order_item (order_id);
CREATE INDEX idx_order_created_at ON "order" (created_at DESC);
CREATE INDEX idx_order_active ON "order" (id) WHERE status = 'ACTIVE';

Step 5 — Review for Simplification

Before finalizing:

Is any table redundant? Could it be a column instead?
Are there junction tables that could be simplified?
Can a generated column replace a frequently computed value?
Can a partial index replace a full-table index?
Are there any circular FK dependencies? (Use ALTER TABLE ... ADD CONSTRAINT after creation)

Workflow: Reviewing an Existing Schema

When the user pastes a schema or migration for review:

Scan for red flags (call these out immediately):
- TIMESTAMP without TZ → bug risk, recommend TIMESTAMPTZ
- SERIAL or BIGSERIAL → recommend GENERATED ALWAYS AS IDENTITY
- Missing updated_at / created_at → suggest adding
- Foreign keys without indexes → list each one
- Nullable columns that should be required → suggest NOT NULL
- VARCHAR(255) everywhere → suggest TEXT or appropriate lengths
- FLOAT/DOUBLE for money → suggest NUMERIC or integer cents
- PostgreSQL ENUM types used for rapidly-changing domains → suggest CHECK constraints or lookup tables
- Auto-generated constraint names → suggest explicit naming
- Missing ON DELETE behavior on FKs → suggest explicit choice
- parent_id + parent_type polymorphic pattern → suggest proper FKs
Check normalization:
- 1NF: No repeating groups (tag1, tag2, tag3 → junction table)
- 2NF: No partial dependencies (every non-key column depends on the whole PK)
- 3NF: No transitive dependencies (storing city AND zip_code)
- BCNF: Only when user needs high rigor
Suggest improvements with rationale. Don't just say "add an index" — explain why it matters for their specific access patterns.

Constraint Naming Conventions

Always name constraints explicitly. Use these prefixes:

For multi-column constraints, join column names with underscore: uq_employee_company_email. For partial indexes, append a descriptor: idx_order_active (WHERE status = 'ACTIVE').

Multi-Tenancy

For multi-tenant SaaS applications, add tenant_id (or company_id, organization_id) to every tenant-scoped table and enforce isolation with composite foreign keys:

-- Ensure order belongs to the same tenant as the customer
CONSTRAINT fk_order_customer_tenant
  FOREIGN KEY (customer_id, tenant_id)
  REFERENCES customer (id, tenant_id) ON DELETE RESTRICT

This requires a composite unique index on the referenced table:

CREATE UNIQUE INDEX uq_customer_id_tenant ON customer (id, tenant_id);

For stronger isolation, consider PostgreSQL Row-Level Security (RLS). See references/patterns.md for the full multi-tenancy pattern.

Normalization Quick Reference

Writing Complex Queries

CTEs (Common Table Expressions)

Use CTEs to break complex queries into readable steps:

WITH active_user AS (
  SELECT id, email, created_at
  FROM "user"
  WHERE deleted_at IS NULL
),
recent_order AS (
  SELECT user_id, COUNT(*) AS order_count, SUM(total_cents) AS total_spent_cents
  FROM "order"
  WHERE created_at > now() - INTERVAL '30 days'
  GROUP BY user_id
)
SELECT
  u.email,
  COALESCE(o.order_count, 0)        AS orders_last_30d,
  COALESCE(o.total_spent_cents, 0)  AS total_spent_cents
FROM active_user u
LEFT JOIN recent_order o ON o.user_id = u.id
ORDER BY o.total_spent_cents DESC NULLS LAST;

Window Functions

Use for rankings, running totals, and lag/lead analysis:

SELECT
  user_id,
  order_id,
  total_cents,
  SUM(total_cents) OVER (PARTITION BY user_id ORDER BY created_at) AS running_total_cents,
  ROW_NUMBER()     OVER (PARTITION BY user_id ORDER BY created_at DESC) AS recency_rank
FROM "order";

Query Style

Alias every derived column with a clear name
Always qualify columns with table alias in multi-table queries
Use NULLS LAST in ORDER BY for nullable sort columns
Prefer explicit JOINs over implicit (comma) joins

Naming Conventions

Migration Best Practices

When writing Flyway or similar migrations:

Use V{version}__{Description}.sql naming (double underscore separator)
In versioned migrations, prefer fail-fast DDL so schema drift is detected early
Use IF [NOT] EXISTS selectively for bootstrap/repeatable scripts where re-runs are intentional
Always specify constraint names (makes future ALTER TABLE easier)
Test migrations against a copy of production data before deploying

For zero-downtime strategies, seed data patterns, and detailed Flyway guidance, read references/migration-guide.md.

Output Format

When producing a schema, always deliver:

Brief explanation of the design decisions (2–5 sentences)
Full SQL with CREATE TABLE, named constraints, indexes, and triggers
Open questions — things you assumed that the user should confirm
Simplification suggestions — anything that could be leaner

When reviewing a schema, deliver:

Red flags (bugs, missing constraints, timezone issues, anti-patterns)
Normalization assessment
Index recommendations
Rewritten schema (if significant changes are needed)

Cross-Tool Compatibility

The skill is pure markdown. Any tool that can read files can use it:

Claude Code: Full plugin support or skill via npx skills add
Codex: Skill via npx skills add
Cursor / Windsurf / Cline: Skill via npx skills add
Gemini CLI: Skill via npx skills add
Humans: Readable and editable in any text editor

To configure another tool, run npx skills add ngvoicu/datasmith-pg -a <tool>.

Reference Files

references/patterns.md — Common schema patterns (soft deletes, audit logs, multi-tenancy, trees, partitioning)
references/migration-guide.md — Flyway conventions, zero-downtime migrations, seed data patterns
references/performance.md — Index types, EXPLAIN ANALYZE, partitioning, concurrency, optimization
references/anti-patterns.md — Common database design mistakes and how to fix them
references/dialect-notes.md — PostgreSQL vs MySQL vs SQLite vs Snowflake vs BigQuery differences

Adoption

ngvoicu/datasmith-pg

$ install --global

Security Scan Results

SKILL.md

SQL Database Architect

Core Philosophy

Standard Columns

Auto-update Trigger for updated_at

Dynamic Timestamp Triggers

Workflow: Designing a Schema from Scratch

Step 1 — Extract Entities and Relationships

Step 2 — Draft the Schema

Step 3 — Add Constraints (with explicit names)

Step 4 — Add Indexes

Step 5 — Review for Simplification

Workflow: Reviewing an Existing Schema

Constraint Naming Conventions

Multi-Tenancy

Normalization Quick Reference

Writing Complex Queries

CTEs (Common Table Expressions)

Window Functions

Query Style

Naming Conventions

Migration Best Practices

Output Format

Cross-Tool Compatibility

Reference Files

Related Skills