.github/skills/tsh-sql-and-database-understanding/SKILL.md
SQL writing and database engineering patterns, standards, and procedures. Use for designing database schemas, writing performant SQL queries, normalisation strategies, indexing, joins optimisation, locking mechanics, transactions, query debugging with EXPLAIN, and ORM integration. Applies to PostgreSQL, MySQL, MariaDB, SQL Server, and Oracle. Covers ORM usage with TypeORM, Prisma, Doctrine, Eloquent, Entity Framework, Hibernate, and GORM.
npx skillsauth add thesoftwarehouse/copilot-collections tsh-sql-and-database-understandingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
This skill provides standards, patterns, and procedures for database schema design, writing performant SQL, query debugging, and database engineering best practices. It is database-engine-agnostic with notes on engine-specific behavior where critical.
| Principle | Application |
|---|---|
| Data Integrity First | Enforce constraints at the database level (NOT NULL, UNIQUE, FK, CHECK). Never rely solely on application-level validation. |
| Least Privilege | Database users/roles should have only the permissions they need. Application connections should never use the superuser account. |
| Explicit Over Implicit | Always specify column lists in SELECT, INSERT, and JOIN clauses. Avoid SELECT * in production code. |
| Measure Before Optimising | Use EXPLAIN ANALYZE to identify actual bottlenecks before adding indexes or restructuring queries. |
| Schema as Code | All schema changes go through versioned migration files. Never modify production schemas manually. |
| Element | Convention | Example |
|---|---|---|
| Tables | snake_case, plural nouns | order_items, user_addresses |
| Columns | snake_case | first_name, created_at |
| Primary keys | id (preferred) or <table_singular>_id | id, user_id |
| Foreign keys | <referenced_table_singular>_id | user_id, order_id |
| Indexes | idx_<table>_<columns> | idx_orders_user_id_status |
| Unique constraints | uq_<table>_<columns> | uq_users_email |
| Check constraints | chk_<table>_<description> | chk_orders_total_positive |
| Junction/pivot tables | <table1>_<table2> (alphabetical) | products_tags, roles_users |
Every table MUST include:
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(), -- or BIGSERIAL for auto-increment
-- ... domain columns ...
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
When soft deletes are required by business rules:
deleted_at TIMESTAMP WITH TIME ZONE DEFAULT NULL
| Strategy | When to Use | Trade-offs |
|---|---|---|
| UUID v4 / UUID v7 | Distributed systems, microservices, public-facing IDs | Larger storage, random UUIDs cause index fragmentation (prefer UUID v7 for ordered inserts) |
| ULID | When you need sortable unique IDs with good index locality | Not natively supported in all databases |
| BIGSERIAL / IDENTITY | Single-database monoliths, internal IDs | Sequential, predictable (security concern if exposed), not portable across DB instances |
Rule: Never expose auto-increment IDs in public APIs. Use UUIDs or ULIDs for external identifiers.
| Need | Use | Avoid |
|---|---|---|
| Monetary values | NUMERIC(19,4) or DECIMAL(19,4) | FLOAT, DOUBLE (precision loss) |
| Timestamps | TIMESTAMP WITH TIME ZONE | TIMESTAMP without timezone |
| Boolean flags | BOOLEAN | TINYINT, CHAR(1) |
| Short text (name, email) | VARCHAR(n) with appropriate limit | Unbounded TEXT for structured fields |
| Long text (descriptions) | TEXT | VARCHAR(10000) |
| Enums | VARCHAR with CHECK constraint or native ENUM | Magic integers |
| JSON/semi-structured | JSONB (PostgreSQL) or JSON | Storing relational data as JSON |
| IP addresses | INET (PostgreSQL) or VARCHAR(45) | VARCHAR(15) (IPv6 won't fit) |
Prefer CHECK constraints over native ENUM types for portability and ease of migration:
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
status VARCHAR(20) NOT NULL DEFAULT 'pending'
CONSTRAINT chk_orders_status CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
Adding a new enum value is a simple ALTER TABLE ... DROP CONSTRAINT + ADD CONSTRAINT, no table rewrite required.
Schema design checklist:
- [ ] Every table has a primary key (UUID or BIGSERIAL)
- [ ] Every table has created_at and updated_at columns
- [ ] All foreign keys have corresponding indexes
- [ ] All columns have appropriate NOT NULL constraints
- [ ] Unique constraints are defined where business rules require uniqueness
- [ ] CHECK constraints enforce valid value ranges and enums
- [ ] Naming follows snake_case conventions consistently
- [ ] Data types are chosen precisely (no FLOAT for money, no TEXT for emails)
- [ ] Soft delete (deleted_at) is used only where business rules require it
- [ ] No business logic is embedded in database triggers or stored procedures (keep logic in the application layer)
| Normal Form | Rule | Violation Example | Fix |
|---|---|---|---|
| 1NF | Every column holds atomic (indivisible) values. No repeating groups. | tags: "php,sql,go" in a single column | Create a separate tags table with a junction table |
| 2NF | 1NF + every non-key column depends on the entire primary key (relevant for composite keys) | order_items(order_id, product_id, product_name) — product_name depends only on product_id | Move product_name to the products table |
| 3NF | 2NF + no transitive dependencies (non-key column depends on another non-key column) | employees(id, department_id, department_name) — department_name depends on department_id, not on id | Move department_name to a departments table |
| BCNF | 3NF + every determinant is a candidate key | Rare in practice; address when composite keys create functional dependency issues | Decompose the table so every determinant is a key |
Target 3NF by default. This eliminates redundancy while keeping the schema manageable.
Use 2NF only as an intermediate step when refactoring legacy schemas — never as a design target.
Use BCNF when you have composite primary keys with overlapping candidate keys (uncommon in application databases).
Unnormalised (0NF):
orders:
| order_id | customer_name | customer_email | items |
|----------|---------------|---------------------|--------------------------------|
| 1 | John Doe | [email protected] | Widget x2, Gadget x1 |
1NF — Atomic values, no repeating groups:
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
customer_name VARCHAR(100) NOT NULL,
customer_email VARCHAR(255) NOT NULL,
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
CREATE TABLE order_items (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL REFERENCES orders(id),
product_name VARCHAR(100) NOT NULL,
quantity INT NOT NULL CHECK (quantity > 0)
);
2NF — Remove partial dependencies (product details depend on product, not order):
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
price NUMERIC(19,4) NOT NULL
);
CREATE TABLE order_items (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL REFERENCES orders(id),
product_id BIGINT NOT NULL REFERENCES products(id),
quantity INT NOT NULL CHECK (quantity > 0),
unit_price NUMERIC(19,4) NOT NULL -- snapshot of price at order time
);
3NF — Remove transitive dependencies (customer depends on customer, not order):
CREATE TABLE customers (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) NOT NULL UNIQUE
);
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(id),
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
Denormalise only when you have measured performance data (via EXPLAIN ANALYZE) proving that joins are the bottleneck. Common valid cases:
| Pattern | When to Use | Example |
|---|---|---|
| Materialised/cached columns | Frequently read aggregate that is expensive to compute on every query | orders.total_amount computed from order_items and stored on the order row |
| Snapshot columns | Values that must be preserved at a point in time even if the source changes | order_items.unit_price (price at time of purchase, not current product price) |
| Read-optimised views | Reporting or analytics queries that span many tables | Materialised views refreshed periodically |
| Search/filter columns | Columns derived from related tables used heavily in WHERE clauses | orders.customer_country duplicated from customers.country for filtering |
Rules for denormalisation:
One-to-Many (most common):
-- A customer has many orders
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(id) ON DELETE RESTRICT,
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
Many-to-Many (junction table):
-- Products can have many tags, tags can belong to many products
CREATE TABLE products_tags (
product_id BIGINT NOT NULL REFERENCES products(id) ON DELETE CASCADE,
tag_id BIGINT NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (product_id, tag_id)
);
CREATE INDEX idx_products_tags_tag_id ON products_tags(tag_id);
One-to-One:
-- Each user has exactly one profile
CREATE TABLE user_profiles (
user_id BIGINT PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
bio TEXT,
avatar_url VARCHAR(500)
);
| Action | Meaning | When to Use |
|---|---|---|
| RESTRICT (default) | Prevent deletion if referenced rows exist | Default for most business entities (prevent accidental data loss) |
| CASCADE | Delete/update child rows when parent is deleted/updated | Junction tables, dependent child records that have no meaning without parent |
| SET NULL | Set FK to NULL when parent is deleted | Optional relationships where the child can exist independently |
| SET DEFAULT | Set FK to its default value | Rare — use when a fallback reference makes business sense |
| NO ACTION | Same as RESTRICT but checked at end of transaction | When deferred constraint checking is needed |
Rule: Default to RESTRICT. Use CASCADE only on junction tables and tightly coupled child tables. Always explicitly specify the action — never rely on implicit defaults.
-- Categories with parent-child hierarchy
CREATE TABLE categories (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
parent_id BIGINT REFERENCES categories(id) ON DELETE SET NULL
);
CREATE INDEX idx_categories_parent_id ON categories(parent_id);
For deep hierarchies (trees), consider:
path VARCHAR(500) storing /1/5/12/ — good for read-heavy trees| Type | Syntax (PostgreSQL) | Use Case |
|---|---|---|
| B-tree (default) | CREATE INDEX idx ON table(column) | Equality and range queries (=, <, >, BETWEEN, ORDER BY) |
| Hash | CREATE INDEX idx ON table USING hash(column) | Equality-only lookups (rare — B-tree covers this equally well) |
| GIN | CREATE INDEX idx ON table USING gin(column) | Full-text search, JSONB containment, array operations |
| GiST | CREATE INDEX idx ON table USING gist(column) | Geometric/spatial data, range types, nearest-neighbor queries |
| BRIN | CREATE INDEX idx ON table USING brin(column) | Very large tables with naturally ordered data (timestamps, sequential IDs) |
Always index:
WHERE clausesORDER BY on large tablesJOIN conditionsConsider composite indexes for:
Avoid over-indexing:
INSERT, UPDATE, and DELETE operationsThe order of columns in a composite index matters. Place columns in this priority:
WHERE status = 'active')WHERE created_at > '2025-01-01')ORDER BY created_at DESC)-- Query: WHERE status = 'active' AND created_at > '2025-01-01' ORDER BY created_at DESC
-- Optimal index:
CREATE INDEX idx_orders_status_created_at ON orders(status, created_at DESC);
Index only the rows you actually query:
-- Only index active orders (if most queries filter for active)
CREATE INDEX idx_orders_active ON orders(customer_id, created_at)
WHERE deleted_at IS NULL;
Use unique indexes to enforce business constraints:
-- Email must be unique, but only for non-deleted users
CREATE UNIQUE INDEX uq_users_email_active ON users(email)
WHERE deleted_at IS NULL;
Periodically check for unused indexes:
-- PostgreSQL: find unused indexes
SELECT
schemaname, relname AS table_name, indexrelname AS index_name,
idx_scan AS times_used, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
Standard CREATE INDEX and DROP INDEX acquire locks that block writes on the table for the duration of the operation. On large or heavily-used tables this can cause downtime.
Always use CONCURRENTLY when creating or dropping indexes on tables that serve live traffic:
-- Creating an index without blocking writes
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders(customer_id);
-- Dropping an index without blocking writes
DROP INDEX CONCURRENTLY idx_orders_customer_id;
Key constraints and caveats:
| Aspect | Detail |
|---|---|
| Database support | CONCURRENTLY is PostgreSQL-specific. Other engines have their own online indexing alternatives: MySQL/MariaDB use ALGORITHM=INPLACE, LOCK=NONE; SQL Server uses WITH (ONLINE = ON); Oracle uses ONLINE keyword. Consult engine-specific docs for constraints and limitations |
| Transaction blocks | CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY cannot run inside a transaction block — avoid wrapping them in BEGIN/COMMIT |
| Migration frameworks | Most migration tools wrap statements in a transaction by default. Disable this for concurrent index operations (e.g., disable_ddl_transaction! in Rails, atomic = False in Django, separate migration step in Flyway) |
| Failed builds | If CREATE INDEX CONCURRENTLY fails, it leaves behind an invalid index. Check with \d table_name and drop the invalid index before retrying |
| Unique indexes | CREATE UNIQUE INDEX CONCURRENTLY performs an extra table scan — takes longer but still avoids blocking writes |
| Build time | Concurrent builds are slower than regular ones because they require multiple table passes |
When to skip CONCURRENTLY:
| Rule | Do | Don't |
|---|---|---|
| Specify columns | SELECT id, name, email FROM users | SELECT * FROM users |
| Use parameterised queries | WHERE id = $1 / WHERE id = ? | WHERE id = ' + userId + ' (SQL injection risk) |
| Limit results | LIMIT 100 or paginate | Unbounded SELECT on large tables |
| Filter early | WHERE clause narrows rows before joins | Joining full tables then filtering |
| Use EXISTS over IN for subqueries | WHERE EXISTS (SELECT 1 FROM ...) | WHERE id IN (SELECT id FROM ...) on large sets |
| Avoid functions on indexed columns | WHERE created_at >= '2025-01-01' | WHERE DATE(created_at) = '2025-01-01' (kills index) |
| Use UNION ALL over UNION | UNION ALL when duplicates are acceptable | UNION (forces sort + dedup) |
| Join Type | Returns | Use When |
|---|---|---|
| INNER JOIN | Only matching rows from both tables | You need data that exists in both tables |
| LEFT JOIN | All rows from left table + matches from right (NULL if no match) | You need all records from the primary table regardless of match |
| RIGHT JOIN | All rows from right table + matches from left | Rarely used — rewrite as LEFT JOIN for readability |
| CROSS JOIN | Cartesian product of both tables | Generating combinations (e.g., all products × all regions) |
| LATERAL JOIN | Correlated subquery for each row | Top-N per group, complex per-row calculations |
-- GOOD: Join on indexed columns, filter early
SELECT o.id, o.created_at, c.name
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'active'
AND o.created_at >= '2025-01-01'
ORDER BY o.created_at DESC
LIMIT 50;
-- BAD: Joining on expressions, filtering late
SELECT *
FROM orders o
INNER JOIN customers c ON LOWER(c.email) = LOWER(o.customer_email)
WHERE YEAR(o.created_at) = 2025;
Rules:
WHERE conditions on the driving table to reduce the row set early.Offset-based pagination (simple but degrades on large offsets):
SELECT id, name, email
FROM users
WHERE deleted_at IS NULL
ORDER BY created_at DESC
LIMIT 20 OFFSET 1000;
-- Database must scan and discard 1000 rows
Keyset/cursor-based pagination (performant at any depth):
-- First page
SELECT id, name, email, created_at
FROM users
WHERE deleted_at IS NULL
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- Next page (use last row's values as cursor)
SELECT id, name, email, created_at
FROM users
WHERE deleted_at IS NULL
AND (created_at, id) < ('2025-02-01T10:00:00Z', '550e8400-e29b-41d4-a716-446655440000')
ORDER BY created_at DESC, id DESC
LIMIT 20;
Rule: Use offset-based pagination for admin/back-office UIs with moderate data volumes (< 100k rows). Use keyset pagination for APIs, infinite scrolling, and large datasets.
-- GOOD: Filter before aggregating
SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spent
FROM orders
WHERE status = 'completed'
AND created_at >= '2025-01-01'
GROUP BY customer_id
HAVING COUNT(*) > 5
ORDER BY total_spent DESC;
-- BAD: Aggregating everything then filtering in application code
SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spent
FROM orders
GROUP BY customer_id;
-- Then filtering in PHP/Node/Java... wasted database work
Use CTEs for readability. Be aware that in some databases (PostgreSQL < 12), CTEs act as optimisation fences:
-- Readable multi-step query
WITH active_customers AS (
SELECT id, name
FROM customers
WHERE status = 'active'
AND deleted_at IS NULL
),
recent_orders AS (
SELECT customer_id, COUNT(*) AS order_count
FROM orders
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY customer_id
)
SELECT ac.id, ac.name, COALESCE(ro.order_count, 0) AS recent_orders
FROM active_customers ac
LEFT JOIN recent_orders ro ON ro.customer_id = ac.id
ORDER BY recent_orders DESC;
Use window functions for ranking, running totals, and per-group calculations without collapsing rows:
-- Rank orders by total within each customer
SELECT
customer_id,
id AS order_id,
total,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY total DESC) AS rank,
SUM(total) OVER (PARTITION BY customer_id) AS customer_total
FROM orders
WHERE status = 'completed';
For large data modifications, batch to avoid long locks and transaction log bloat:
-- Instead of one massive DELETE:
-- DELETE FROM logs WHERE created_at < '2024-01-01'; -- locks millions of rows
-- Batch delete:
DO $$
DECLARE
rows_deleted INT;
BEGIN
LOOP
DELETE FROM logs
WHERE id IN (
SELECT id FROM logs
WHERE created_at < '2024-01-01'
LIMIT 5000
);
GET DIAGNOSTICS rows_deleted = ROW_COUNT;
EXIT WHEN rows_deleted = 0;
COMMIT;
END LOOP;
END $$;
| Property | Meaning | Practical Impact | |---|---|---| | Atomicity | All operations in a transaction succeed or all are rolled back | Partial updates never reach the database | | Consistency | Transaction moves the database from one valid state to another | Constraints are enforced at commit time | | Isolation | Concurrent transactions don't interfere with each other | Determined by the isolation level | | Durability | Committed data survives system failures | Data is persisted to disk on commit |
| Level | Dirty Reads | Non-Repeatable Reads | Phantom Reads | Use Case |
|---|---|---|---|---|
| READ UNCOMMITTED | Yes | Yes | Yes | Almost never — only for dirty analytics on non-critical data |
| READ COMMITTED | No | Yes | Yes | Default for PostgreSQL. Good for most OLTP workloads |
| REPEATABLE READ | No | No | Yes* | Financial calculations, inventory checks within a transaction |
| SERIALIZABLE | No | No | No | Critical financial operations, booking systems with strict consistency |
*PostgreSQL's REPEATABLE READ also prevents phantom reads (it uses snapshot isolation internally).
Rule: Use READ COMMITTED by default. Escalate to REPEATABLE READ or SERIALIZABLE only for specific operations that require stronger guarantees, and handle serialisation failures with retry logic.
-- GOOD: Short, focused transaction
BEGIN;
UPDATE accounts SET balance = balance - 100.00 WHERE id = 1;
UPDATE accounts SET balance = balance + 100.00 WHERE id = 2;
INSERT INTO transactions (from_id, to_id, amount) VALUES (1, 2, 100.00);
COMMIT;
Rules:
SAVEPOINT for partial rollbacks within a larger transaction when needed.BEGIN;
INSERT INTO orders (customer_id, total) VALUES (1, 250.00);
SAVEPOINT before_items;
INSERT INTO order_items (order_id, product_id, quantity) VALUES (1, 99, 1);
-- If this fails:
ROLLBACK TO SAVEPOINT before_items;
-- Order is still inserted, items are rolled back
COMMIT;
| Lock Type | SQL | Scope | Behaviour |
|---|---|---|---|
| Row-level shared (FOR SHARE) | SELECT ... FOR SHARE | Row | Other transactions can read but not modify the row |
| Row-level exclusive (FOR UPDATE) | SELECT ... FOR UPDATE | Row | Other transactions cannot read (with FOR UPDATE) or modify the row |
| Table-level | LOCK TABLE ... IN <mode> MODE | Table | Applies to the entire table — use sparingly |
| Advisory locks | pg_advisory_lock(key) | Application-defined | Application-level coordination, not tied to specific rows |
Optimistic locking assumes conflicts are rare. It checks for conflicts at write time using a version column:
-- Schema
ALTER TABLE products ADD COLUMN version INT NOT NULL DEFAULT 1;
-- Read: fetch current version
SELECT id, name, price, stock, version FROM products WHERE id = 42;
-- Application receives: version = 3
-- Update: include version check
UPDATE products
SET stock = stock - 1, version = version + 1, updated_at = NOW()
WHERE id = 42 AND version = 3;
-- If 0 rows affected → another transaction modified the row → handle conflict (retry or error)
When to use:
ORM support:
| ORM | Implementation |
|---|---|
| TypeORM | @VersionColumn() decorator |
| Prisma | Manual implementation via version field and conditional update |
| Doctrine (PHP) | @ORM\Version annotation on an integer or datetime column |
| Eloquent (Laravel) | Manual implementation or laravel-optimistic-locking package |
| Entity Framework (.NET) | [Timestamp] attribute or IsRowVersion() in Fluent API |
| Hibernate (Java) | @Version annotation |
| GORM (Go) | Built-in optimistic locking with gorm:"column:version" tag |
Pessimistic locking assumes conflicts are likely. It acquires locks at read time to prevent concurrent modification:
-- Lock the row immediately — other transactions block until this transaction commits
BEGIN;
SELECT * FROM inventory WHERE product_id = 42 FOR UPDATE;
-- ... perform business logic ...
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 42;
COMMIT;
Variants:
-- FOR UPDATE SKIP LOCKED — skip already-locked rows (useful for job queues)
SELECT id, payload
FROM job_queue
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;
-- FOR UPDATE NOWAIT — fail immediately if the row is locked (instead of blocking)
SELECT * FROM inventory WHERE product_id = 42 FOR UPDATE NOWAIT;
-- Throws an error immediately if the row is locked by another transaction
When to use:
SKIP LOCKED)| Factor | Optimistic | Pessimistic | |---|---|---| | Conflict frequency | Low (< 1% of operations) | High or unpredictable | | Read/write ratio | Read-heavy | Write-heavy on contested resources | | Transaction duration | Can be long (no locks held) | Must be short (locks block others) | | Failure handling | Retry on conflict | Block until lock is released | | Distributed systems | Preferred (no lock coordination needed) | Difficult (requires sticky sessions or distributed locks) | | User experience | May show "conflict, please retry" | May show loading/waiting if contention is high |
lock_timeout to avoid indefinite blocking.-- PostgreSQL: set a lock timeout
SET lock_timeout = '5s';
-- MySQL: set innodb_lock_wait_timeout per session
SET innodb_lock_wait_timeout = 5;
-- Show the query plan (does not execute the query)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Show the query plan AND execute the query (shows actual timings)
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- Full diagnostic output (PostgreSQL)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'active'
ORDER BY o.created_at DESC
LIMIT 20;
Key fields to examine:
| Field | What It Tells You | |---|---| | Seq Scan | Full table scan — often a sign of missing index | | Index Scan | Using an index to find rows — this is what you want | | Index Only Scan | Data served entirely from the index (covering index) — best case | | Bitmap Index Scan | Index used to build a bitmap, then table scanned — common for OR conditions / multiple indexes | | Nested Loop | For each row in outer table, scan inner table — efficient for small outer sets | | Hash Join | Build hash table from one side, probe with the other — efficient for large equijoins | | Merge Join | Both sides sorted, then merged — efficient when both inputs are pre-sorted | | Sort | Explicit sort operation — check if an index could eliminate this | | Rows (estimated) | Planner's estimate of rows processed at this step | | Actual Rows (ANALYZE only) | Real number of rows processed — compare with estimated to find planner misestimates | | Buffers (ANALYZE + BUFFERS) | Shared/local buffers hit (cache) vs read (disk) — high reads = cache miss, consider prewarming or more memory | | Cost | Arbitrary units comparing relative cost of plan steps — lower is better |
| EXPLAIN Symptom | Likely Cause | Fix |
|---|---|---|
| Seq Scan on large table | Missing index | Add appropriate index |
| Seq Scan despite index existing | Query uses a function on the indexed column | Rewrite query or create a functional index |
| Estimated rows ≠ actual rows (off by 10x+) | Stale statistics | Run ANALYZE table_name; |
| Sort with high cost | No index matching ORDER BY | Add index covering sort columns |
| Nested Loop with large inner table | Planner chose wrong join strategy | Verify indexes on join columns; consider increasing work_mem |
| Hash Join spilling to disk | work_mem too small for the hash table | Increase work_mem for the session or globally |
| High Buffers: read count | Data not in cache | Increase shared_buffers or ensure table fits in memory; check for bloated tables |
Query optimisation process:
1. [ ] Identify the slow query (application logs, pg_stat_statements, slow query log)
2. [ ] Run EXPLAIN ANALYZE on the query
3. [ ] Identify the most expensive node in the plan
4. [ ] Determine root cause (missing index, bad statistics, inefficient join, sort)
5. [ ] Apply ONE fix at a time
6. [ ] Re-run EXPLAIN ANALYZE and compare
7. [ ] Repeat until acceptable performance is reached
8. [ ] Verify the fix doesn't degrade other queries
-- MySQL uses a different EXPLAIN format
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Key MySQL EXPLAIN columns:
-- type: ALL (full scan) → index → range → ref → eq_ref → const → system (best to worst)
-- possible_keys: indexes the optimiser considered
-- key: index actually used
-- rows: estimated rows scanned
-- Extra: "Using filesort", "Using temporary" are red flags
-- MySQL 8.0+ supports EXPLAIN ANALYZE:
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- GOOD: Readable, consistent formatting
SELECT
u.id,
u.email,
u.first_name,
u.last_name,
COUNT(o.id) AS order_count,
COALESCE(SUM(o.total), 0) AS lifetime_value
FROM users u
LEFT JOIN orders o ON o.customer_id = u.id
AND o.status = 'completed'
WHERE u.deleted_at IS NULL
AND u.created_at >= '2025-01-01'
GROUP BY u.id, u.email, u.first_name, u.last_name
HAVING COUNT(o.id) > 0
ORDER BY lifetime_value DESC
LIMIT 50;
Formatting rules:
SELECT, FROM, WHERE, JOIN, ORDER BY, GROUP BYSELECT list for queries with more than 3 columnsAND/OR at the beginning of the line, indentedu for users, o for orders)JOIN ... ON conditions consistently;Never concatenate user input into SQL strings. Always use parameterised queries:
-- GOOD: Parameterised (prepared statement)
-- PostgreSQL / Node.js
SELECT * FROM users WHERE email = $1;
-- MySQL / PHP
SELECT * FROM users WHERE email = ?;
ORM equivalents (all safe by default):
// TypeORM — safe
const user = await userRepository.findOne({ where: { email: userInput } });
// Prisma — safe
const user = await prisma.user.findUnique({ where: { email: userInput } });
// Doctrine — safe
$user = $repository->findOneBy(['email' => $userInput]);
// Eloquent — safe
$user = User::where('email', $userInput)->first();
// Entity Framework — safe
var user = await context.Users.FirstOrDefaultAsync(u => u.Email == userInput);
// Hibernate — safe
User user = session.createQuery("FROM User u WHERE u.email = :email", User.class)
.setParameter("email", userInput)
.uniqueResult();
// GORM — safe
var user User
db.Where("email = ?", userInput).First(&user)
DANGER: Raw queries with string concatenation:
// TypeORM — DANGEROUS if not parameterised
const users = await dataSource.query(`SELECT * FROM users WHERE email = '${userInput}'`); // SQL INJECTION!
// TypeORM — SAFE raw query
const users = await dataSource.query('SELECT * FROM users WHERE email = $1', [userInput]);
-- WRONG: = NULL never matches
SELECT * FROM users WHERE deleted_at = NULL;
-- CORRECT: Use IS NULL / IS NOT NULL
SELECT * FROM users WHERE deleted_at IS NULL;
-- Use COALESCE for default values
SELECT COALESCE(u.nickname, u.first_name, 'Anonymous') AS display_name
FROM users u;
-- NULL-safe comparison (when comparing two nullable columns)
-- PostgreSQL:
SELECT * FROM t1 JOIN t2 ON t1.val IS NOT DISTINCT FROM t2.val;
-- MySQL:
SELECT * FROM t1 JOIN t2 ON t1.val <=> t2.val;
-- PostgreSQL: INSERT ... ON CONFLICT
INSERT INTO products (sku, name, price, updated_at)
VALUES ('WIDGET-001', 'Widget', 19.99, NOW())
ON CONFLICT (sku)
DO UPDATE SET
name = EXCLUDED.name,
price = EXCLUDED.price,
updated_at = NOW();
-- MySQL: INSERT ... ON DUPLICATE KEY UPDATE
INSERT INTO products (sku, name, price, updated_at)
VALUES ('WIDGET-001', 'Widget', 19.99, NOW())
ON DUPLICATE KEY UPDATE
name = VALUES(name),
price = VALUES(price),
updated_at = NOW();
| Anti-Pattern | Problem | Fix |
|---|---|---|
| SELECT * | Fetches unnecessary columns, breaks when schema changes | Explicitly list needed columns |
| N+1 queries | 1 query to fetch parents + N queries for children | Use JOIN or batch loading (WHERE id IN (...)) |
| WHERE column != value with index | != / <> cannot use index efficiently | Restructure as positive condition or check if full scan is acceptable |
| LIKE '%value%' | Leading wildcard prevents index use | Use full-text search (tsvector in PostgreSQL, FULLTEXT in MySQL) |
| ORDER BY RAND() | Scans and sorts entire table | Use a sampled approach: WHERE id >= (SELECT FLOOR(RANDOM() * MAX(id)) FROM table) LIMIT 1 |
| Implicit type conversion | WHERE varchar_col = 123 may skip index | Match types explicitly: WHERE varchar_col = '123' |
| Correlated subquery in SELECT | Executes subquery for every row | Rewrite as JOIN or window function |
| Missing LIMIT on exploratory queries | Accidentally fetching millions of rows | Always add LIMIT when exploring data |
| Stack | Primary ORM | Alternative | |---|---|---| | PHP (Symfony) | Doctrine ORM | — | | PHP (Laravel) | Eloquent ORM | — | | Node.js (NestJS) | TypeORM or MikroORM | Prisma | | Node.js (General) | Prisma | TypeORM, Drizzle | | .NET | Entity Framework Core | Dapper (micro-ORM for raw SQL) | | Java (Spring) | Hibernate / Spring Data JPA | jOOQ (for SQL-first approach) | | Go | GORM | sqlx (for raw SQL), Ent |
1. Always review generated SQL. Enable query logging in development to inspect what the ORM generates. Inefficient ORM usage can produce catastrophic queries.
// TypeORM: enable query logging
const dataSource = new DataSource({
logging: ['query', 'error'],
// ...
});
// Prisma: enable query logging
const prisma = new PrismaClient({
log: ['query', 'warn', 'error'],
});
// Doctrine: enable SQL logger
$configuration->setSQLLogger(new \Doctrine\DBAL\Logging\EchoSQLLogger());
// Laravel: enable query log
DB::enableQueryLog();
// ... run queries ...
dd(DB::getQueryLog());
2. Solve N+1 queries with eager loading.
// TypeORM — eager loading with relations
const orders = await orderRepository.find({
relations: ['customer', 'items', 'items.product'],
where: { status: 'active' },
});
// Prisma — include related data
const orders = await prisma.order.findMany({
where: { status: 'active' },
include: {
customer: true,
items: { include: { product: true } },
},
});
// Eloquent — eager loading
$orders = Order::with(['customer', 'items.product'])
->where('status', 'active')
->get();
// Doctrine — DQL with joins
$orders = $em->createQuery('
SELECT o, c, i, p
FROM App\Entity\Order o
JOIN o.customer c
JOIN o.items i
JOIN i.product p
WHERE o.status = :status
')->setParameter('status', 'active')
->getResult();
// Entity Framework — eager loading
var orders = await context.Orders
.Include(o => o.Customer)
.Include(o => o.Items)
.ThenInclude(i => i.Product)
.Where(o => o.Status == "active")
.ToListAsync();
// Hibernate / Spring Data JPA — fetch join
@Query("SELECT o FROM Order o " +
"JOIN FETCH o.customer " +
"JOIN FETCH o.items i " +
"JOIN FETCH i.product " +
"WHERE o.status = :status")
List<Order> findActiveWithDetails(@Param("status") String status);
// GORM — preload
var orders []Order
db.Preload("Customer").Preload("Items.Product").
Where("status = ?", "active").
Find(&orders)
3. Use raw SQL for complex queries. When ORM abstractions produce inefficient SQL or cannot express the query you need, drop to raw SQL:
// TypeORM — raw query
const results = await dataSource.query(`
SELECT u.id, u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.customer_id = u.id AND o.status = 'completed'
WHERE u.deleted_at IS NULL
GROUP BY u.id, u.email
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
LIMIT $1
`, [50]);
// Prisma — raw query
const results = await prisma.$queryRaw`
SELECT u.id, u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.customer_id = u.id AND o.status = 'completed'
WHERE u.deleted_at IS NULL
GROUP BY u.id, u.email
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
LIMIT ${50}
`;
4. ORM Migration Workflow.
5. Repository Pattern with ORMs. Keep all database queries in repository classes. Services should never directly use the ORM's query builder or entity manager:
Service → Repository Interface → Repository Implementation (uses ORM)
This keeps business logic decoupled from the data access mechanism and makes testing easier (mock the repository interface).
-- Update table statistics for the planner
ANALYZE orders;
-- Update statistics for all tables
ANALYZE;
-- Check for table bloat (dead tuples)
SELECT relname, n_dead_tup, n_live_tup,
ROUND(n_dead_tup::NUMERIC / NULLIF(n_live_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
-- PostgreSQL: enable pg_stat_statements extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find slowest queries by total time
SELECT
calls,
ROUND(total_exec_time::NUMERIC, 2) AS total_ms,
ROUND(mean_exec_time::NUMERIC, 2) AS mean_ms,
ROUND(max_exec_time::NUMERIC, 2) AS max_ms,
LEFT(query, 200) AS query_preview
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
-- MySQL: enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- queries taking > 1 second
pool_size = (core_count * 2) + effective_spindle_count (formula from PostgreSQL wiki).max_connections appropriately and alert when pool utilisation exceeds 80%.When writing SQL or designing a database schema, follow this workflow:
Implementation progress:
- [ ] Step 1: Understand the data requirements
- [ ] Step 2: Design the schema (entities, relationships, constraints)
- [ ] Step 3: Choose normalisation level and document any denormalisation decisions
- [ ] Step 4: Define indexes based on expected query patterns
- [ ] Step 5: Write the migration(s)
- [ ] Step 6: Write the queries (raw SQL or ORM)
- [ ] Step 7: Run EXPLAIN ANALYZE on critical queries
- [ ] Step 8: Implement appropriate locking and transaction strategy
- [ ] Step 9: Review ORM-generated SQL (if applicable)
- [ ] Step 10: Test with realistic data volumes
Step 1: Understand the data requirements Read the task description and acceptance criteria. Identify the entities, their attributes, and the relationships between them. Clarify cardinality (one-to-many, many-to-many).
Step 2: Design the schema Create tables following the naming conventions and standard columns defined in this skill. Define relationships with appropriate foreign keys and ON DELETE actions.
Step 3: Choose normalisation level Target 3NF by default. Document any intentional denormalisation with a clear justification tied to measured performance needs or snapshot requirements.
Step 4: Define indexes Identify the primary query patterns and create indexes to support them. Index all foreign keys. Create composite indexes for multi-column filters and sorts.
Step 5: Write the migration(s)
Generate migration files with both up and down methods. Review the generated DDL before applying.
Step 6: Write the queries
Write queries following the performance and formatting rules in this skill. Use parameterised queries. Avoid SELECT *.
Step 7: Run EXPLAIN ANALYZE Verify critical query plans. Look for sequential scans on large tables, missing indexes, and planner misestimates.
Step 8: Implement locking and transaction strategy Choose optimistic or pessimistic locking based on the contention profile. Define transaction boundaries and appropriate isolation levels.
Step 9: Review ORM-generated SQL Enable query logging and inspect generated queries for N+1 problems, unnecessary joins, or missing eager loading.
Step 10: Test with realistic data volumes Seed the development database with production-scale data. Verify that queries perform within acceptable time limits under load.
tsh-architecture-designing — for designing data models as part of broader system architecturetsh-code-reviewing — for validating SQL quality, index coverage, and query performancetsh-technical-context-discovering — for establishing database conventions and existing patterns before designingdevelopment
Custom hook and composable patterns — naming, composition, stable return shapes, lifecycle cleanup, and testing strategies. Use when writing reusable logic units (React hooks, Vue composables), refactoring logic into hooks, debugging hook behavior, or reviewing hook implementations.
testing
UI verification criteria, structure checklists, severity definitions, and tolerance rules for comparing implementations against Figma designs. Use for verifying UI matches design, understanding what to check, and determining acceptable differences.
development
Clean raw workshop or meeting transcripts from small talk, filler words, and off-topic tangents. Extract and structure business-relevant content into a standardized format with discussion topics, key decisions, action items, and open questions.
development
Discover and establish technical context before implementing any feature. Prioritize project instructions, existing codebase patterns, and external documentation in that order. Use for any task requiring understanding of project conventions, coding standards, architecture patterns, and established practices before writing code.