postgresql-performance/SKILL.md
PostgreSQL query performance and storage internals sourced from "Introduction to PostgreSQL for the Data Professional" (Booz & Fritchey). Covers EXPLAIN ANALYZE (reading node types, costs, buffers), all index types (B-tree, GIN, GiST, BRIN, partial, functional, covering, composite), MVCC internals, VACUUM/ANALYZE configuration, work_mem tuning, and slow query diagnosis. Companion to postgresql-fundamentals and postgresql-administration.
npx skillsauth add peterbamuhigire/skills-web-dev postgresql-performanceInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
postgresql-performance or would be better handled by a more specific companion skill.SKILL.md first, then load only the referenced deep-dive files that are necessary for the task.Always use EXPLAIN (ANALYZE, BUFFERS) to see both estimated and actual costs.
-- Estimated only (safe, does not execute)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Actual execution (runs the query)
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42;
-- Full output with format
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;
-- Verbose (includes column-level output)
EXPLAIN (ANALYZE, BUFFERS, VERBOSE) SELECT ...;
Nested Loop (cost=0.43..16.49 rows=3 width=120)
(actual time=0.125..0.432 rows=3 loops=1)
Buffers: shared hit=12 read=2
-> Index Scan using orders_customer_idx on orders
(cost=0.43..8.45 rows=3 width=120)
(actual time=0.115..0.210 rows=3 loops=1)
Index Cond: (customer_id = 42)
| Field | Meaning |
|---|---|
| cost=start..total | Planner's estimated cost (arbitrary units) |
| rows=N | Planner's estimated row count |
| actual time=first..last | Real milliseconds to first/last row |
| actual rows=N | Real row count returned |
| loops=N | How many times this node ran |
| shared hit=N | Pages found in buffer cache |
| shared read=N | Pages read from disk |
Key signals:
rows estimate vs actual rows wildly different → stale statistics → run ANALYZEshared read high → disk I/O bound → add RAM or indexloops × actual time = total node time| Node | Meaning |
|---|---|
| Seq Scan | Full table scan — fine for small tables or large % of rows |
| Index Scan | Uses index, random I/O to fetch heap rows |
| Index Only Scan | Covers query entirely from index — no heap access |
| Bitmap Index Scan | Collects matching TIDs, then fetches heap in order |
| Nested Loop | For each outer row, probe inner — good for small sets |
| Hash Join | Build hash table, probe — good for larger sets |
| Merge Join | Sorted inputs — good when both sides already sorted |
| Sort | Explicit sort — check work_mem if spilling to disk |
| Hash | Hashing phase of Hash Join |
Sort (cost=...) (actual time=...) (actual rows=...) (loops=1)
Sort Key: created_at
Sort Method: external merge Disk: 48280kB ← spilling!
Fix: increase work_mem for the session or globally:
SET work_mem = '256MB'; -- session-level
-- or in postgresql.conf for all sessions
work_mem = 64MB
-- postgresql.conf or per-session
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 1000; -- log plans > 1s
SET auto_explain.log_analyze = on;
SET auto_explain.log_buffers = on;
Best for: equality (=), range (<, >, BETWEEN), ORDER BY, LIKE 'prefix%'.
-- Single column
CREATE INDEX orders_customer_idx ON orders (customer_id);
-- Composite (column order matters: put equality columns first)
CREATE INDEX orders_customer_status_idx ON orders (customer_id, status);
-- Covering index (avoid heap access entirely)
CREATE INDEX orders_covering_idx ON orders (customer_id)
INCLUDE (total, created_at);
-- Descending
CREATE INDEX events_created_desc ON events (created_at DESC);
-- UNIQUE
CREATE UNIQUE INDEX users_email_uniq ON users (lower(email));
Index only a subset of rows — smaller, faster, lower maintenance cost.
-- Only index pending orders (WHERE clause on index matches query)
CREATE INDEX orders_pending_idx ON orders (created_at)
WHERE status = 'pending';
-- Query must have matching WHERE to use index
SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at;
-- Partial unique (email unique only for active accounts)
CREATE UNIQUE INDEX users_active_email ON users (email) WHERE deleted_at IS NULL;
Index the result of a function or expression.
-- Case-insensitive email lookup
CREATE INDEX users_lower_email ON users (lower(email));
-- Used by:
SELECT * FROM users WHERE lower(email) = lower('[email protected]');
-- Index computed value
CREATE INDEX orders_year_idx ON orders (EXTRACT(YEAR FROM created_at));
Best for: JSONB containment, array operators, full-text search, pg_trgm.
CREATE INDEX products_tags_gin ON products USING GIN (tags);
CREATE INDEX configs_payload_gin ON configs USING GIN (payload);
CREATE INDEX articles_search_gin ON articles USING GIN (search_vector);
CREATE INDEX users_email_trgm ON users USING GIN (email gin_trgm_ops);
Best for: geometric types, IP ranges (inet), range types, PostGIS geometry.
CREATE INDEX locations_geom ON locations USING GIST (geom);
CREATE INDEX bookings_period ON bookings USING GIST (daterange(start_date, end_date));
CREATE INDEX ip_log_addr ON access_log USING GIST (ip_address inet_ops);
Best for: very large tables with naturally ordered data (time-series, append-only logs). Tiny size, fast build, coarse filtering.
-- Only useful when physical order correlates with query column
CREATE INDEX events_created_brin ON events USING BRIN (created_at);
-- Check index usage (low idx_scan = unused index)
SELECT indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan;
-- Find missing indexes (sequential scans on large tables)
SELECT relname, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC;
-- Rebuild bloated index (online — no lock)
REINDEX INDEX CONCURRENTLY orders_customer_idx;
-- Rebuild all indexes on a table (online)
REINDEX TABLE CONCURRENTLY orders;
Automatically enabled. Reduces index size for columns with many duplicate values (e.g., status, country). No action needed.
PostgreSQL never overwrites rows in place. Every UPDATE writes a new row version (tuple); old versions remain until VACUUM removes them.
| Operation | What PostgreSQL Does |
|---|---|
| INSERT | Writes new tuple with xmin = current_txn |
| UPDATE | Marks old tuple as dead (xmax = current_txn), writes new tuple |
| DELETE | Marks tuple dead (xmax = current_txn) |
| SELECT | Sees only tuples where xmin ≤ snapshot and xmax not committed |
Benefits:
Cost:
Reclaims space from dead tuples. VACUUM marks space as reusable. VACUUM FULL rewrites the table (takes exclusive lock — avoid in production unless essential).
-- Standard VACUUM (online, no lock)
VACUUM orders;
-- With statistics update
VACUUM ANALYZE orders;
-- Verbose output
VACUUM VERBOSE ANALYZE orders;
-- Check autovacuum status per table
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze,
n_dead_tup, n_live_tup
FROM pg_stat_user_tables ORDER BY n_dead_tup DESC;
-- Check bloat (dead tuples as percentage)
SELECT relname,
n_dead_tup,
n_live_tup,
ROUND(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_live_tup > 0
ORDER BY dead_pct DESC;
# postgresql.conf — lower thresholds for high-churn tables
autovacuum_vacuum_scale_factor = 0.05 # 5% dead tuples (default 20%)
autovacuum_analyze_scale_factor = 0.02 # 2% new rows (default 10%)
autovacuum_vacuum_cost_delay = 2ms # less throttling
# Per-table override (storage parameters)
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_threshold = 100
);
Updates planner statistics. Runs automatically via autovacuum but run manually after bulk loads.
ANALYZE orders; -- one table
ANALYZE; -- entire database
ANALYZE VERBOSE; -- with row counts
Pre-allocate space on each page for in-place HOT (Heap-Only Tuple) updates — reduces index updates.
-- 70% fill factor = reserve 30% for updates
CREATE TABLE orders (...) WITH (fillfactor = 70);
ALTER TABLE orders SET (fillfactor = 70);
More statistics = better estimates for skewed data:
-- Default is 100; increase for columns with high cardinality
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;
ANALYZE orders;
Tell the planner storage speed. For SSD:
random_page_cost = 1.1 # SSD (default 4.0 is for HDD)
effective_cache_size = 12GB # 75% of RAM — helps planner choose index scans
CREATE EXTENSION pg_hint_plan;
/*+ IndexScan(orders orders_customer_idx) */
SELECT * FROM orders WHERE customer_id = 42;
/*+ SeqScan(orders) */
SELECT * FROM orders WHERE status = 'pending';
CREATE EXTENSION pg_stat_statements;
-- Top queries by total time
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 20;
-- Worst average time
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 20;
-- Reset statistics
SELECT pg_stat_statements_reset();
EXPLAIN without ANALYZE then trusting cost estimates — costs are guessesVACUUM FULL in production — acquires exclusive lock, consider pg_repack insteadactual rows vs estimated rows mismatch — indicates stale statswork_mem at default 4MB for complex queries — causes disk spillsdata-ai
Use when adding AI-powered analytics to a SaaS platform — semantic search over business data, natural language queries, trend detection, anomaly alerts, and AI-generated insights for dashboards. Covers embeddings, NL2SQL, and per-tenant analytics...
data-ai
Design AI-powered analytics dashboards — what metrics to show, how to display AI predictions and confidence, drill-down patterns, KPI cards, trend visualisation, AI Insights panels, export design, and role-based dashboard variants. Invoke when...
development
Use when designing, building, reviewing, or upgrading production software systems that must be secure, performant, maintainable, scalable, and user-centered. Apply before writing specs, code, architecture, APIs, databases, mobile apps, SaaS platforms, or ERP systems.
development
Professional web app UI using commercial templates (Tabler/Bootstrap 5) with strong frontend design direction when needed. Use for CRUD interfaces, dashboards, admin panels with SweetAlert2, DataTables, Flatpickr. Clone seeder-page.php, use...