Query Optimization

Systematically identify and eliminate slow database queries using measurement-first principles.

The Optimization Process

1. Measure  — identify which queries are actually slow (pg_stat_statements)
2. Explain  — read the query plan (EXPLAIN ANALYZE BUFFERS)
3. Diagnose — identify the bottleneck (missing index, bad join, stale stats, etc.)
4. Fix      — targeted change (add index, rewrite query, update stats)
5. Verify   — measure again; confirm improvement

Never optimize what you haven't measured. Rewriting queries blindly wastes time and introduces bugs.

Finding Slow Queries

-- Requires pg_stat_statements extension (enable in postgresql.conf)
-- Most cloud providers have it pre-enabled

-- Top 20 slowest queries by average execution time
SELECT
  round(mean_exec_time::numeric, 2) AS avg_ms,
  round(total_exec_time::numeric, 0) AS total_ms,
  calls,
  round(stddev_exec_time::numeric, 2) AS stddev_ms,
  left(query, 120) AS query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Top 20 by total time (highest impact on the system)
SELECT
  round(total_exec_time::numeric, 0) AS total_ms,
  calls,
  round(mean_exec_time::numeric, 2) AS avg_ms,
  left(query, 120) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Reset stats after optimization to measure improvement
SELECT pg_stat_statements_reset();

Reading EXPLAIN ANALYZE Output

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total_cents, u.email
FROM orders o
JOIN users u ON u.id = o.customer_id
WHERE o.status = 'pending'
ORDER BY o.created_at DESC
LIMIT 20;

What to Look For

Scan types:

Seq Scan       — Reading every row. Good for small tables or high-selectivity filters.
               Bad for large tables with low selectivity. → Add index.

Index Scan     — Using an index to find rows, then fetching from heap.
               Good. May still be slow if fetching many rows from random pages.

Index Only Scan — Reading value directly from index (no heap access).
               Best case. Happens when index INCLUDES all needed columns.

Bitmap Index Scan → Bitmap Heap Scan
               — Collects matching row locations into a bitmap, then fetches in order.
               Good when many rows match and they're scattered (avoids random I/O).

Key numbers:

actual time=start..end    — Actual time in ms for this node (start = first row, end = all rows)
rows=N (actual N)         — Planned rows vs actual rows. Large divergence = stale statistics.
                            Run ANALYZE table_name to update.
Buffers: shared hit=X read=Y
  hit  — Pages served from shared_buffers (RAM cache, fast)
  read — Pages read from disk (slow). Many reads → add index or increase shared_buffers.
rows removed by filter=N  — Rows read but discarded. High number = index is not selective enough.

Common Problem Patterns

| Plan element | What it means | Fix | |---|---|---| | Seq Scan on large table | No usable index | Add index on filtered/joined column | | actual rows ≫ estimated rows | Stale statistics | ANALYZE table_name | | loops=N on inner node | Nested loop with N outer rows | Add index on inner join column, or switch to Hash Join | | Hash Join (cost=high) | Large in-memory hash table | Increase work_mem, or add index | | Sort (cost=high) | Sorting large result set | Add index that includes ORDER BY column | | Rows Removed by Filter: N | Post-index filter discarding many rows | Add to index, or make index more specific |

Index Design for Queries

The Column Order Rule

-- Query: WHERE user_id = ? AND status = 'active' AND created_at > ?
-- Rule: equality columns first, then range column last

-- ✓ Optimal
CREATE INDEX idx_orders_user_status_date
  ON orders(user_id, status, created_at);

-- ✗ Suboptimal — range column in middle blocks use of status in index
CREATE INDEX idx_orders_user_date_status
  ON orders(user_id, created_at, status);

Covering Index (Eliminate Heap Fetches)

-- Query: SELECT id, status FROM orders WHERE customer_id = ? ORDER BY created_at DESC
-- Without covering index: Index Scan → fetch each row from heap

CREATE INDEX idx_orders_customer_covering
  ON orders(customer_id, created_at DESC)
  INCLUDE (status);
-- INCLUDE adds status to the index leaf pages
-- Result: Index Only Scan — no heap access needed

Partial Index (Small, Fast, Targeted)

-- Most queries only care about active/pending records
-- Index only those rows — much smaller, much faster

CREATE INDEX idx_orders_pending ON orders(customer_id, created_at)
  WHERE status = 'pending' AND deleted_at IS NULL;

-- Soft-delete pattern: index only non-deleted rows
CREATE INDEX idx_users_active_email ON users(email)
  WHERE deleted_at IS NULL;

N+1 Query Pattern

The most common query performance problem in ORMs. N+1 = 1 query to fetch a list, then N queries for related data.

Detecting N+1

-- In pg_stat_statements: many identical queries with different parameter values
-- e.g., "SELECT * FROM users WHERE id = $1" called 500 times
-- In application logs: many near-identical database calls in one request

Fix 1: Eager Loading (ORM)

// N+1: Prisma
const orders = await prisma.order.findMany({ where: { status: 'pending' } });
// Then for each order... (N separate queries)
for (const order of orders) {
  const customer = await prisma.user.findUnique({ where: { id: order.customerId } });
}

// Fix: include the relation
const orders = await prisma.order.findMany({
  where: { status: 'pending' },
  include: { customer: true },     // JOIN in a single query
});

Fix 2: Batch Query (Manual)

// Fetch all needed IDs, then one query
const customerIds = [...new Set(orders.map(o => o.customerId))];
const customers = await db.query(
  'SELECT * FROM users WHERE id = ANY($1)',
  [customerIds]
);
const customerMap = new Map(customers.rows.map(c => [c.id, c]));

// Now join in memory
const enriched = orders.map(o => ({ ...o, customer: customerMap.get(o.customerId) }));

Fix 3: JOIN

-- Replace N application queries with one SQL JOIN
SELECT 
  o.id, o.total_cents, o.status,
  u.id AS customer_id, u.email, u.name
FROM orders o
JOIN users u ON u.id = o.customer_id
WHERE o.status = 'pending'
ORDER BY o.created_at DESC;

Common Query Rewrites

Subquery → JOIN

-- Slow: correlated subquery runs once per row
SELECT id, (SELECT COUNT(*) FROM orders WHERE customer_id = u.id) AS order_count
FROM users u;

-- Fast: JOIN with aggregation (single pass)
SELECT u.id, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.customer_id = u.id
GROUP BY u.id;

COUNT(*) vs EXISTS

-- Slow: counts all matching rows to check if any exist
SELECT * FROM products WHERE (SELECT COUNT(*) FROM reviews WHERE product_id = products.id) > 0;

-- Fast: stops at first matching row
SELECT * FROM products WHERE EXISTS (
  SELECT 1 FROM reviews WHERE product_id = products.id
);

Deep Offset Pagination → Cursor

-- Slow: OFFSET 10000 must scan and discard 10,000 rows
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 10000;

-- Fast: cursor-based (uses index seek)
SELECT * FROM orders
WHERE created_at < '2024-01-15T14:30:00Z'   -- cursor = last item's timestamp
ORDER BY created_at DESC
LIMIT 20;

DISTINCT → GROUP BY or EXISTS

-- Often slow: DISTINCT sorts the entire result set
SELECT DISTINCT customer_id FROM orders WHERE status = 'completed';

-- Faster equivalent using GROUP BY
SELECT customer_id FROM orders WHERE status = 'completed' GROUP BY customer_id;

-- Fastest if you're just checking existence
SELECT id FROM users u WHERE EXISTS (
  SELECT 1 FROM orders WHERE customer_id = u.id AND status = 'completed'
);

IN with subquery → JOIN

-- Slow on large sets: IN with subquery re-evaluates subquery per row (in some planners)
SELECT * FROM products WHERE id IN (
  SELECT product_id FROM order_items WHERE quantity > 100
);

-- More predictable performance
SELECT DISTINCT p.*
FROM products p
JOIN order_items oi ON oi.product_id = p.id
WHERE oi.quantity > 100;

Statistics and Table Health

-- Check for stale statistics (tables not analyzed recently)
SELECT
  schemaname,
  tablename,
  last_autovacuum,
  last_autoanalyze,
  n_live_tup,
  n_dead_tup,
  round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 1) AS bloat_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Manually analyze a table (updates planner statistics)
ANALYZE VERBOSE orders;

-- Force vacuum to reclaim space from dead tuples
VACUUM ANALYZE orders;

Bloat Thresholds

bloat_pct > 20% → Run VACUUM ANALYZE
bloat_pct > 50% → Run VACUUM FULL (locks table — schedule maintenance window)
n_dead_tup consistently high → autovacuum settings may be too conservative

Index Usage Audit

-- Find unused indexes (remove them — they slow down writes for no read benefit)
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan < 50          -- Very few scans (consider threshold based on table writes)
  AND NOT indisprimary        -- Not the PK
  AND NOT indisunique         -- Not a unique constraint
ORDER BY pg_relation_size(indexrelid) DESC;

-- Find duplicate/redundant indexes
-- An index on (a, b) makes a standalone index on (a) redundant
SELECT
  idx1.indexname AS redundant_index,
  idx2.indexname AS covered_by
FROM pg_indexes idx1
JOIN pg_indexes idx2 ON (
  idx1.tablename = idx2.tablename
  AND idx1.indexname <> idx2.indexname
  AND idx2.indexdef LIKE idx1.indexdef || '%'   -- idx1 is a prefix of idx2
)
WHERE idx1.schemaname = 'public';

-- Index sizes (identify expensive indexes)
SELECT
  tablename,
  indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;

work_mem Tuning

work_mem controls per-operation memory for sorts and hash tables.

-- Check if sorts/hashes are spilling to disk
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;
-- Look for: "Batches: N" where N > 1 (hash spilled to disk)
-- Look for: "Sort Method: external merge" or "external sort" (sort spilled to disk)

-- Increase for heavy analytical queries (set per-session, not globally)
SET work_mem = '256MB';  -- Only for this connection/session

-- Global setting (conservative — multiply by max_connections × max_parallelism)
-- In postgresql.conf:
-- work_mem = 64MB   # With 200 connections: up to 200 × 64MB = 12.8GB peak

Query Optimization Reference

| Problem | Detected by | Fix | |---------|-------------|-----| | Missing index on filter | Seq Scan on large table | Add index on WHERE/JOIN column | | Missing index on sort | Sort node, no index scan | Add index including ORDER BY column | | Stale statistics | rows estimate ≫ actual rows | ANALYZE table_name | | N+1 queries | Many near-identical queries in logs | Eager load, batch, or JOIN | | Sort spills to disk | external sort in EXPLAIN | Increase work_mem for session | | Hash join spills | Batches: N > 1 | Increase work_mem, or add index to avoid hash | | Deep offset pagination | Slow OFFSET N | Switch to cursor pagination | | High table bloat | n_dead_tup high | VACUUM ANALYZE | | Unused indexes | idx_scan < 50 | Drop the index |

Deeper Reference

For EXPLAIN ANALYZE interpretation guides and index pattern recipes, see:

references/explain-analyze-guide.md — how to read every node type in EXPLAIN ANALYZE output, cost estimation, and a worked example of turning a 45-second query into sub-second
references/index-patterns.md — B-tree, GIN, GiST, BRIN, and partial index recipes; covering index patterns; index bloat management

Query Optimization

Systematically identify and eliminate slow database queries using measurement-first principles.

The Optimization Process

1. Measure  — identify which queries are actually slow (pg_stat_statements)
2. Explain  — read the query plan (EXPLAIN ANALYZE BUFFERS)
3. Diagnose — identify the bottleneck (missing index, bad join, stale stats, etc.)
4. Fix      — targeted change (add index, rewrite query, update stats)
5. Verify   — measure again; confirm improvement

Never optimize what you haven't measured. Rewriting queries blindly wastes time and introduces bugs.

Finding Slow Queries

-- Requires pg_stat_statements extension (enable in postgresql.conf)
-- Most cloud providers have it pre-enabled

-- Top 20 slowest queries by average execution time
SELECT
  round(mean_exec_time::numeric, 2) AS avg_ms,
  round(total_exec_time::numeric, 0) AS total_ms,
  calls,
  round(stddev_exec_time::numeric, 2) AS stddev_ms,
  left(query, 120) AS query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Top 20 by total time (highest impact on the system)
SELECT
  round(total_exec_time::numeric, 0) AS total_ms,
  calls,
  round(mean_exec_time::numeric, 2) AS avg_ms,
  left(query, 120) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Reset stats after optimization to measure improvement
SELECT pg_stat_statements_reset();

Reading EXPLAIN ANALYZE Output

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total_cents, u.email
FROM orders o
JOIN users u ON u.id = o.customer_id
WHERE o.status = 'pending'
ORDER BY o.created_at DESC
LIMIT 20;

What to Look For

Scan types:

Seq Scan       — Reading every row. Good for small tables or high-selectivity filters.
               Bad for large tables with low selectivity. → Add index.

Index Scan     — Using an index to find rows, then fetching from heap.
               Good. May still be slow if fetching many rows from random pages.

Index Only Scan — Reading value directly from index (no heap access).
               Best case. Happens when index INCLUDES all needed columns.

Bitmap Index Scan → Bitmap Heap Scan
               — Collects matching row locations into a bitmap, then fetches in order.
               Good when many rows match and they're scattered (avoids random I/O).

Key numbers:

actual time=start..end    — Actual time in ms for this node (start = first row, end = all rows)
rows=N (actual N)         — Planned rows vs actual rows. Large divergence = stale statistics.
                            Run ANALYZE table_name to update.
Buffers: shared hit=X read=Y
  hit  — Pages served from shared_buffers (RAM cache, fast)
  read — Pages read from disk (slow). Many reads → add index or increase shared_buffers.
rows removed by filter=N  — Rows read but discarded. High number = index is not selective enough.

Common Problem Patterns

Index Design for Queries

The Column Order Rule

-- Query: WHERE user_id = ? AND status = 'active' AND created_at > ?
-- Rule: equality columns first, then range column last

-- ✓ Optimal
CREATE INDEX idx_orders_user_status_date
  ON orders(user_id, status, created_at);

-- ✗ Suboptimal — range column in middle blocks use of status in index
CREATE INDEX idx_orders_user_date_status
  ON orders(user_id, created_at, status);

Covering Index (Eliminate Heap Fetches)

-- Query: SELECT id, status FROM orders WHERE customer_id = ? ORDER BY created_at DESC
-- Without covering index: Index Scan → fetch each row from heap

CREATE INDEX idx_orders_customer_covering
  ON orders(customer_id, created_at DESC)
  INCLUDE (status);
-- INCLUDE adds status to the index leaf pages
-- Result: Index Only Scan — no heap access needed

Partial Index (Small, Fast, Targeted)

-- Most queries only care about active/pending records
-- Index only those rows — much smaller, much faster

CREATE INDEX idx_orders_pending ON orders(customer_id, created_at)
  WHERE status = 'pending' AND deleted_at IS NULL;

-- Soft-delete pattern: index only non-deleted rows
CREATE INDEX idx_users_active_email ON users(email)
  WHERE deleted_at IS NULL;

N+1 Query Pattern

The most common query performance problem in ORMs. N+1 = 1 query to fetch a list, then N queries for related data.

Detecting N+1

-- In pg_stat_statements: many identical queries with different parameter values
-- e.g., "SELECT * FROM users WHERE id = $1" called 500 times
-- In application logs: many near-identical database calls in one request

Fix 1: Eager Loading (ORM)

// N+1: Prisma
const orders = await prisma.order.findMany({ where: { status: 'pending' } });
// Then for each order... (N separate queries)
for (const order of orders) {
  const customer = await prisma.user.findUnique({ where: { id: order.customerId } });
}

// Fix: include the relation
const orders = await prisma.order.findMany({
  where: { status: 'pending' },
  include: { customer: true },     // JOIN in a single query
});

Fix 2: Batch Query (Manual)

// Fetch all needed IDs, then one query
const customerIds = [...new Set(orders.map(o => o.customerId))];
const customers = await db.query(
  'SELECT * FROM users WHERE id = ANY($1)',
  [customerIds]
);
const customerMap = new Map(customers.rows.map(c => [c.id, c]));

// Now join in memory
const enriched = orders.map(o => ({ ...o, customer: customerMap.get(o.customerId) }));

Fix 3: JOIN

-- Replace N application queries with one SQL JOIN
SELECT 
  o.id, o.total_cents, o.status,
  u.id AS customer_id, u.email, u.name
FROM orders o
JOIN users u ON u.id = o.customer_id
WHERE o.status = 'pending'
ORDER BY o.created_at DESC;

Common Query Rewrites

Subquery → JOIN

-- Slow: correlated subquery runs once per row
SELECT id, (SELECT COUNT(*) FROM orders WHERE customer_id = u.id) AS order_count
FROM users u;

-- Fast: JOIN with aggregation (single pass)
SELECT u.id, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.customer_id = u.id
GROUP BY u.id;

COUNT(*) vs EXISTS

-- Slow: counts all matching rows to check if any exist
SELECT * FROM products WHERE (SELECT COUNT(*) FROM reviews WHERE product_id = products.id) > 0;

-- Fast: stops at first matching row
SELECT * FROM products WHERE EXISTS (
  SELECT 1 FROM reviews WHERE product_id = products.id
);

Deep Offset Pagination → Cursor

-- Slow: OFFSET 10000 must scan and discard 10,000 rows
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 10000;

-- Fast: cursor-based (uses index seek)
SELECT * FROM orders
WHERE created_at < '2024-01-15T14:30:00Z'   -- cursor = last item's timestamp
ORDER BY created_at DESC
LIMIT 20;

DISTINCT → GROUP BY or EXISTS

-- Often slow: DISTINCT sorts the entire result set
SELECT DISTINCT customer_id FROM orders WHERE status = 'completed';

-- Faster equivalent using GROUP BY
SELECT customer_id FROM orders WHERE status = 'completed' GROUP BY customer_id;

-- Fastest if you're just checking existence
SELECT id FROM users u WHERE EXISTS (
  SELECT 1 FROM orders WHERE customer_id = u.id AND status = 'completed'
);

IN with subquery → JOIN

-- Slow on large sets: IN with subquery re-evaluates subquery per row (in some planners)
SELECT * FROM products WHERE id IN (
  SELECT product_id FROM order_items WHERE quantity > 100
);

-- More predictable performance
SELECT DISTINCT p.*
FROM products p
JOIN order_items oi ON oi.product_id = p.id
WHERE oi.quantity > 100;

Statistics and Table Health

-- Check for stale statistics (tables not analyzed recently)
SELECT
  schemaname,
  tablename,
  last_autovacuum,
  last_autoanalyze,
  n_live_tup,
  n_dead_tup,
  round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 1) AS bloat_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Manually analyze a table (updates planner statistics)
ANALYZE VERBOSE orders;

-- Force vacuum to reclaim space from dead tuples
VACUUM ANALYZE orders;

Bloat Thresholds

bloat_pct > 20% → Run VACUUM ANALYZE
bloat_pct > 50% → Run VACUUM FULL (locks table — schedule maintenance window)
n_dead_tup consistently high → autovacuum settings may be too conservative

Index Usage Audit

-- Find unused indexes (remove them — they slow down writes for no read benefit)
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan < 50          -- Very few scans (consider threshold based on table writes)
  AND NOT indisprimary        -- Not the PK
  AND NOT indisunique         -- Not a unique constraint
ORDER BY pg_relation_size(indexrelid) DESC;

-- Find duplicate/redundant indexes
-- An index on (a, b) makes a standalone index on (a) redundant
SELECT
  idx1.indexname AS redundant_index,
  idx2.indexname AS covered_by
FROM pg_indexes idx1
JOIN pg_indexes idx2 ON (
  idx1.tablename = idx2.tablename
  AND idx1.indexname <> idx2.indexname
  AND idx2.indexdef LIKE idx1.indexdef || '%'   -- idx1 is a prefix of idx2
)
WHERE idx1.schemaname = 'public';

-- Index sizes (identify expensive indexes)
SELECT
  tablename,
  indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;

work_mem Tuning

work_mem controls per-operation memory for sorts and hash tables.

-- Check if sorts/hashes are spilling to disk
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;
-- Look for: "Batches: N" where N > 1 (hash spilled to disk)
-- Look for: "Sort Method: external merge" or "external sort" (sort spilled to disk)

-- Increase for heavy analytical queries (set per-session, not globally)
SET work_mem = '256MB';  -- Only for this connection/session

-- Global setting (conservative — multiply by max_connections × max_parallelism)
-- In postgresql.conf:
-- work_mem = 64MB   # With 200 connections: up to 200 × 64MB = 12.8GB peak

Query Optimization Reference

Deeper Reference

For EXPLAIN ANALYZE interpretation guides and index pattern recipes, see:

references/explain-analyze-guide.md — how to read every node type in EXPLAIN ANALYZE output, cost estimation, and a worked example of turning a 45-second query into sub-second
references/index-patterns.md — B-tree, GIN, GiST, BRIN, and partial index recipes; covering index patterns; index bloat management

Adoption

harsh040506/query-optimization

$ install --global

Security Scan Results

SKILL.md

Query Optimization

The Optimization Process

Finding Slow Queries

Reading EXPLAIN ANALYZE Output

What to Look For

Common Problem Patterns

Index Design for Queries

The Column Order Rule

Covering Index (Eliminate Heap Fetches)

Partial Index (Small, Fast, Targeted)

N+1 Query Pattern

Detecting N+1

Fix 1: Eager Loading (ORM)

Fix 2: Batch Query (Manual)

Fix 3: JOIN

Common Query Rewrites

Subquery → JOIN

COUNT(*) vs EXISTS

Deep Offset Pagination → Cursor

DISTINCT → GROUP BY or EXISTS

IN with subquery → JOIN

Statistics and Table Health

Bloat Thresholds

Index Usage Audit

work_mem Tuning

Query Optimization Reference

Deeper Reference

Related Skills

harsh040506/single-cell-rna-qc

harsh040506/scvi-tools

harsh040506/scientific-problem-selection

harsh040506/nextflow-development

harsh040506/query-optimization

$ install --global

Security Scan Results

SKILL.md

Query Optimization

The Optimization Process

Finding Slow Queries

Reading EXPLAIN ANALYZE Output

What to Look For

Common Problem Patterns

Index Design for Queries

The Column Order Rule

Covering Index (Eliminate Heap Fetches)

Partial Index (Small, Fast, Targeted)

N+1 Query Pattern

Detecting N+1

Fix 1: Eager Loading (ORM)

Fix 2: Batch Query (Manual)

Fix 3: JOIN

Common Query Rewrites

Subquery → JOIN

COUNT(*) vs EXISTS

Deep Offset Pagination → Cursor

DISTINCT → GROUP BY or EXISTS

IN with subquery → JOIN

Statistics and Table Health

Bloat Thresholds

Index Usage Audit

work_mem Tuning

Query Optimization Reference

Deeper Reference

Related Skills

harsh040506/single-cell-rna-qc

harsh040506/scvi-tools

harsh040506/scientific-problem-selection

harsh040506/nextflow-development