Database Query Optimization Techniques
Your application is slow. Users are complaining. You check the metrics and discover the painful truth: 80% of your response time is waiting for the database.
I’ve been there. Multiple times. And I’ve learned that database optimization isn’t about memorizing SQL tricks - it’s about understanding how databases think, what they’re doing behind the scenes, and how to work with them instead of against them.
Today I’m sharing the database optimization techniques that have saved my projects from performance disasters.
Understanding Query Execution
Before we optimize, we need to understand what’s actually happening when you run a query.
The Query Execution Pipeline
Your SQL Query
↓
Parser (syntax check)
↓
Planner/Optimizer (creates execution plan)
↓
Executor (runs the plan)
↓
Result
The key insight: The database doesn’t just “run” your query. It analyzes it, considers multiple strategies, estimates costs, and chooses what it thinks is the best approach.
Sometimes it’s wrong. Our job is to help it make better decisions.
Reading EXPLAIN Plans
This is the single most important skill for database optimization.
PostgreSQL Example:
EXPLAIN ANALYZE
SELECT users.name, COUNT(posts.id) as post_count
FROM users
LEFT JOIN posts ON users.id = posts.user_id
WHERE users.created_at > '2025-01-01'
GROUP BY users.id, users.name
ORDER BY post_count DESC
LIMIT 10;
Output:
Limit  (cost=52341.24..52341.26 rows=10 width=24) (actual time=423.156..423.159 rows=10 loops=1)
  ->  Sort  (cost=52341.24..52466.37 rows=50050 width=24) (actual time=423.154..423.156 rows=10 loops=1)
        Sort Key: (count(posts.id)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        ->  HashAggregate  (cost=50217.37..50717.87 rows=50050 width=24) (actual time=418.234..421.445 rows=48234 loops=1)
              Group Key: users.id
              ->  Hash Left Join  (cost=12453.23..45217.12 rows=500025 width=16) (actual time=85.234..312.456 rows=500025 loops=1)
                    Hash Cond: (posts.user_id = users.id)
                    ->  Seq Scan on posts  (cost=0.00..18234.25 rows=500025 width=8) (actual time=0.034..98.234 rows=500025 loops=1)
                    ->  Hash  (cost=11234.50..11234.50 rows=50050 width=16) (actual time=85.123..85.123 rows=48234 loops=1)
                          Buckets: 65536  Batches: 1  Memory Usage: 2891kB
                          ->  Seq Scan on users  (cost=0.00..11234.50 rows=50050 width=16) (actual time=0.023..78.234 rows=48234 loops=1)
                                Filter: (created_at > '2025-01-01'::date)
                                Rows Removed by Filter: 12234
Planning Time: 2.345 ms
Execution Time: 423.234 ms
What to look for:
- Sequential Scans - Reading entire table (slow for large tables)
- Actual vs. estimated rows - If they differ wildly, the planner's statistics are stale (see the ANALYZE snippet after this list)
- High cost operations - Where most time is spent
- Rows removed by filter - Wasted work
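If the estimated and actual row counts disagree badly, refresh the planner's statistics before doing anything clever. A minimal check, using the tables from the example above:
ANALYZE users;
ANALYZE posts;
-- Then re-run the plan with buffer statistics to see how much data each node actually touches
EXPLAIN (ANALYZE, BUFFERS)
SELECT users.name, COUNT(posts.id) as post_count
FROM users
LEFT JOIN posts ON users.id = posts.user_id
WHERE users.created_at > '2025-01-01'
GROUP BY users.id, users.name
ORDER BY post_count DESC
LIMIT 10;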
Indexing Strategies
Indexes are your first and most powerful optimization tool. But they’re not magic - you need to use them correctly.
When to Add an Index
Add indexes for:
- Columns in WHERE clauses
- Columns in JOIN conditions
- Columns in ORDER BY
- Columns in GROUP BY
- Columns with high cardinality (many unique values) - see the cardinality check after these lists
Don’t index:
- Small tables (< 1000 rows)
- Columns with low cardinality (few unique values)
- Columns that change frequently in write-heavy tables
- Columns you never query
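Not sure whether a column's cardinality justifies an index? A quick check like this helps - the table and column here (orders.status) are placeholders for your own schema:
SELECT
COUNT(DISTINCT status) AS distinct_values,
COUNT(*) AS total_rows,
ROUND(COUNT(DISTINCT status) * 100.0 / COUNT(*), 2) AS selectivity_pct
FROM orders;
-- Close to 100% means high cardinality (a good B-tree candidate);
-- close to 0% means the index will rarely be worth its write cost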
Index Types
B-Tree Index (Default)
Best for equality and range queries.
-- Single column index
CREATE INDEX idx_users_email ON users(email);
-- Multi-column index (column order matters!)
CREATE INDEX idx_posts_user_date ON posts(user_id, created_at);
-- This index can be used for:
-- WHERE user_id = 123
-- WHERE user_id = 123 AND created_at > '2025-01-01'
-- WHERE user_id = 123 ORDER BY created_at
-- But NOT efficiently for:
-- WHERE created_at > '2025-01-01' (doesn't use first column)
Partial Index
Index only a subset of rows.
-- Only index active users
CREATE INDEX idx_active_users_email ON users(email) WHERE active = true;
-- Only index recent posts (partial-index predicates must use immutable expressions,
-- so NOW() isn't allowed here; use a fixed cutoff and recreate the index periodically)
CREATE INDEX idx_recent_posts ON posts(created_at) WHERE created_at > '2025-01-01';
-- Much smaller index, faster updates, perfect for common queries
Covering Index (INCLUDE)
Include additional columns in the index to avoid table lookups.
-- Index on email, but include name and created_at
CREATE INDEX idx_users_email_covering ON users(email) INCLUDE (name, created_at);
-- This query can be satisfied entirely from the index (no table lookup):
SELECT email, name, created_at FROM users WHERE email = 'user@example.com';
Expression Index
Index computed values.
-- Index lowercase email for case-insensitive searches
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Now this is fast:
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- Index JSONB field
CREATE INDEX idx_settings_timezone ON users((settings->>'timezone'));
Full-Text Search Index
For text search operations.
-- Add tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || content)) STORED;
-- Create GIN index
CREATE INDEX idx_articles_search ON articles USING GIN(search_vector);
-- Fast full-text search
SELECT * FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'database optimization');
GiST/GIN Index
For complex data types (arrays, JSON, geometry).
-- Index for array containment queries
CREATE INDEX idx_tags_gin ON posts USING GIN(tags);
-- Fast array queries:
SELECT * FROM posts WHERE tags @> ARRAY['postgresql', 'performance'];
-- Index for JSONB
CREATE INDEX idx_metadata_gin ON events USING GIN(metadata);
-- Fast JSON queries:
SELECT * FROM events WHERE metadata @> '{"status": "completed"}';
Index Maintenance
Indexes aren’t set-it-and-forget-it.
Monitor Index Usage:
-- Find unused indexes
SELECT
schemaname,
relname as table_name,
indexrelname as index_name,
idx_scan as index_scans,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_fetched,
pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE 'pg_toast%'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Drop unused indexes:
-- DROP INDEX idx_unused_index;
Monitor Index Health:
-- List index sizes and scan counts (large, rarely scanned indexes are rebuild candidates)
SELECT
schemaname,
relname as table_name,
indexrelname as index_name,
pg_size_pretty(pg_relation_size(indexrelid)) as index_size,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
-- Rebuild bloated indexes:
-- REINDEX INDEX CONCURRENTLY idx_bloated_index;
Query Optimization Patterns
The N+1 Query Problem
The most common performance killer.
Bad (N+1 queries):
// Get all users
const users = await db.query('SELECT * FROM users LIMIT 100');
// For each user, get their posts (100 queries!)
for (const user of users) {
user.posts = await db.query('SELECT * FROM posts WHERE user_id = $1', [user.id]);
}
Good (2 queries):
// Get all users
const users = await db.query('SELECT * FROM users LIMIT 100');
const userIds = users.map(u => u.id);
// Get all posts in one query
const posts = await db.query(
'SELECT * FROM posts WHERE user_id = ANY($1)',
[userIds]
);
// Group posts by user
const postsByUser = posts.reduce((acc, post) => {
if (!acc[post.user_id]) acc[post.user_id] = [];
acc[post.user_id].push(post);
return acc;
}, {});
// Attach posts to users
users.forEach(user => {
user.posts = postsByUser[user.id] || [];
});
Best (1 query with JOIN):
SELECT
users.*,
-- COALESCE + FILTER give users with no posts an empty array instead of [null]
COALESCE(
json_agg(
json_build_object(
'id', posts.id,
'title', posts.title,
'created_at', posts.created_at
)
) FILTER (WHERE posts.id IS NOT NULL),
'[]'::json
) as posts
FROM users
LEFT JOIN posts ON users.id = posts.user_id
WHERE users.active = true
GROUP BY users.id
LIMIT 100;
Pagination Done Right
Wrong approach (OFFSET):
-- Gets slower as you paginate deeper
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 50 OFFSET 10000; -- Has to scan 10,050 rows to skip first 10,000
Right approach (Cursor-based):
-- First page
SELECT * FROM posts
ORDER BY created_at DESC, id DESC
LIMIT 50;
-- Next page (using last row's values)
SELECT * FROM posts
WHERE (created_at, id) < ('2025-10-20 15:30:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 50;
-- Always scans only 50 rows, regardless of page depth
Implementation:
async function getPosts(cursor = null, limit = 50) {
let query = 'SELECT * FROM posts';
let params = [];
if (cursor) {
// cursor = base64encode(JSON.stringify({created_at, id}))
const decoded = JSON.parse(Buffer.from(cursor, 'base64').toString());
query += ' WHERE (created_at, id) < ($1, $2)';
params = [decoded.created_at, decoded.id];
}
query += ' ORDER BY created_at DESC, id DESC LIMIT $' + (params.length + 1);
params.push(limit);
const posts = await db.query(query, params);
// Generate next cursor
const nextCursor = posts.length === limit
? Buffer.from(JSON.stringify({
created_at: posts[posts.length - 1].created_at,
id: posts[posts.length - 1].id
})).toString('base64')
: null;
return { posts, nextCursor };
}
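Keyset pagination only stays fast if an index matches the sort order. Assuming the posts table from the queries above, a supporting index looks like this:
-- A plain composite index; PostgreSQL scans it backwards to satisfy the DESC ordering
-- and uses the (created_at, id) row comparison as the scan boundary
CREATE INDEX idx_posts_created_at_id ON posts (created_at, id);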
Counting Efficiently
Slow (accurate but slow):
-- Scans entire table
SELECT COUNT(*) FROM posts WHERE category = 'tech';
Fast (approximate):
-- Use table statistics (fast but approximate)
SELECT reltuples::BIGINT AS estimate
FROM pg_class
WHERE relname = 'posts';
-- For filtered counts, use EXPLAIN:
EXPLAIN SELECT COUNT(*) FROM posts WHERE category = 'tech';
-- Extract row estimate from output
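If you want that estimate programmatically rather than by eyeballing EXPLAIN output, a small helper in this spirit is common. Treat it as a sketch, not production code - it EXECUTEs whatever SQL string you hand it, so only call it with trusted queries:
-- Returns the planner's row estimate for an arbitrary query
CREATE OR REPLACE FUNCTION count_estimate(query text)
RETURNS bigint AS $$
DECLARE
plan json;
BEGIN
EXECUTE 'EXPLAIN (FORMAT JSON) ' || query INTO plan;
RETURN (plan->0->'Plan'->>'Plan Rows')::bigint;
END;
$$ LANGUAGE plpgsql;
-- Usage:
SELECT count_estimate('SELECT * FROM posts WHERE category = ''tech''');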
Better (cached count):
-- Maintain counts in a separate table
CREATE TABLE category_stats (
category VARCHAR(50) PRIMARY KEY,
post_count INTEGER DEFAULT 0,
updated_at TIMESTAMP DEFAULT NOW()
);
-- Update via trigger or periodic job
CREATE OR REPLACE FUNCTION update_category_count()
RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
INSERT INTO category_stats (category, post_count)
VALUES (NEW.category, 1)
ON CONFLICT (category)
DO UPDATE SET post_count = category_stats.post_count + 1;
ELSIF TG_OP = 'DELETE' THEN
UPDATE category_stats
SET post_count = post_count - 1
WHERE category = OLD.category;
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER posts_category_count
AFTER INSERT OR DELETE ON posts
FOR EACH ROW EXECUTE FUNCTION update_category_count();
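One caveat: the trigger above only fires on INSERT and DELETE. If posts.category can ever change, you also need an UPDATE path - roughly something like this (trigger name illustrative), plus an ELSIF TG_OP = 'UPDATE' branch in update_category_count() that decrements OLD.category and increments NEW.category:
CREATE TRIGGER posts_category_count_update
AFTER UPDATE OF category ON posts
FOR EACH ROW
WHEN (OLD.category IS DISTINCT FROM NEW.category)
EXECUTE FUNCTION update_category_count();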
Batching Operations
Bad (N queries):
for (let user of users) {
await db.query(
'INSERT INTO audit_log (user_id, action) VALUES ($1, $2)',
[user.id, 'login']
);
}
Good (1 query):
// Fine for trusted numeric IDs, but interpolating values into SQL is an injection risk for arbitrary input
const values = users.map(u => `(${u.id}, 'login')`).join(',');
await db.query(`
INSERT INTO audit_log (user_id, action)
VALUES ${values}
`);
Better (using parameterized batch insert):
// PostgreSQL
const values = users.map((u, i) => `($${i*2+1}, $${i*2+2})`).join(',');
const params = users.flatMap(u => [u.id, 'login']);
await db.query(
`INSERT INTO audit_log (user_id, action) VALUES ${values}`,
params
);
Advanced Optimization Techniques
Query Materialization
Pre-compute expensive queries.
-- Create materialized view
CREATE MATERIALIZED VIEW user_post_stats AS
SELECT
users.id,
users.name,
COUNT(posts.id) as post_count,
MAX(posts.created_at) as last_post_at,
AVG(posts.view_count) as avg_views
FROM users
LEFT JOIN posts ON users.id = posts.user_id
GROUP BY users.id, users.name;
-- REFRESH ... CONCURRENTLY requires a unique index on the materialized view
CREATE UNIQUE INDEX idx_user_post_stats_id ON user_post_stats(id);
-- Index the columns you query
CREATE INDEX idx_user_post_stats_count ON user_post_stats(post_count DESC);
-- Refresh periodically (CONCURRENTLY keeps the view readable during refresh)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_post_stats;
-- Now queries are instant:
SELECT * FROM user_post_stats ORDER BY post_count DESC LIMIT 10;
Automatic Refresh:
-- Create refresh function
CREATE OR REPLACE FUNCTION refresh_user_stats()
RETURNS void AS $$
BEGIN
REFRESH MATERIALIZED VIEW CONCURRENTLY user_post_stats;
END;
$$ LANGUAGE plpgsql;
-- Schedule with pg_cron
SELECT cron.schedule('refresh-user-stats', '*/15 * * * *', 'SELECT refresh_user_stats()');
Partial Data Loading
Don’t load what you don’t need.
Bad:
-- Loading huge JSONB field
SELECT * FROM documents WHERE id = 123;
Good:
-- Load only needed fields
SELECT id, title, metadata->>'author' as author FROM documents WHERE id = 123;
-- Load JSONB field separately if needed
SELECT content FROM documents WHERE id = 123;
Connection Pooling
Database connections are expensive to create.
const { Pool } = require('pg');
const pool = new Pool({
host: 'localhost',
database: 'mydb',
user: 'user',
password: 'password',
max: 20, // Maximum pool size
idleTimeoutMillis: 30000, // Close idle connections after 30s
connectionTimeoutMillis: 2000, // Fail fast if no connection available
});
// Usage
async function getUser(id) {
const client = await pool.connect();
try {
const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
return result.rows[0];
} finally {
client.release();
}
}
Read Replicas
Distribute read load across replica databases. Keep in mind that replicas lag slightly behind the primary, so read-after-write flows should go to the primary.
const masterPool = new Pool({ /* master config */ });
const replicaPool = new Pool({ /* replica config */ });
async function executeQuery(query, params, options = {}) {
const pool = options.writeable ? masterPool : replicaPool;
return pool.query(query, params);
}
// Reads from replica
const users = await executeQuery('SELECT * FROM users WHERE active = true', []);
// Writes to master
await executeQuery(
'INSERT INTO users (name, email) VALUES ($1, $2)',
['John', 'john@example.com'],
{ writeable: true }
);
Real-World Optimization Case Study
Let me walk through a recent optimization I did for a reporting dashboard.
The Problem
Dashboard query taking 45 seconds:
SELECT
DATE(created_at) as date,
COUNT(DISTINCT user_id) as unique_users,
COUNT(*) as total_events,
COUNT(*) FILTER (WHERE event_type = 'purchase') as purchases,
SUM(amount) FILTER (WHERE event_type = 'purchase') as revenue
FROM events
WHERE created_at BETWEEN '2024-01-01' AND '2025-12-31'
GROUP BY DATE(created_at)
ORDER BY date DESC;
- Table size: 500M rows
- Time: 45 seconds
- Users: angry
Step 1: Analyze the Query
EXPLAIN ANALYZE [query above]
-- Key findings:
-- Sequential scan on 500M rows
-- No index on created_at
-- Computing DATE() on every row
-- Multiple COUNT DISTINCT operations
Step 2: Add Basic Index
CREATE INDEX idx_events_created ON events(created_at);
-- Time: 45s → 32s (28% improvement)
-- Still not good enough
Step 3: Add Computed Date Column
-- Add date column
ALTER TABLE events ADD COLUMN event_date DATE;
-- Populate it (on a 500M-row table, run this in batches to avoid one huge transaction)
UPDATE events SET event_date = DATE(created_at);
-- Keep it updated
CREATE OR REPLACE FUNCTION set_event_date()
RETURNS TRIGGER AS $$
BEGIN
NEW.event_date := DATE(NEW.created_at);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER events_set_date
BEFORE INSERT OR UPDATE ON events
FOR EACH ROW EXECUTE FUNCTION set_event_date();
-- Index the date column
CREATE INDEX idx_events_date ON events(event_date);
-- Time: 32s → 18s (44% improvement from baseline)
-- Better, but still slow
Step 4: Create Materialized View
CREATE MATERIALIZED VIEW daily_event_stats AS
SELECT
event_date as date,
COUNT(DISTINCT user_id) as unique_users,
COUNT(*) as total_events,
COUNT(*) FILTER (WHERE event_type = 'purchase') as purchases,
SUM(amount) FILTER (WHERE event_type = 'purchase') as revenue
FROM events
GROUP BY event_date;
CREATE INDEX idx_daily_stats_date ON daily_event_stats(date DESC);
-- Query the materialized view
SELECT * FROM daily_event_stats
WHERE date BETWEEN '2024-01-01' AND '2025-12-31'
ORDER BY date DESC;
-- Time: 18s → 0.05s (99.9% improvement!)
Step 5: Incremental Updates
Instead of a full refresh, update incrementally. PostgreSQL doesn't allow INSERT or DELETE on a materialized view, so for this step daily_event_stats was rebuilt as a regular summary table (CREATE TABLE ... AS using the same query):
-- Function to update yesterday's stats
CREATE OR REPLACE FUNCTION refresh_daily_stats()
RETURNS void AS $$
BEGIN
-- Remove stats for the most recent day (covers yesterday and today)
DELETE FROM daily_event_stats
WHERE date >= CURRENT_DATE - INTERVAL '1 day';
-- Recompute them from the raw events
INSERT INTO daily_event_stats
SELECT
event_date as date,
COUNT(DISTINCT user_id) as unique_users,
COUNT(*) as total_events,
COUNT(*) FILTER (WHERE event_type = 'purchase') as purchases,
SUM(amount) FILTER (WHERE event_type = 'purchase') as revenue
FROM events
WHERE event_date >= CURRENT_DATE - INTERVAL '1 day'
GROUP BY event_date;
END;
$$ LANGUAGE plpgsql;
-- Run every hour
SELECT cron.schedule('refresh-stats', '0 * * * *', 'SELECT refresh_daily_stats()');
The Results
Performance:
- Query time: 45s → 0.05s (99.9% improvement)
- Dashboard load time: 50s → 0.2s
- Database CPU: 85% → 5%
Impact:
- Users actually use the dashboard now
- Can add more reports without performance concerns
- Database handles 10x more concurrent users
Trade-offs:
- Data delayed by up to 1 hour (acceptable for this use case)
- Additional storage for materialized view (minimal cost)
- Slightly more complex maintenance
Database-Specific Tips
PostgreSQL
Use pg_stat_statements:
-- Enable extension
CREATE EXTENSION pg_stat_statements;
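-- Note: pg_stat_statements only collects data once shared_preload_libraries = 'pg_stat_statements'
-- is set in postgresql.conf and the server has been restarted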
-- Find slowest queries
SELECT
round(total_exec_time::numeric, 2) AS total_time,
calls,
round(mean_exec_time::numeric, 2) AS mean,
round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percentage,
query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
Vacuum regularly:
-- Check table bloat
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
-- Manual vacuum
VACUUM ANALYZE table_name;
-- Configure autovacuum for high-traffic tables
ALTER TABLE high_traffic_table SET (autovacuum_vacuum_scale_factor = 0.05);
MySQL
Use the Query Cache (MySQL 5.7 only - deprecated in 5.7.20 and removed in MySQL 8.0):
-- Enable query cache
SET GLOBAL query_cache_size = 67108864; -- 64MB
SET GLOBAL query_cache_type = ON;
-- Check cache stats
SHOW STATUS LIKE 'Qcache%';
Optimize Table:
-- Rebuild table and indexes
OPTIMIZE TABLE table_name;
Monitoring and Maintenance
Set up proper monitoring to catch issues early.
Key Metrics to Monitor
-- Active connections
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
-- Long-running queries
SELECT
pid,
now() - query_start as duration,
state,
query
FROM pg_stat_activity
WHERE state != 'idle'
AND query_start < now() - interval '5 minutes'
ORDER BY duration DESC;
-- Database size
SELECT
pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;
-- Table sizes
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) AS indexes_size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 20;
Automated Alerts
// Example monitoring script
async function checkDatabaseHealth() {
// Check for long-running queries
const longQueries = await db.query(`
SELECT count(*) as count
FROM pg_stat_activity
WHERE state = 'active'
AND query_start < now() - interval '1 minute'
`);
if (longQueries.rows[0].count > 10) {
await alerting.send('Too many long-running queries');
}
// Check connection pool
const connections = await db.query(`
SELECT count(*) as count FROM pg_stat_activity
`);
if (connections.rows[0].count > 90) { // max is 100
await alerting.send('Connection pool nearly exhausted');
}
// Check for table bloat
const bloat = await db.query(`
SELECT count(*) as count
FROM pg_stat_user_tables
WHERE n_dead_tup > n_live_tup * 0.2
`);
if (bloat.rows[0].count > 0) {
await alerting.send('Tables need vacuuming');
}
}
// Run every 5 minutes
setInterval(checkDatabaseHealth, 5 * 60 * 1000);
Common Mistakes to Avoid
Using SELECT *
Bad:
SELECT * FROM users WHERE id = 123;
-- Loads all columns, including potentially large BLOBs
Good:
SELECT id, name, email FROM users WHERE id = 123;
-- Loads only needed data
Premature Optimization
Don’t add indexes before you have performance problems. Every index:
- Slows down writes
- Takes up disk space
- Needs maintenance
Start simple, measure, then optimize.
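In PostgreSQL, one way to ground that measurement is to see which tables are actually being read via sequential scans before adding anything:
-- Tables with the most sequentially-read rows are the first places to look
SELECT
relname AS table_name,
seq_scan,
seq_tup_read,
idx_scan
FROM pg_stat_user_tables
ORDER BY seq_tup_read DESC
LIMIT 10;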
Ignoring EXPLAIN
Never deploy a query without understanding its execution plan.
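In production, PostgreSQL's auto_explain module will log plans for slow queries automatically, so you see real execution plans without having to reproduce the query by hand:
-- Per-session example (set shared_preload_libraries = 'auto_explain' in postgresql.conf for cluster-wide logging)
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '500ms'; -- log plans for queries slower than 500 ms
SET auto_explain.log_analyze = on;           -- include actual row counts and timings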
Over-Indexing
I’ve seen tables with 20+ indexes. That’s usually a code smell.
If you have that many indexes, consider:
- Are you querying in too many different ways?
- Should you use a different data model?
- Do you need a search engine (Elasticsearch) instead?
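A quick way to spot tables that have accumulated a lot of indexes (the threshold of 10 is arbitrary, and 'public' assumes the default schema):
SELECT
tablename,
COUNT(*) AS index_count
FROM pg_indexes
WHERE schemaname = 'public'
GROUP BY tablename
HAVING COUNT(*) > 10
ORDER BY index_count DESC;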
Your Database Optimization Checklist
For Every Slow Query:
- Run EXPLAIN ANALYZE
- Check for sequential scans
- Verify statistics are up to date
- Add appropriate indexes
- Rewrite query if needed
- Measure improvement
For Production Databases:
- Monitor slow query log
- Set up index usage monitoring
- Configure autovacuum appropriately
- Use connection pooling
- Implement query timeouts (see the statement_timeout example after this list)
- Set up alerts for long queries
- Regular backup and recovery testing
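For the query-timeout item above, statement_timeout is the usual PostgreSQL mechanism; app_user below is a placeholder for your application's role:
-- Cancel any statement that runs longer than 5 seconds for the application role
ALTER ROLE app_user SET statement_timeout = '5s';
-- Or per session / transaction:
SET statement_timeout = '5s';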
The Bottom Line
Database optimization is a continuous process, not a one-time fix. The key principles:
- Measure first - Use EXPLAIN, don’t guess
- Index strategically - Not too few, not too many
- Rewrite queries - Sometimes the problem is the approach
- Cache when appropriate - Materialized views, application caching
- Monitor continuously - Catch issues before users complain
Master these techniques and you’ll never fear database performance problems again.
Part of the Developer Skills series focusing on technical excellence and performance optimization.
What’s your worst database performance story? Have you discovered optimization techniques that saved your project? Let me know - I learn from every developer’s experience!