Optimize before you scale
The most common database scaling mistake we see: teams reach for infrastructure solutions before they have exhausted optimization. They upgrade to the biggest RDS instance available, or start researching distributed databases, when their actual problem is a missing index and 47 N+1 queries.
Before you spend a dollar on scaling infrastructure, exhaust these optimizations first. In our experience, this covers 90% of the "my database is slow" complaints we hear from growing businesses.
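To make the N+1 pattern concrete: it is the same lookup repeated once per record instead of batched into one query. A hypothetical sketch (the orders table and ids are illustrative):
-- N+1: one round trip per customer -- 47 customers, 47 queries
SELECT * FROM orders WHERE customer_id = 1;
SELECT * FROM orders WHERE customer_id = 2;
-- ...45 more of the same
-- The fix: a single batched query with an array parameter
SELECT * FROM orders WHERE customer_id = ANY($1);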
Find your slow queries first
You cannot fix what you cannot see. pg_stat_statements is the single most valuable PostgreSQL extension for performance work. It tracks execution statistics for every query your application runs.
Enable it:
# Add to postgresql.conf or your RDS parameter group
# (changing shared_preload_libraries requires a restart)
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
-- Then create the extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
Now find your worst offenders:
-- Top 10 queries by total execution time
SELECT
substring(query, 1, 80) AS short_query,
calls,
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 2) AS avg_ms,
round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
This query alone has saved our clients hundreds of hours. Often, a single query accounts for 40-60% of total database time -- and fixing it takes 20 minutes.
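Once pg_stat_statements has pointed you at a suspect, EXPLAIN (ANALYZE, BUFFERS) shows what the planner actually does with it. A sequential scan on a big table is the classic tell (the query below is a hypothetical example):
-- Run the suspect query with real execution timings and buffer counts
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';
-- A "Seq Scan" node on a large table usually means a missing index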
Indexing strategies that matter
Indexes are the highest-leverage performance tool in PostgreSQL. A missing index can make a query 1000x slower. But bad indexes waste disk space and slow down writes. Here is how to be strategic about it.
Find missing indexes:
-- Tables with the most sequential scans (potential missing indexes)
SELECT
schemaname,
relname AS table_name,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch,
n_tup_ins + n_tup_upd + n_tup_del AS writes
FROM pg_stat_user_tables
WHERE seq_scan > 1000
ORDER BY seq_tup_read DESC
LIMIT 20;
If a table has millions of sequential reads and few index scans, you almost certainly need an index on whatever columns your queries are filtering by.
Use the right index type:
- B-tree (default): Great for equality and range queries. Use this 90% of the time.
- GIN: For full-text search, JSONB fields, and array columns. Essential if you are querying inside JSON documents.
- BRIN: For large tables with naturally ordered data (timestamps, sequential IDs). Much smaller than B-tree with good performance.
- Partial indexes: Not a separate index type, but a WHERE clause you can add to any index -- only index the rows your queries actually touch.
-- Composite index for common query pattern
CREATE INDEX CONCURRENTLY idx_orders_customer_status
ON orders (customer_id, status)
WHERE status != 'archived';
-- GIN index for JSONB queries
CREATE INDEX CONCURRENTLY idx_events_metadata
ON events USING GIN (metadata);
-- BRIN index for time-series data
CREATE INDEX CONCURRENTLY idx_logs_created_at
ON application_logs USING BRIN (created_at);
Always use CONCURRENTLY when creating indexes in production -- it takes longer and cannot run inside a transaction block, but it does not block writes to the table while it builds.
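The flip side: an index that is never used still slows down every write and eats disk. pg_stat_user_indexes shows which indexes are earning their keep:
-- Indexes that are rarely or never scanned -- candidates for removal
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC
LIMIT 20;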
Connection pooling with PgBouncer
PostgreSQL creates a new process for every connection. At 100+ concurrent connections, the overhead of managing those processes becomes a bottleneck itself -- even if your queries are fast.
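You can see how close you are to that point by counting what is connected right now:
-- How many connections are open, and what are they doing?
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;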
PgBouncer sits between your application and PostgreSQL, maintaining a pool of reusable connections. It is simple to set up and makes a dramatic difference.
; pgbouncer.ini
[databases]
myapp = host=your-rds-endpoint.amazonaws.com port=5432 dbname=myapp
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; Connection pool settings
pool_mode = transaction
default_pool_size = 25
max_client_conn = 500
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
; Timeouts
server_idle_timeout = 300
client_idle_timeout = 600
query_timeout = 30
The key setting is pool_mode = transaction. This returns each server connection to the pool as soon as a transaction completes, giving you maximum sharing. The trade-off: session state -- SET commands, advisory locks, named prepared statements -- does not carry across transactions in this mode. Your application connects to PgBouncer on port 6432 instead of PostgreSQL on port 5432; for most applications, no code changes are needed.
With this configuration, 500 application connections share 25 actual PostgreSQL connections. We have seen this alone resolve "too many connections" errors and reduce query latency by 30-50% under load.
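Once it is running, you can sanity-check pool behavior from PgBouncer's built-in admin console (connect to the special pgbouncer database as a user listed in admin_users -- the user and host below are placeholders):
-- Connect first: psql -h your-pgbouncer-host -p 6432 -U admin pgbouncer
SHOW POOLS;   -- client/server connection counts per pool
SHOW STATS;   -- per-database query counts, bytes, and wait times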
Read replicas
When your read traffic significantly outweighs your writes -- which is true for most web applications -- read replicas let you horizontally scale your read capacity.
On AWS RDS, creating a read replica is straightforward:
aws rds create-db-instance-read-replica \
--db-instance-identifier myapp-read-1 \
--source-db-instance-identifier myapp-primary \
--db-instance-class db.r6g.large \
--availability-zone ca-central-1b
The engineering challenge is routing queries correctly. Writes must go to the primary. Reads can go to replicas, but replication is asynchronous: replicas typically trail the primary by milliseconds, and lag can stretch to seconds under heavy write load (see the lag check at the end of this section).
// Simple read/write splitting in your application
import { Pool } from "pg";
const primaryPool = new Pool({
connectionString: process.env.DATABASE_PRIMARY_URL,
});
const replicaPool = new Pool({
connectionString: process.env.DATABASE_REPLICA_URL,
});
// Use primary for writes and reads that need consistency
async function writeQuery(sql: string, params: any[]) {
return primaryPool.query(sql, params);
}
// Use replica for read-heavy operations that tolerate slight lag
async function readQuery(sql: string, params: any[]) {
return replicaPool.query(sql, params);
}
// For reads that MUST see the latest writes, use primary
async function consistentRead(sql: string, params: any[]) {
return primaryPool.query(sql, params);
}
When to add read replicas:
- Your primary instance CPU is consistently above 60% and most load is from reads
- You have reporting or analytics queries that you want to isolate from your production workload
- You need geographic read performance (put a replica in a region closer to your users)
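Rather than assuming lag is small, measure it. On any streaming replica, a standard check:
-- Run on the replica: time since the last replayed transaction
-- Note: this grows during idle periods with no writes, even at zero lag
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;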
Table partitioning
When tables grow into hundreds of millions of rows, even well-indexed queries slow down because the index itself becomes enormous. Partitioning splits a large table into smaller physical chunks that PostgreSQL can scan independently.
Range partitioning is the most common pattern -- splitting by date:
-- Create a partitioned table
CREATE TABLE events (
id BIGSERIAL,
event_type TEXT NOT NULL,
payload JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE events_2026_01 PARTITION OF events
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
CREATE TABLE events_2026_03 PARTITION OF events
FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
-- Indexes are created per partition
CREATE INDEX idx_events_2026_01_type ON events_2026_01 (event_type);
CREATE INDEX idx_events_2026_02_type ON events_2026_02 (event_type);
CREATE INDEX idx_events_2026_03_type ON events_2026_03 (event_type);
Now a query filtering on created_at >= '2026-03-01' only scans the March partition instead of the entire table. For a table with 500 million rows across two years, this can mean scanning 20 million rows instead of 500 million.
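You can verify pruning is working with EXPLAIN -- the plan should reference only the matching partition:
-- Should scan events_2026_03 only, not every partition
EXPLAIN SELECT count(*) FROM events WHERE created_at >= '2026-03-01';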
Automate partition creation. Do not create partitions by hand. Use pg_partman or a cron job that creates future partitions ahead of time:
-- Using pg_partman for automatic partition management (pg_partman 5.x syntax)
CREATE EXTENSION pg_partman;
SELECT partman.create_parent(
p_parent_table := 'public.events',
p_control := 'created_at',
p_interval := '1 month',
p_premake := 3 -- Create 3 months of future partitions
);
Caching layers
Not every query needs to hit the database. For data that is read frequently and changes infrequently, caching can reduce database load by 80-95%.
Redis for hot queries:
import Redis from "ioredis";
const redis = new Redis(process.env.REDIS_URL);
async function getProductCatalog(categoryId: string) {
const cacheKey = `catalog:${categoryId}`;
// Check cache first
const cached = await redis.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
// Cache miss -- query database
const products = await db.query(
"SELECT * FROM products WHERE category_id = $1 AND active = true ORDER BY name",
[categoryId]
);
// Cache for 5 minutes
await redis.setex(cacheKey, 300, JSON.stringify(products));
return products;
}
Materialized views for reporting:
If you have complex reporting queries that aggregate millions of rows, materialized views precompute the results:
-- Create a materialized view for daily revenue reporting
CREATE MATERIALIZED VIEW mv_daily_revenue AS
SELECT
date_trunc('day', created_at) AS day,
product_id,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value
FROM orders
WHERE status = 'completed'
GROUP BY date_trunc('day', created_at), product_id;
-- Create an index on the materialized view
CREATE UNIQUE INDEX idx_mv_daily_revenue
ON mv_daily_revenue (day, product_id);
-- Refresh on a schedule (e.g., every hour via pg_cron)
SELECT cron.schedule(
'refresh-daily-revenue',
'0 * * * *',
'REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_revenue'
);
The reporting dashboard now queries the materialized view (milliseconds) instead of aggregating the orders table (seconds to minutes).
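For example, a 30-day revenue chart becomes an indexed read against the view (using the columns defined above):
-- Dashboard query: hits the precomputed view, not the orders table
SELECT day, SUM(total_revenue) AS revenue
FROM mv_daily_revenue
WHERE day >= now() - interval '30 days'
GROUP BY day
ORDER BY day;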
VACUUM and autovacuum tuning
PostgreSQL uses MVCC (Multi-Version Concurrency Control), which means updates and deletes do not remove old row versions immediately -- they create new versions and mark old ones as dead. VACUUM reclaims that space.
If autovacuum falls behind on high-write tables, two things happen: table bloat eats disk space, and query performance degrades because the planner has stale statistics.
-- Check for tables where autovacuum is falling behind
SELECT
schemaname,
relname AS table_name,
n_dead_tup,
n_live_tup,
round(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 1) AS dead_pct,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
For high-write tables, tune autovacuum to be more aggressive:
-- Make autovacuum more aggressive on a specific table
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.05, -- default 0.2
autovacuum_analyze_scale_factor = 0.02, -- default 0.1
autovacuum_vacuum_cost_delay = 2, -- in ms; default 2 since PG 12 (was 20)
autovacuum_vacuum_cost_limit = 1000 -- default 200 (inherited from vacuum_cost_limit)
);
This tells PostgreSQL to vacuum the orders table when 5% of rows are dead (instead of the default 20%) and to work faster when it does. For tables with millions of rows, this prevents the situation where autovacuum never catches up.
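If a table is already badly bloated, a one-off manual pass can catch things up before the new settings take over (plain VACUUM runs online; avoid VACUUM FULL in production, since it rewrites the table under an exclusive lock):
-- One-off catch-up: reclaims dead tuples and refreshes planner statistics
VACUUM (VERBOSE, ANALYZE) orders;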
When to consider sharding (and why you probably do not need it)
Sharding -- distributing data across multiple PostgreSQL instances -- is the nuclear option. It adds enormous complexity to your application: cross-shard queries, distributed transactions, rebalancing, and operational overhead.
You probably do not need sharding if:
- Your database is under 1TB
- Your write throughput is under 10,000 transactions per second
- You have not exhausted indexing, connection pooling, read replicas, partitioning, and caching
Consider sharding when:
- Single-instance write throughput is genuinely the bottleneck after all other optimizations
- Your data has a natural partition key (tenant ID for multi-tenant SaaS, region for geo-distributed data)
- You are ready to invest in the operational complexity
If you do need it, look at Citus (PostgreSQL extension for distributed tables) before building custom sharding logic. It handles the hard parts -- distributed queries, rebalancing, and schema changes -- while keeping the PostgreSQL interface your team already knows.
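To give a sense of the interface: with Citus, distributing an existing table is a single function call (a sketch, assuming a multi-tenant schema with a tenant_id column as the distribution key):
-- Citus: spread the table across worker nodes by tenant
SELECT create_distributed_table('events', 'tenant_id');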
Real-world results: the Northvex case
When we worked with Northvex, a Waterloo-based B2B SaaS platform with 400+ customers, their Monday morning peak loads were crushing their database. Every legal and accounting firm on the platform logged in simultaneously at 9 AM Eastern.
The fix was not a bigger instance. It was a combination of the patterns in this article: we added missing indexes on their most common query patterns, set up PgBouncer for connection pooling, introduced Redis caching for their dashboard queries, and created materialized views for their reporting aggregations. The result: Monday morning peak loads became 3x faster, and their RDS instance actually went down a size because it was no longer doing unnecessary work.
The bottom line
PostgreSQL can handle far more than most teams think. Before you reach for distributed databases, managed scaling services, or expensive infrastructure upgrades, work through this checklist in order:
- Enable pg_stat_statements and find your slow queries
- Add proper indexes (composite, partial, GIN where needed)
- Set up connection pooling with PgBouncer
- Add read replicas if read traffic dominates
- Partition large tables by date or tenant
- Add Redis caching for hot queries and materialized views for reporting
- Tune autovacuum for high-write tables
Each step is a multiplier. Combined, they routinely give us 10-50x performance improvements on the same hardware. Only after exhausting all of these should you start thinking about sharding or alternative databases. The boring optimization work is almost always the right answer.