The rock-solid, open-source relational database. ACID-compliant, extensible, and trusted by millions of developers for over 35 years.
The essential PostgreSQL commands at a glance. Copy, paste, and query.
psql -h localhost -p 5432 -U user -d dbname
psql "postgresql://user:pass@host:5432/db"
psql -h host -U user -d db -c "SELECT 1"
CREATE DATABASE mydb;
DROP DATABASE mydb;
ALTER DATABASE mydb RENAME TO newdb;
ALTER DATABASE mydb OWNER TO newowner;
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email TEXT UNIQUE,
created_at TIMESTAMPTZ DEFAULT NOW()
);
INSERT INTO users (name, email)
VALUES ('Alice', 'alice@example.com');
SELECT * FROM users WHERE id = 1;
UPDATE users SET name = 'Bob' WHERE id = 2;
DELETE FROM users WHERE id = 3;
BEGIN;
UPDATE accounts SET balance = balance - 100
WHERE id = 1;
UPDATE accounts SET balance = balance + 100
WHERE id = 2;
COMMIT;
CREATE ROLE dev WITH LOGIN PASSWORD 'pass';
GRANT ALL ON users TO dev;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO reader;
REVOKE DELETE ON users FROM dev;
Getting PostgreSQL running on your system.
# macOS (Homebrew)
brew install postgresql@17
brew services start postgresql@17
# Ubuntu / Debian
sudo apt update
sudo apt install postgresql postgresql-contrib
sudo systemctl start postgresql
sudo systemctl enable postgresql
# Fedora / RHEL
sudo dnf install postgresql-server postgresql-contrib
sudo postgresql-setup --initdb
sudo systemctl start postgresql
# Arch Linux
sudo pacman -S postgresql
sudo -u postgres initdb -D /var/lib/postgres/data
sudo systemctl start postgresql
# Windows (Chocolatey)
choco install postgresql17
# Docker (quickest way to try it)
docker run --name pg17 -e POSTGRES_PASSWORD=secret \
-p 5432:5432 -d postgres:17
# Switch to the postgres system user
sudo -u postgres psql
-- Create a new database and user
CREATE USER myapp WITH PASSWORD 'secure_password';
CREATE DATABASE myapp_db OWNER myapp;
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE myapp_db TO myapp;
-- Connect to the new database
\c myapp_db
-- Check version
SELECT version();
| File | Purpose | Typical Location |
|---|---|---|
| postgresql.conf | Main server configuration (memory, connections, logging) | /etc/postgresql/17/main/ |
| pg_hba.conf | Host-based authentication rules (who can connect) | /etc/postgresql/17/main/ |
| pg_ident.conf | OS user to PostgreSQL user mappings | /etc/postgresql/17/main/ |
Run SHOW config_file; inside psql to find the exact path of your postgresql.conf, or use pg_config --sysconfdir from the shell.
# TYPE DATABASE USER ADDRESS METHOD
# Local connections via Unix socket
local all all peer
# IPv4 local connections with password
host all all 127.0.0.1/32 scram-sha-256
# IPv6 local connections
host all all ::1/128 scram-sha-256
# Allow connections from a subnet
host myapp_db myapp 10.0.0.0/24 scram-sha-256
# Reject everything else (implicit default)
Never use trust authentication in production. Always prefer scram-sha-256 (the default since PG 14) over the older md5 method.
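Edits to pg_hba.conf only take effect after a reload. A minimal check-and-reload sequence from psql (pg_hba_file_rules requires PG 10+ and a privileged role):

```sql
-- Apply pg_hba.conf changes without restarting the server
SELECT pg_reload_conf();

-- Verify the rules parsed as intended (the error column flags bad lines)
SELECT line_number, type, database, user_name, address, auth_method, error
FROM pg_hba_file_rules;
```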
DDL, DML, transactions, constraints, and the ACID guarantee that makes PostgreSQL rock-solid.
-- Create a table with various constraints
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
status VARCHAR(20) CHECK (status IN ('pending', 'shipped', 'delivered', 'cancelled')),
total NUMERIC(10,2) NOT NULL CHECK (total >= 0),
notes TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Add a column
ALTER TABLE orders ADD COLUMN tracking_number VARCHAR(50);
-- Add a constraint after the fact
ALTER TABLE orders ADD CONSTRAINT orders_total_positive CHECK (total > 0);
-- Create an index
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status) WHERE status != 'cancelled';
-- Rename and drop
ALTER TABLE orders RENAME COLUMN notes TO order_notes;
DROP TABLE IF EXISTS orders CASCADE;
| Level | Dirty Read | Non-repeatable Read | Phantom Read | Use Case |
|---|---|---|---|---|
| READ UNCOMMITTED | Prevented* | Possible | Possible | Maps to Read Committed in PG |
| READ COMMITTED | Prevented | Possible | Possible | Default -- good for most apps |
| REPEATABLE READ | Prevented | Prevented | Prevented* | Reporting queries, batch jobs |
| SERIALIZABLE | Prevented | Prevented | Prevented | Financial transactions, strict integrity |

\* PostgreSQL is stricter here than the SQL standard requires.
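The isolation level is chosen per transaction. A small sketch of running a report at REPEATABLE READ so that every statement sees the same snapshot:

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Both reads see the same snapshot, even if other sessions commit in between
SELECT COUNT(*) FROM orders;
SELECT SUM(total) FROM orders;
COMMIT;

-- Or set a default for the whole session
SET default_transaction_isolation = 'repeatable read';
```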
-- Primary key
id SERIAL PRIMARY KEY
-- Composite primary key
PRIMARY KEY (order_id, product_id)
-- Foreign key with actions
user_id INTEGER REFERENCES users(id)
ON DELETE CASCADE
ON UPDATE SET NULL
-- Unique constraint (allows one NULL by default)
email TEXT UNIQUE
-- Unique across multiple columns
UNIQUE (user_id, product_id)
-- NOT NULL
name VARCHAR(100) NOT NULL
-- Check constraint
age INTEGER CHECK (age >= 0 AND age <= 150)
-- Default value
status VARCHAR(20) DEFAULT 'active'
-- Generated column (PG 12+)
full_name TEXT GENERATED ALWAYS AS (first_name || ' ' || last_name) STORED
-- Exclusion constraint (prevent overlapping ranges)
EXCLUDE USING gist (room WITH =, during WITH &&)
-- Create a schema
CREATE SCHEMA analytics;
-- Create table in schema
CREATE TABLE analytics.events (
id BIGSERIAL PRIMARY KEY,
event_type TEXT NOT NULL,
payload JSONB,
recorded_at TIMESTAMPTZ DEFAULT NOW()
);
-- Set search path for session
SET search_path TO analytics, public;
-- Set default search path for a role
ALTER ROLE analyst SET search_path = analytics, public;
-- List all schemas
\dn
PostgreSQL's rich type system goes far beyond the basics. Every type is a mineral in the collection.
| Type | Size | Range | Notes |
|---|---|---|---|
| SMALLINT | 2 bytes | -32,768 to 32,767 | Alias: INT2 |
| INTEGER | 4 bytes | -2.1B to 2.1B | Most common. Alias: INT4 |
| BIGINT | 8 bytes | -9.2 quintillion to 9.2 quintillion | Use for IDs in large tables |
| NUMERIC(p,s) | Variable | Up to 131,072 digits before decimal | Exact. Use for money/financial data |
| REAL | 4 bytes | 6 decimal digits precision | Alias: FLOAT4 |
| DOUBLE PRECISION | 8 bytes | 15 decimal digits precision | Alias: FLOAT8 |
| SERIAL | 4 bytes | 1 to 2.1B | Auto-increment. Prefer GENERATED ALWAYS AS IDENTITY |
| BIGSERIAL | 8 bytes | 1 to 9.2 quintillion | Auto-increment for large tables |
| Type | Max Length | Notes |
|---|---|---|
| TEXT | Unlimited | Preferred for most string data. No performance penalty vs VARCHAR. |
| VARCHAR(n) | n characters | Use when you need an explicit length constraint. |
| CHAR(n) | n characters (padded) | Right-padded with spaces. Rarely needed. |
TEXT and VARCHAR have identical performance. Use TEXT unless you specifically need a length constraint. There is no performance benefit to VARCHAR(255) -- that's a MySQL habit.
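If you do need a length cap, a CHECK constraint on TEXT works just as well as VARCHAR(n) and is easier to change later. A sketch (the constraint name users_name_len is made up for the example):

```sql
-- TEXT with an explicit, adjustable length limit
ALTER TABLE users
  ADD CONSTRAINT users_name_len CHECK (char_length(name) <= 100);

-- Changing the limit later is a quick constraint swap,
-- not a table-rewriting ALTER COLUMN TYPE
ALTER TABLE users DROP CONSTRAINT users_name_len;
ALTER TABLE users
  ADD CONSTRAINT users_name_len CHECK (char_length(name) <= 200);
```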
| Type | Size | Description |
|---|---|---|
| DATE | 4 bytes | Calendar date (year, month, day) |
| TIME | 8 bytes | Time of day (no timezone) |
| TIMESTAMP | 8 bytes | Date + time without timezone |
| TIMESTAMPTZ | 8 bytes | Always use this. Stores as UTC, displays in session TZ |
| INTERVAL | 16 bytes | Time spans: '1 year 2 months 3 days' |
| TSTZRANGE | Variable | Range of timestamptz values. Great for scheduling. |
-- Always use TIMESTAMPTZ
SELECT NOW(); -- 2026-03-19 14:30:00+00
SELECT NOW() AT TIME ZONE 'US/Pacific'; -- 2026-03-19 07:30:00
SELECT '2026-01-01'::DATE + '3 months'::INTERVAL; -- 2026-04-01
-- Date truncation
SELECT date_trunc('month', NOW()); -- First of current month
SELECT date_trunc('week', NOW()); -- Monday of current week
-- Generate a time series
SELECT generate_series(
'2026-01-01'::DATE,
'2026-12-31'::DATE,
'1 month'::INTERVAL
);
| Type | Description | Example |
|---|---|---|
| UUID | 128-bit universally unique ID | gen_random_uuid() |
| JSONB | Binary JSON (indexable, queryable) | '{"key": "val"}'::jsonb |
| ARRAY | Arrays of any type | '{1,2,3}'::int[] |
| HSTORE | Key-value pairs (extension) | '"a"=>"1","b"=>"2"'::hstore |
| INET / CIDR | IP addresses and networks | '192.168.1.0/24'::cidr |
| TSVECTOR | Full-text search document | to_tsvector('english', 'hello world') |
| TSQUERY | Full-text search query | to_tsquery('english', 'hello & world') |
| INT4RANGE | Range of integers | '[1,10)'::int4range |
| BYTEA | Binary data | '\xDEADBEEF'::bytea |
| BOOLEAN | true/false/null | TRUE, FALSE, NULL |
| ENUM | Custom enumerated types | CREATE TYPE mood AS ENUM ('happy', 'sad') |
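The ENUM entry above in action -- a short sketch (the type mood and table entries are illustrative):

```sql
-- Define the type once, then use it like any other column type
CREATE TYPE mood AS ENUM ('happy', 'sad');

CREATE TABLE entries (
  id SERIAL PRIMARY KEY,
  feeling mood NOT NULL
);

INSERT INTO entries (feeling) VALUES ('happy');

-- New labels can be appended, but existing labels cannot be
-- removed without recreating the type
ALTER TYPE mood ADD VALUE 'curious';
```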
-- Declare an array column
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
title TEXT,
tags TEXT[]
);
-- Insert with array literal
INSERT INTO posts (title, tags)
VALUES ('Intro to PG', '{postgresql, database, tutorial}');
-- Insert with ARRAY constructor
INSERT INTO posts (title, tags)
VALUES ('Advanced SQL', ARRAY['sql', 'advanced', 'postgresql']);
-- Query: contains element
SELECT * FROM posts WHERE 'postgresql' = ANY(tags);
-- Query: contains all of these
SELECT * FROM posts WHERE tags @> ARRAY['sql', 'postgresql'];
-- Unnest array to rows
SELECT id, unnest(tags) AS tag FROM posts;
-- Aggregate rows into array
SELECT array_agg(title) FROM posts WHERE 'postgresql' = ANY(tags);
CTEs, window functions, lateral joins, full-text search, and recursive queries -- the power tools of SQL.
-- Basic CTE
WITH active_users AS (
SELECT id, name, email
FROM users
WHERE last_login > NOW() - '30 days'::INTERVAL
)
SELECT au.name, COUNT(o.id) AS order_count
FROM active_users au
JOIN orders o ON o.user_id = au.id
GROUP BY au.name
ORDER BY order_count DESC;
-- Recursive CTE: employee hierarchy
WITH RECURSIVE org_chart AS (
-- Base case: top-level managers
SELECT id, name, manager_id, 1 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: find reports
SELECT e.id, e.name, e.manager_id, oc.depth + 1
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT REPEAT(' ', depth - 1) || name AS org_tree, depth
FROM org_chart
ORDER BY depth, name;
-- CTE with INSERT (writable CTE)
WITH moved_rows AS (
DELETE FROM orders
WHERE status = 'archived'
RETURNING *
)
INSERT INTO orders_archive
SELECT * FROM moved_rows;
-- ROW_NUMBER: rank items within groups
SELECT
department,
name,
salary,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS rank
FROM employees;
-- Running total
SELECT
date,
revenue,
SUM(revenue) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM daily_revenue;
-- LAG / LEAD: access previous/next rows
SELECT
date,
revenue,
LAG(revenue, 1) OVER (ORDER BY date) AS prev_day,
revenue - LAG(revenue, 1) OVER (ORDER BY date) AS day_over_day
FROM daily_revenue;
-- NTILE: divide rows into buckets
SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;
-- PERCENT_RANK and CUME_DIST
SELECT
name,
salary,
ROUND(PERCENT_RANK() OVER (ORDER BY salary)::numeric, 2) AS pct_rank,
ROUND(CUME_DIST() OVER (ORDER BY salary)::numeric, 2) AS cume_dist
FROM employees;
GROUP BY reduces many rows to one; window functions preserve every row while adding aggregate calculations alongside.
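A side-by-side sketch of that difference, using the employees table from the examples above:

```sql
-- GROUP BY: collapses to one row per department
SELECT department, AVG(salary) AS dept_avg
FROM employees
GROUP BY department;

-- Window function: every employee row survives, with the
-- department average computed alongside each one
SELECT name, salary,
       AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
```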
-- Get the 3 most recent orders for each user
SELECT u.name, o.id, o.total, o.created_at
FROM users u
CROSS JOIN LATERAL (
SELECT id, total, created_at
FROM orders
WHERE user_id = u.id
ORDER BY created_at DESC
LIMIT 3
) o;
-- LATERAL with aggregation
SELECT
p.name AS product,
stats.avg_rating,
stats.review_count
FROM products p
LEFT JOIN LATERAL (
SELECT
AVG(rating) AS avg_rating,
COUNT(*) AS review_count
FROM reviews r
WHERE r.product_id = p.id
AND r.created_at > NOW() - '90 days'::INTERVAL
) stats ON TRUE;
Use LATERAL when you need a subquery in the FROM clause to reference columns from preceding tables. It's the SQL equivalent of a "for each" loop -- perfect for "top N per group" queries.
-- Create a table with full-text search
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
body TEXT NOT NULL,
tsv TSVECTOR -- pre-computed search vector
);
-- Populate the tsvector column
UPDATE articles SET tsv =
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', body), 'B');
-- Create a GIN index for fast search
CREATE INDEX idx_articles_tsv ON articles USING GIN(tsv);
-- Search with ranking
SELECT
title,
ts_rank(tsv, query) AS rank
FROM articles,
to_tsquery('english', 'postgresql & performance') query
WHERE tsv @@ query
ORDER BY rank DESC
LIMIT 10;
-- Search with headline (snippet extraction)
SELECT
title,
ts_headline('english', body,
to_tsquery('english', 'postgresql & performance'),
'StartSel=<b>, StopSel=</b>, MaxFragments=2'
) AS snippet
FROM articles
WHERE tsv @@ to_tsquery('english', 'postgresql & performance');
-- Auto-update tsvector with a trigger
CREATE FUNCTION articles_tsv_trigger() RETURNS TRIGGER AS $$
BEGIN
NEW.tsv :=
setweight(to_tsvector('english', COALESCE(NEW.title, '')), 'A') ||
setweight(to_tsvector('english', COALESCE(NEW.body, '')), 'B');
RETURN NEW;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER tsv_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION articles_tsv_trigger();
-- UPSERT (INSERT ... ON CONFLICT)
INSERT INTO user_settings (user_id, key, value)
VALUES (1, 'theme', 'dark')
ON CONFLICT (user_id, key)
DO UPDATE SET
value = EXCLUDED.value,
updated_at = NOW();
-- DISTINCT ON (first row per group)
SELECT DISTINCT ON (user_id)
user_id, created_at, amount
FROM payments
ORDER BY user_id, created_at DESC;
-- FILTER clause (conditional aggregation)
SELECT
COUNT(*) AS total_orders,
COUNT(*) FILTER (WHERE status = 'delivered') AS delivered,
COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled,
SUM(total) FILTER (WHERE status = 'delivered') AS delivered_revenue
FROM orders
WHERE created_at >= '2026-01-01';
-- GROUPING SETS (multiple aggregation levels)
SELECT
COALESCE(region, 'ALL') AS region,
COALESCE(product, 'ALL') AS product,
SUM(revenue) AS total_revenue
FROM sales
GROUP BY GROUPING SETS (
(region, product),
(region),
(product),
()
);
The right index can make a query 1000x faster. The wrong index wastes space and slows writes. Know your facets.
| Index Type | Best For | Operators | Size |
|---|---|---|---|
| B-tree | Equality, range, sorting (default) | = < > <= >= BETWEEN IN | Medium |
| Hash | Equality only (rarely used) | = | Small |
| GIN | Arrays, JSONB, full-text search | @> <@ @@ ? ?\| ?& | Large |
| GiST | Geometry, ranges, nearest-neighbor | && @> <@ <-> | Medium |
| SP-GiST | Non-balanced structures (quad/k-d trees) | << >> ~= <@ | Medium |
| BRIN | Very large, naturally ordered tables | = < > <= >= | Tiny |
-- Standard B-tree index (most common)
CREATE INDEX idx_users_email ON users(email);
-- Composite index (leftmost prefix rule applies)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);
-- Unique index
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);
-- Index with INCLUDE (covering index, PG 11+)
CREATE INDEX idx_orders_covering ON orders(user_id)
INCLUDE (total, status); -- Index-only scan for these columns
With a composite index on (user_id, created_at), the index supports queries on user_id alone, or user_id + created_at together, but NOT created_at alone. Put the most selective (most filtered) column first.
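Sketching the leftmost-prefix rule against the idx_orders_user_date index defined above:

```sql
-- Can use idx_orders_user_date (leading column present)
SELECT * FROM orders WHERE user_id = 42;
SELECT * FROM orders
WHERE user_id = 42 AND created_at > NOW() - '30 days'::INTERVAL;

-- Cannot use it (leading column missing); expect a Seq Scan
-- or a different index -- verify with EXPLAIN
SELECT * FROM orders WHERE created_at > NOW() - '30 days'::INTERVAL;
```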
-- GIN for JSONB
CREATE INDEX idx_events_payload ON events USING GIN(payload);
-- GIN for jsonb_path_ops (faster @> but only supports containment)
CREATE INDEX idx_events_payload_ops
ON events USING GIN(payload jsonb_path_ops);
-- GIN for arrays
CREATE INDEX idx_posts_tags ON posts USING GIN(tags);
-- GIN for full-text search
CREATE INDEX idx_articles_tsv ON articles USING GIN(tsv);
-- GIN for trigram similarity (pg_trgm extension)
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_name_trgm ON users USING GIN(name gin_trgm_ops);
-- Perfect for time-series data (physically ordered by date)
CREATE INDEX idx_logs_created_at
ON server_logs USING BRIN(created_at);
-- Tiny index: summarizes block ranges instead of individual rows
-- A 100GB table might have a 100KB BRIN index vs a 2GB B-tree
-- Custom pages_per_range (default 128)
CREATE INDEX idx_logs_brin
ON server_logs USING BRIN(created_at)
WITH (pages_per_range = 64);
-- Partial index: only index active orders
CREATE INDEX idx_orders_active
ON orders(created_at)
WHERE status IN ('pending', 'shipped');
-- Expression index: index on lower-cased email
CREATE INDEX idx_users_email_lower
ON users(LOWER(email));
-- Expression index: index on JSONB field
CREATE INDEX idx_events_type
ON events((payload->>'type'));
-- Partial + Expression combo
CREATE INDEX idx_users_active_email
ON users(LOWER(email))
WHERE deleted_at IS NULL;
-- Create index without locking writes (takes longer)
CREATE INDEX CONCURRENTLY idx_orders_user
ON orders(user_id);
-- Check index size
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS size
FROM pg_indexes
WHERE tablename = 'orders';
-- Find unused indexes
SELECT
schemaname, tablename, indexname,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Reindex (rebuild a bloated index)
REINDEX INDEX CONCURRENTLY idx_orders_user;
PostgreSQL's JSONB is a first-class binary JSON type with indexing, querying, and manipulation -- no document database needed.
| Feature | JSON | JSONB |
|---|---|---|
| Storage | Plain text | Binary, decomposed |
| Duplicate keys | Preserved | Last value wins |
| Key order | Preserved | Not preserved |
| Indexing (GIN) | No | Yes |
| Containment (@>) | No | Yes |
| Write speed | Faster | Slightly slower |
| Read speed | Slower (reparsed) | Faster |
Use JSONB unless you specifically need to preserve key order or duplicate keys. JSONB supports indexing, containment checks, and partial updates.
-- Sample data
CREATE TABLE products (
id SERIAL PRIMARY KEY,
data JSONB NOT NULL
);
INSERT INTO products (data) VALUES
('{"name": "Widget", "price": 29.99, "tags": ["sale", "popular"],
"specs": {"weight": "100g", "color": "blue"}}');
-- Access operators
SELECT data->'name' AS json_value FROM products; -- "Widget" (as JSON)
SELECT data->>'name' AS text_value FROM products; -- Widget (as text)
SELECT data->'specs'->'color' AS nested_json FROM products; -- "blue"
SELECT data->'specs'->>'color' AS nested_text FROM products; -- blue
SELECT data#>'{specs,color}' AS path_json FROM products; -- "blue"
SELECT data#>>'{specs,color}' AS path_text FROM products; -- blue
-- Containment
SELECT * FROM products
WHERE data @> '{"tags": ["sale"]}'; -- Contains this structure
-- Key existence
SELECT * FROM products WHERE data ? 'specs'; -- Has key 'specs'
SELECT * FROM products WHERE data ?| ARRAY['name', 'sku']; -- Has any key
SELECT * FROM products WHERE data ?& ARRAY['name', 'price']; -- Has all keys
-- Set / overwrite a key
UPDATE products SET data = data || '{"on_sale": true}'::jsonb;
-- Set a nested value
UPDATE products SET data = jsonb_set(data, '{specs,weight}', '"150g"');
-- Remove a key
UPDATE products SET data = data - 'on_sale';
-- Remove a nested key
UPDATE products SET data = data #- '{specs,weight}';
-- Merge/concatenate
SELECT '{"a":1}'::jsonb || '{"b":2}'::jsonb; -- {"a":1, "b":2}
-- Iterate over JSONB object keys
SELECT key, value
FROM products, jsonb_each(data->'specs');
-- Iterate over JSONB array elements
SELECT jsonb_array_elements_text(data->'tags') AS tag
FROM products;
-- SQL/JSON JSON_TABLE: decompose JSON into relational rows
SELECT jt.*
FROM products,
JSON_TABLE(
data, '$'
COLUMNS (
product_name TEXT PATH '$.name',
product_price NUMERIC PATH '$.price',
NESTED PATH '$.tags[*]' COLUMNS (
tag TEXT PATH '$'
)
)
) AS jt;
EXPLAIN ANALYZE, memory tuning, connection pooling, vacuum strategies, and the art of making queries fly.
-- Basic explain
EXPLAIN SELECT * FROM orders WHERE user_id = 42;
-- ANALYZE: actually runs the query, shows real timings
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;
-- BUFFERS: shows cache hits vs disk reads
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE user_id = 42;
-- FORMAT JSON: machine-readable output
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE user_id = 42;
-- PG 17: SERIALIZE and MEMORY options
EXPLAIN (ANALYZE, SERIALIZE, MEMORY)
SELECT * FROM orders WHERE user_id = 42;
Watch for row estimates that differ wildly from actual counts (refresh statistics with ANALYZE tablename), and for Buffers: shared read entries, which mean the data was not in cache.
| Scan Type | What It Means | Action |
|---|---|---|
| Seq Scan | Reads every row in the table | Add an index if table is large |
| Index Scan | Uses index, fetches from heap | Good for selective queries |
| Index Only Scan | All data from index (no heap) | Best possible -- use covering indexes |
| Bitmap Index Scan | Builds bitmap, then heap scan | Good for medium selectivity |
| Hash Join | Builds hash table from inner | Efficient for equality joins |
| Merge Join | Merges two sorted inputs | Efficient for large sorted datasets |
| Nested Loop | For each outer row, scan inner | Fine for small sets, bad for large |
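When diagnosing why the planner picked one of the scan types above, the enable_* session settings let you compare alternatives. A sketch -- for diagnosis only, never in production config:

```sql
-- See what the planner does when sequential scans are penalized
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;
RESET enable_seqscan;
```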
# postgresql.conf tuning guide
# Memory (most impactful settings)
shared_buffers = '4GB' # 25% of total RAM (start here)
effective_cache_size = '12GB' # 75% of total RAM (helps planner)
work_mem = '64MB' # Per-operation sort/hash memory
maintenance_work_mem = '1GB' # For VACUUM, CREATE INDEX, etc.
# WAL (Write-Ahead Log)
wal_buffers = '64MB' # -1 = auto (1/32 of shared_buffers)
min_wal_size = '1GB'
max_wal_size = '4GB'
checkpoint_completion_target = 0.9
# Connections
max_connections = 200 # Use PgBouncer for 1000+ clients
# Query Planner
random_page_cost = 1.1 # Lower for SSDs (default 4.0)
effective_io_concurrency = 200 # Higher for SSDs
# Parallel Queries (PG 10+)
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_parallel_maintenance_workers = 4
# Logging
log_min_duration_statement = 500 # Log queries slower than 500ms
log_statement = 'ddl' # Log all DDL statements
Caution: work_mem is allocated per-operation per-connection. A complex query with 5 sort operations across 100 connections could use 64MB x 5 x 100 = 32GB. Start low (32-64MB) and monitor.
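Rather than raising work_mem globally, scope it to the session or transaction that actually needs it. A sketch:

```sql
-- Session-level bump for one heavy report
SET work_mem = '256MB';
-- ... run the big sort/hash query here ...
RESET work_mem;

-- Or transaction-scoped: reverts automatically at COMMIT/ROLLBACK
BEGIN;
SET LOCAL work_mem = '256MB';
-- ... query ...
COMMIT;
```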
# /etc/pgbouncer/pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
# Pool modes:
# session - connection per client session (safest)
# transaction - connection per transaction (recommended)
# statement - connection per statement (most aggressive)
pool_mode = transaction
# Pool sizing
default_pool_size = 25 # connections per user/db pair
max_client_conn = 1000 # max client connections
max_db_connections = 50 # max connections to PostgreSQL
reserve_pool_size = 5 # extra connections for burst
# Timeouts
server_idle_timeout = 600
client_idle_timeout = 0
query_timeout = 0
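With that config in place, applications connect to port 6432 instead of 5432, and PgBouncer exposes an admin console as a virtual database named pgbouncer. A sketch (host and credentials are placeholders):

```shell
# Application traffic goes through the pooler
psql -h 127.0.0.1 -p 6432 -U myapp -d mydb

# Admin console: inspect pool state
# (the user must be listed in userlist.txt or admin_users)
psql -h 127.0.0.1 -p 6432 -U pgbouncer -d pgbouncer -c "SHOW POOLS;"
```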
-- Manual vacuum (reclaims dead tuples)
VACUUM orders;
-- Verbose vacuum (shows details)
VACUUM VERBOSE orders;
-- Full vacuum (rewrites table, locks it -- rarely needed)
VACUUM FULL orders;
-- Analyze (update planner statistics)
ANALYZE orders;
-- Vacuum + Analyze together
VACUUM ANALYZE orders;
-- Check dead tuple count
SELECT
relname,
n_live_tup,
n_dead_tup,
ROUND(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 1) AS dead_pct,
last_vacuum,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
# Autovacuum tuning in postgresql.conf
autovacuum = on # Never disable this
autovacuum_max_workers = 5 # Default 3, increase for many tables
autovacuum_vacuum_threshold = 50 # Min dead rows before vacuum
autovacuum_vacuum_scale_factor = 0.05 # Default 0.2, lower = more frequent
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.05
-- Per-table autovacuum settings (for hot tables)
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_analyze_scale_factor = 0.01,
autovacuum_vacuum_cost_delay = 10
);
-- Find slow queries (requires pg_stat_statements)
SELECT
ROUND(total_exec_time::numeric, 2) AS total_ms,
calls,
ROUND(mean_exec_time::numeric, 2) AS avg_ms,
ROUND(rows::numeric / NULLIF(calls, 0), 0) AS avg_rows,
LEFT(query, 80) AS query_preview
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Currently running queries
SELECT
pid,
NOW() - query_start AS duration,
state,
LEFT(query, 80) AS query
FROM pg_stat_activity
WHERE state != 'idle'
AND pid != pg_backend_pid()
ORDER BY duration DESC;
-- Lock contention
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.query AS blocked_query,
blocking_locks.pid AS blocking_pid,
blocking_activity.query AS blocking_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
-- Table bloat estimation
SELECT
schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
pg_size_pretty(pg_table_size(schemaname || '.' || tablename)) AS table_size,
pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename)) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 10;
Extensions transform PostgreSQL from a relational database into a universal data platform. Each one is a geode waiting to be cracked open.
| Extension | Category | Description |
|---|---|---|
| pg_stat_statements | monitoring | Tracks execution statistics for all SQL statements. Must-have. |
| pgvector | AI/ML | Vector similarity search for embeddings, RAG, and AI-powered apps. |
| PostGIS | spatial | Geometry, geography, spatial indexing. The gold standard for geospatial. |
| pg_trgm | text | Trigram matching for fuzzy search and LIKE/ILIKE acceleration. |
| uuid-ossp | utility | UUID generation functions. (PG 13+ has gen_random_uuid() built-in.) |
| hstore | utility | Key-value pairs in a single column. Lighter than JSONB for simple maps. |
| pg_cron | scheduling | Cron-based job scheduling directly inside PostgreSQL. |
| pgcrypto | security | Cryptographic functions: hashing, encryption, random bytes. |
| TimescaleDB | timeseries | Optimized time-series storage with auto-partitioning. |
| Citus | distributed | Horizontal sharding and distributed queries. |
# Enable in postgresql.conf (requires a server restart)
shared_preload_libraries = 'pg_stat_statements'
-- After restarting, create the extension
CREATE EXTENSION pg_stat_statements;
-- Top queries by total time
SELECT
LEFT(query, 60) AS query,
calls,
ROUND(total_exec_time::numeric / 1000, 2) AS total_sec,
ROUND(mean_exec_time::numeric, 2) AS avg_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Reset statistics
SELECT pg_stat_statements_reset();
CREATE EXTENSION vector;
-- Store embeddings alongside data
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT NOT NULL,
embedding VECTOR(1536) -- OpenAI ada-002 dimension
);
-- Create an HNSW index (faster queries, more memory)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);
-- Find similar documents (cosine distance)
SELECT id, content,
1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
-- L2 distance (Euclidean)
SELECT * FROM documents
ORDER BY embedding <-> '[0.1, 0.2, ...]'::vector
LIMIT 10;
-- Inner product (for normalized vectors)
SELECT * FROM documents
ORDER BY embedding <#> '[0.1, 0.2, ...]'::vector
LIMIT 10;
CREATE EXTENSION pg_trgm;
-- Create a trigram index for fuzzy matching
CREATE INDEX idx_users_name_trgm
ON users USING GIN(name gin_trgm_ops);
-- Fuzzy search with similarity threshold
SELECT name, similarity(name, 'Jon Smith') AS sim
FROM users
WHERE name % 'Jon Smith' -- % operator uses similarity threshold
ORDER BY sim DESC
LIMIT 10;
-- Accelerate LIKE queries
SELECT * FROM products
WHERE name ILIKE '%widget%'; -- GIN trgm index speeds this up
-- Set similarity threshold (default 0.3)
SET pg_trgm.similarity_threshold = 0.4;
-- List installed extensions
SELECT * FROM pg_extension;
-- List available extensions
SELECT * FROM pg_available_extensions
WHERE installed_version IS NULL
ORDER BY name;
-- Install / Upgrade / Remove
CREATE EXTENSION pg_trgm;
ALTER EXTENSION pg_trgm UPDATE;
DROP EXTENSION pg_trgm;
Streaming replication, logical replication, failover strategies, and the tools that keep databases running through earthquakes.
# PRIMARY: postgresql.conf
wal_level = replica
max_wal_senders = 5
wal_keep_size = '1GB'
synchronous_commit = on # 'remote_apply' for sync replication
# PRIMARY: pg_hba.conf (allow replication connections)
host replication replicator 10.0.0.0/24 scram-sha-256
-- PRIMARY: create replication user (run in psql)
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';
# STANDBY: take a base backup from the primary (shell)
pg_basebackup -h primary-host -U replicator -D /var/lib/postgresql/17/main \
--wal-method=stream --checkpoint=fast --progress
# STANDBY: create the standby.signal file to start in standby mode
touch /var/lib/postgresql/17/main/standby.signal
# STANDBY: postgresql.conf
primary_conninfo = 'host=primary-host user=replicator password=secret'
hot_standby = on
-- Verify replication on PRIMARY
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;
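If the primary is lost, promote the standby to accept writes. A sketch from psql on the standby (pg_promote requires PG 12+):

```sql
-- Promote the standby; returns true on success
SELECT pg_promote();

-- Verify: false means this server now accepts writes
SELECT pg_is_in_recovery();
```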
# PUBLISHER (source): postgresql.conf
wal_level = logical
-- PUBLISHER: create publication
CREATE PUBLICATION my_pub FOR TABLE orders, users;
-- Or publish all tables:
CREATE PUBLICATION all_pub FOR ALL TABLES;
-- SUBSCRIBER (target): create matching tables first, then subscribe
CREATE SUBSCRIPTION my_sub
CONNECTION 'host=publisher-host dbname=mydb user=replicator password=secret'
PUBLICATION my_pub;
-- Monitor replication lag
SELECT
subname,
received_lsn,
latest_end_lsn,
latest_end_time
FROM pg_stat_subscription;
-- PG 17: failover logical slots (preserved after failover)
-- Set on subscriber:
ALTER SUBSCRIPTION my_sub SET (failover = true);
| Tool | Role | Description |
|---|---|---|
| Patroni | Cluster manager | Automated failover, uses etcd/ZooKeeper/Consul for consensus |
| repmgr | Replication manager | Simpler alternative to Patroni for smaller setups |
| PgBouncer | Connection pooler | Lightweight pooling, sits between app and PG |
| HAProxy | Load balancer | Routes reads to replicas, writes to primary |
| pgBackRest | Backup | Incremental, parallel backups with WAL archiving |
| pg_basebackup | Base backup | Built-in tool for physical backups (PG 17: incremental) |
The psql interactive terminal is your primary excavation tool. Master these meta-commands and you'll never need a GUI.
# Connect with individual flags
psql -h hostname -p 5432 -U username -d database
# Connection URI
psql "postgresql://user:password@host:5432/dbname?sslmode=require"
# Connect with environment variables
export PGHOST=localhost
export PGPORT=5432
export PGUSER=myuser
export PGDATABASE=mydb
export PGPASSWORD=secret # Or use ~/.pgpass file
psql
# Run a single command
psql -c "SELECT NOW();"
# Run a SQL file
psql -f schema.sql
\l List all databases
\c dbname Connect to database
\dn List schemas
\dt List tables
\dt+ List tables with sizes
\dv List views
\di List indexes
\ds List sequences
\df List functions
\d tablename Describe table structure
\d+ tablename Extended table info (size, desc)
\di+ tablename Index details for table
\sf funcname Show function definition
\sv viewname Show view definition
\du List roles/users
\dp tablename Show permissions
\x Toggle expanded display
\x auto Auto-switch based on width
\timing Toggle query timing
\pset border 2 Add table borders
\pset format csv CSV output
\pset null NULL Show NULL explicitly
\o file.txt Send output to file
\o Reset output to terminal
\e Edit in $EDITOR
\ef funcname Edit function in $EDITOR
\i file.sql Execute SQL file
\ir file.sql Execute file (relative path)
\g Re-run last query
\watch 5 Re-run query every 5 seconds
\q Quit psql
\? Show all meta-commands
\copy users TO 'users.csv' CSV HEADER
\copy users FROM 'users.csv' CSV HEADER
\copy (SELECT * FROM users WHERE active) TO 'active.csv' CSV HEADER
\conninfo Show connection details
\encoding Show client encoding
\password Set password for user
\echo message Print to output
\! ls -la Run shell command
-- ~/.psqlrc -- loaded every time psql starts
-- Better NULL display
\pset null '[NULL]'
-- Always show timing
\timing on
-- Auto-switch expanded display
\x auto
-- Verbose error messages
\set VERBOSITY verbose
-- Colored prompt: [user@db] #
\set PROMPT1 '%[%033[1;36m%]%n@%/%[%033[0m%] %# '
-- History settings
\set HISTSIZE 10000
\set HISTCONTROL ignoredups
-- Useful shortcuts
\set active 'SELECT pid, now() - query_start AS duration, state, left(query, 60) FROM pg_stat_activity WHERE state != \'idle\' ORDER BY duration DESC;'
\set locks 'SELECT pid, mode, relation::regclass, granted FROM pg_locks WHERE NOT granted;'
\set sizes 'SELECT tablename, pg_size_pretty(pg_total_relation_size(tablename::regclass)) AS size FROM pg_tables WHERE schemaname = \'public\' ORDER BY pg_total_relation_size(tablename::regclass) DESC;'
-- Usage: type :active to run the shortcut
The latest eruptions in the PostgreSQL landscape. New features that change how you work with the database.
-- COPY with ON_ERROR (skip bad rows)
COPY users FROM '/tmp/users.csv'
WITH (FORMAT csv, HEADER true, ON_ERROR ignore);
-- MERGE with RETURNING (PG 17)
MERGE INTO inventory AS target
USING new_stock AS source
ON target.sku = source.sku
WHEN MATCHED THEN
UPDATE SET quantity = target.quantity + source.quantity
WHEN NOT MATCHED THEN
INSERT (sku, name, quantity)
VALUES (source.sku, source.name, source.quantity)
RETURNING merge_action(), target.*;
-- EXPLAIN with SERIALIZE and MEMORY (PG 17)
EXPLAIN (ANALYZE, BUFFERS, SERIALIZE, MEMORY)
SELECT * FROM orders
WHERE created_at > NOW() - '7 days'::INTERVAL;
-- Monitoring: pg_wait_events view (PG 17)
SELECT * FROM pg_wait_events
WHERE type = 'IO';
| Feature | Description |
|---|---|
| Logical Replication from Standby | Subscribers can replicate from a standby server, reducing primary load |
| Parallel FULL OUTER JOIN | Hash full joins can now run in parallel, improving join performance |
| SQL/JSON Constructors | JSON_ARRAY(), JSON_OBJECT(), JSON_ARRAYAGG(), IS JSON predicate |
| GRANT on Roles | More granular predefined roles: pg_create_subscription, pg_use_reserved_connections |
| pg_stat_io | New view for detailed I/O statistics per backend type |
| Bulk Loading | Faster COPY loading with improved buffer management |
| ANY_VALUE() | Aggregate that returns an arbitrary value from the group (avoids adding it to GROUP BY) |
-- PG 16: SQL/JSON constructors
SELECT
JSON_OBJECT(
'id': id,
'name': name,
'email': email
) AS user_json
FROM users
WHERE active = TRUE;
-- PG 16: JSON_ARRAYAGG
SELECT
department,
JSON_ARRAYAGG(name ORDER BY name) AS members
FROM employees
GROUP BY department;
-- PG 16: IS JSON predicate
SELECT * FROM raw_data
WHERE payload IS JSON;
-- PG 16: ANY_VALUE() aggregate
SELECT
department,
COUNT(*) AS emp_count,
ANY_VALUE(location) AS sample_location
FROM employees
GROUP BY department;
Run pg_upgrade --check before the actual upgrade.
# pg_upgrade example (in-place)
pg_upgrade \
--old-datadir /var/lib/postgresql/16/main \
--new-datadir /var/lib/postgresql/17/main \
--old-bindir /usr/lib/postgresql/16/bin \
--new-bindir /usr/lib/postgresql/17/bin \
--check # Dry run first
# After upgrade, update statistics
/usr/lib/postgresql/17/bin/vacuumdb --all --analyze-in-stages