PostgreSQL CTEs and Window Functions You Should Be Using
Overview
Most application developers use PostgreSQL as a key-value store: SELECT * FROM users WHERE id = $1. The database can do far more. CTEs and window functions let you write queries that replace hundreds of lines of application code — and execute faster because the data never leaves the database.
Problem
You are building a dashboard. You need:
- The latest post per category
- A running total of page views per day
- Deduplicated results ranked by relevance
- Pagination with total count without a second query
You could fetch all data and process it in Python. But that moves megabytes of data across the network, consumes application memory, and is slower than letting PostgreSQL handle it.
Solution
Common Table Expressions (CTEs)
A CTE is a named temporary result set that you can reference in your main query. Think of it as a variable for SQL.
Basic CTE: Breaking Complex Queries into Steps
WITH published_posts AS (
SELECT id, title, slug, category, created_at, read_time
FROM posts
WHERE status = 'published'
),
category_stats AS (
SELECT
category,
COUNT(*) AS post_count,
AVG(read_time) AS avg_read_time
FROM published_posts
GROUP BY category
)
SELECT
p.title,
p.slug,
p.category,
cs.post_count AS posts_in_category,
cs.avg_read_time
FROM published_posts p
JOIN category_stats cs ON cs.category = p.category
ORDER BY p.created_at DESC;
Each CTE is a building block. The query reads top-to-bottom like a program.
Recursive CTE: Hierarchical Data
For nested comments, org charts, or category trees:
WITH RECURSIVE comment_tree AS (
-- Base case: top-level comments
SELECT id, content, parent_id, author_id, 0 AS depth
FROM comments
WHERE parent_id IS NULL AND post_id = $1
UNION ALL
-- Recursive case: replies
SELECT c.id, c.content, c.parent_id, c.author_id, ct.depth + 1
FROM comments c
JOIN comment_tree ct ON c.parent_id = ct.id
WHERE ct.depth < 5 -- prevent infinite recursion
)
SELECT * FROM comment_tree
ORDER BY depth, id;
This replaces N+1 queries or loading all comments and building the tree in application code.
Window Functions
Window functions compute a value across a set of rows related to the current row — without collapsing them into a single output like GROUP BY does.
ROW_NUMBER: Get Latest per Group
'Get the most recent post per category':
SELECT title, slug, category, created_at
FROM (
SELECT
title, slug, category, created_at,
ROW_NUMBER() OVER (
PARTITION BY category
ORDER BY created_at DESC
) AS rn
FROM posts
WHERE status = 'published'
) ranked
WHERE rn = 1;
PARTITION BY category creates a window per category. ROW_NUMBER() assigns 1, 2, 3... within each window. Filtering rn = 1 gives you the latest per group.
Running Totals with SUM OVER
'Daily page views with a 7-day running total':
SELECT
date,
daily_views,
SUM(daily_views) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7day_views
FROM (
SELECT
DATE(viewed_at) AS date,
COUNT(*) AS daily_views
FROM page_views
WHERE viewed_at > NOW() - INTERVAL '30 days'
GROUP BY DATE(viewed_at)
) daily
ORDER BY date;
This computes a sliding 7-day window without any application logic.
RANK vs DENSE_RANK vs ROW_NUMBER
SELECT
title,
read_time,
ROW_NUMBER() OVER (ORDER BY read_time DESC) AS row_num, -- 1, 2, 3, 4
RANK() OVER (ORDER BY read_time DESC) AS rank, -- 1, 2, 2, 4 (skip)
DENSE_RANK() OVER (ORDER BY read_time DESC) AS dense_rank -- 1, 2, 2, 3 (no skip)
FROM posts
WHERE status = 'published';
ROW_NUMBER: Always unique. Use for pagination.RANK: Ties get the same rank, but the next rank is skipped. Use for leaderboards.DENSE_RANK: Ties get the same rank, no gap. Use for top-N categories.
Implementation
Efficient Keyset Pagination
Offset-based pagination (LIMIT 20 OFFSET 1000) scans and discards 1000 rows. Keyset pagination is O(1):
WITH page AS (
SELECT id, title, slug, created_at
FROM posts
WHERE status = 'published'
AND (created_at, id) < ($1, $2) -- cursor from previous page
ORDER BY created_at DESC, id DESC
LIMIT 20
)
SELECT
*,
COUNT(*) OVER () AS total_in_page,
(SELECT COUNT(*) FROM posts WHERE status = 'published') AS total_count
FROM page;
The COUNT(*) OVER () window function gives you the row count without a separate query.
Deduplication with CTEs
Remove duplicate blog entries (keeping the newest):
WITH duplicates AS (
SELECT
id,
ROW_NUMBER() OVER (
PARTITION BY slug
ORDER BY created_at DESC
) AS rn
FROM posts
)
DELETE FROM posts
WHERE id IN (
SELECT id FROM duplicates WHERE rn > 1
);
Performance Tip
CTEs in PostgreSQL 12+ are automatically inlined (materialized only when referenced multiple times or containing side effects). For older versions, add MATERIALIZED or NOT MATERIALIZED hints explicitly.
Always check the query plan:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
WITH ... SELECT ...;
Conclusion
CTEs make complex queries readable. Window functions eliminate round trips between your app and database. ROW_NUMBER for latest-per-group, SUM OVER for running totals, keyset pagination for performance. Write the logic where the data lives.