PostgreSQL CTEs and Window Functions You Should Be Using

· 12 min read · Database & Architecture

Go beyond basic SELECT queries with Common Table Expressions and window functions — real patterns for analytics, pagination, and deduplication.

PostgreSQL CTEs and Window Functions You Should Be Using

Overview

Most application developers use PostgreSQL as a key-value store: SELECT * FROM users WHERE id = $1. The database can do far more. CTEs and window functions let you write queries that replace hundreds of lines of application code — and execute faster because the data never leaves the database.

Problem

You are building a dashboard. You need:

  • The latest post per category
  • A running total of page views per day
  • Deduplicated results ranked by relevance
  • Pagination with total count without a second query

You could fetch all data and process it in Python. But that moves megabytes of data across the network, consumes application memory, and is slower than letting PostgreSQL handle it.

Solution

Common Table Expressions (CTEs)

A CTE is a named temporary result set that you can reference in your main query. Think of it as a variable for SQL.

Basic CTE: Breaking Complex Queries into Steps

WITH published_posts AS (
  SELECT id, title, slug, category, created_at, read_time
  FROM posts
  WHERE status = 'published'
),
category_stats AS (
  SELECT
    category,
    COUNT(*) AS post_count,
    AVG(read_time) AS avg_read_time
  FROM published_posts
  GROUP BY category
)
SELECT
  p.title,
  p.slug,
  p.category,
  cs.post_count AS posts_in_category,
  cs.avg_read_time
FROM published_posts p
JOIN category_stats cs ON cs.category = p.category
ORDER BY p.created_at DESC;

Each CTE is a building block. The query reads top-to-bottom like a program.

Recursive CTE: Hierarchical Data

For nested comments, org charts, or category trees:

WITH RECURSIVE comment_tree AS (
  -- Base case: top-level comments
  SELECT id, content, parent_id, author_id, 0 AS depth
  FROM comments
  WHERE parent_id IS NULL AND post_id = $1
  
  UNION ALL
  
  -- Recursive case: replies
  SELECT c.id, c.content, c.parent_id, c.author_id, ct.depth + 1
  FROM comments c
  JOIN comment_tree ct ON c.parent_id = ct.id
  WHERE ct.depth < 5  -- prevent infinite recursion
)
SELECT * FROM comment_tree
ORDER BY depth, id;

This replaces N+1 queries or loading all comments and building the tree in application code.

Window Functions

Window functions compute a value across a set of rows related to the current row — without collapsing them into a single output like GROUP BY does.

ROW_NUMBER: Get Latest per Group

'Get the most recent post per category':

SELECT title, slug, category, created_at
FROM (
  SELECT
    title, slug, category, created_at,
    ROW_NUMBER() OVER (
      PARTITION BY category
      ORDER BY created_at DESC
    ) AS rn
  FROM posts
  WHERE status = 'published'
) ranked
WHERE rn = 1;

PARTITION BY category creates a window per category. ROW_NUMBER() assigns 1, 2, 3... within each window. Filtering rn = 1 gives you the latest per group.

Running Totals with SUM OVER

'Daily page views with a 7-day running total':

SELECT
  date,
  daily_views,
  SUM(daily_views) OVER (
    ORDER BY date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7day_views
FROM (
  SELECT
    DATE(viewed_at) AS date,
    COUNT(*) AS daily_views
  FROM page_views
  WHERE viewed_at > NOW() - INTERVAL '30 days'
  GROUP BY DATE(viewed_at)
) daily
ORDER BY date;

This computes a sliding 7-day window without any application logic.

RANK vs DENSE_RANK vs ROW_NUMBER

SELECT
  title,
  read_time,
  ROW_NUMBER() OVER (ORDER BY read_time DESC) AS row_num,    -- 1, 2, 3, 4
  RANK()       OVER (ORDER BY read_time DESC) AS rank,        -- 1, 2, 2, 4 (skip)
  DENSE_RANK() OVER (ORDER BY read_time DESC) AS dense_rank   -- 1, 2, 2, 3 (no skip)
FROM posts
WHERE status = 'published';
  • ROW_NUMBER: Always unique. Use for pagination.
  • RANK: Ties get the same rank, but the next rank is skipped. Use for leaderboards.
  • DENSE_RANK: Ties get the same rank, no gap. Use for top-N categories.

Implementation

Efficient Keyset Pagination

Offset-based pagination (LIMIT 20 OFFSET 1000) scans and discards 1000 rows. Keyset pagination is O(1):

WITH page AS (
  SELECT id, title, slug, created_at
  FROM posts
  WHERE status = 'published'
    AND (created_at, id) < ($1, $2)  -- cursor from previous page
  ORDER BY created_at DESC, id DESC
  LIMIT 20
)
SELECT
  *,
  COUNT(*) OVER () AS total_in_page,
  (SELECT COUNT(*) FROM posts WHERE status = 'published') AS total_count
FROM page;

The COUNT(*) OVER () window function gives you the row count without a separate query.

Deduplication with CTEs

Remove duplicate blog entries (keeping the newest):

WITH duplicates AS (
  SELECT
    id,
    ROW_NUMBER() OVER (
      PARTITION BY slug
      ORDER BY created_at DESC
    ) AS rn
  FROM posts
)
DELETE FROM posts
WHERE id IN (
  SELECT id FROM duplicates WHERE rn > 1
);

Performance Tip

CTEs in PostgreSQL 12+ are automatically inlined (materialized only when referenced multiple times or containing side effects). For older versions, add MATERIALIZED or NOT MATERIALIZED hints explicitly.

Always check the query plan:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
WITH ... SELECT ...;

Conclusion

CTEs make complex queries readable. Window functions eliminate round trips between your app and database. ROW_NUMBER for latest-per-group, SUM OVER for running totals, keyset pagination for performance. Write the logic where the data lives.