How to Debug Slow APIs in Production Systems
An API that responds in 200ms during testing takes 3 seconds in production. Users complain, alerts fire, and you have no idea where the time goes. The response time is the sum of every operation the endpoint performs — database queries, external API calls, serialization, middleware — and any of them can be the bottleneck.
This guide covers systematic approaches to finding and fixing slow API endpoints. For Python-specific patterns, see the FastAPI performance guide. For database-specific issues, see PostgreSQL query optimization.
Problem
Slow APIs in production share common patterns:
- Fast locally, slow in production (environment differences)
- Slow intermittently (contention, cold starts, garbage collection)
- Slow under load (connection pool exhaustion, thread starvation)
- Gradually slowing over time (data growth, index degradation)
Step 1: Measure Where Time Goes
Add timing instrumentation to the request lifecycle:
import time
from fastapi import Request
import structlog

logger = structlog.get_logger()

@app.middleware("http")
async def timing_middleware(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    total = time.perf_counter() - start
    logger.info(
        "request_completed",
        method=request.method,
        path=request.url.path,
        status=response.status_code,
        duration_ms=round(total * 1000, 2),
    )
    if total > 1.0:
        logger.warning(
            "slow_request",
            method=request.method,
            path=request.url.path,
            duration_ms=round(total * 1000, 2),
        )
    return response
Step 2: Profile Database Queries
Database queries are the most common bottleneck. Enable pg_stat_statements:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slowest queries by average execution time
SELECT query,
       round(mean_exec_time::numeric, 2) AS avg_ms,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
Add per-query timing in the application:
async def timed_query(pool, query: str, *args):
    start = time.perf_counter()
    result = await pool.fetch(query, *args)
    elapsed = time.perf_counter() - start
    if elapsed > 0.1:
        logger.warning(
            "slow_query",
            query=query[:200],
            duration_ms=round(elapsed * 1000, 2),
            row_count=len(result),
        )
    return result
Step 3: Check N+1 Query Patterns
The N+1 pattern is the most insidious performance killer: one query per item in a list.
# BAD: N+1 — one query per order
orders = await pool.fetch("SELECT * FROM orders LIMIT 100")
for order in orders:
    items = await pool.fetch(
        "SELECT * FROM order_items WHERE order_id = $1", order["id"]
    )
# 1 + 100 = 101 queries

# GOOD: Two queries total
orders = await pool.fetch("SELECT * FROM orders LIMIT 100")
order_ids = [o["id"] for o in orders]
items = await pool.fetch(
    "SELECT * FROM order_items WHERE order_id = ANY($1)", order_ids
)
# 2 queries total, regardless of order count
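The batched query returns one flat list of item rows, so they still need to be regrouped per order before building the response. A minimal sketch with collections.defaultdict (the `order_id` key and `sku` field mirror the schema above; the dict rows stand in for asyncpg Records, which support the same key access):

```python
from collections import defaultdict

def group_items_by_order(items: list[dict]) -> dict[int, list[dict]]:
    """Regroup flat order_items rows into a mapping of order_id -> items."""
    grouped: dict[int, list[dict]] = defaultdict(list)
    for item in items:
        grouped[item["order_id"]].append(item)
    return grouped

# Flat rows as returned by the single ANY($1) query:
rows = [
    {"order_id": 1, "sku": "A"},
    {"order_id": 2, "sku": "B"},
    {"order_id": 1, "sku": "C"},
]
grouped = group_items_by_order(rows)
# grouped[1] holds two items, grouped[2] holds one
```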
Step 4: Profile External API Calls
External services add unpredictable latency:
import httpx

async def call_external_api(url: str, timeout: float = 5.0):
    start = time.perf_counter()
    try:
        async with httpx.AsyncClient(timeout=timeout) as client:
            response = await client.get(url)
            elapsed = time.perf_counter() - start
            logger.info(
                "external_api_call",
                url=url,
                status=response.status_code,
                duration_ms=round(elapsed * 1000, 2),
            )
            return response
    except httpx.TimeoutException:
        elapsed = time.perf_counter() - start
        logger.error(
            "external_api_timeout",
            url=url,
            duration_ms=round(elapsed * 1000, 2),
        )
        raise
Always set explicit timeouts. A hanging external call without a timeout will hold a worker indefinitely.
Step 5: Check Connection Pool Health
Connection pool exhaustion causes requests to queue:
# Monitor pool health
@app.get("/health/detailed")
async def detailed_health():
    return {
        "pool_size": pool.get_size(),
        "pool_idle": pool.get_idle_size(),
        "pool_min": pool.get_min_size(),
        "pool_max": pool.get_max_size(),
    }
If pool_idle is consistently 0, requests are waiting for connections. Increase max_size or optimize query duration.
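The queuing effect is easy to reproduce with a plain asyncio.Semaphore standing in for the pool (a toy model, not asyncpg's implementation): with 2 "connections" and 4 concurrent requests, the last two requests wait for a connection, doubling the wall time.

```python
import asyncio
import time

async def simulate(pool_size: int, requests: int, query_time: float) -> float:
    """Model a connection pool as a semaphore; return total wall time."""
    pool = asyncio.Semaphore(pool_size)

    async def handle_request():
        async with pool:                      # queue here when pool is exhausted
            await asyncio.sleep(query_time)   # stand-in for the query

    start = time.perf_counter()
    await asyncio.gather(*(handle_request() for _ in range(requests)))
    return time.perf_counter() - start

# 4 requests, 2 connections, 0.1s queries -> two waves, roughly 0.2s total
elapsed = asyncio.run(simulate(pool_size=2, requests=4, query_time=0.1))
```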
Step 6: Distributed Tracing
For microservice architectures, use distributed tracing to follow a request across services:
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

tracer = trace.get_tracer(__name__)
FastAPIInstrumentor.instrument_app(app)

@app.get("/orders/{order_id}")
async def get_order(order_id: str):
    with tracer.start_as_current_span("fetch_order"):
        order = await fetch_order(order_id)
    with tracer.start_as_current_span("fetch_items"):
        items = await fetch_order_items(order_id)
    with tracer.start_as_current_span("serialize"):
        return serialize_order(order, items)
Each span shows exactly how long that operation took. The trace shows the full request timeline across services.
Optimization Patterns
Parallel Independent Calls
import asyncio

# BAD: Sequential — total time = sum of all calls
user = await fetch_user(user_id)
orders = await fetch_orders(user_id)
prefs = await fetch_preferences(user_id)
# Total: 100ms + 80ms + 50ms = 230ms

# GOOD: Parallel — total time = max of all calls
user, orders, prefs = await asyncio.gather(
    fetch_user(user_id),
    fetch_orders(user_id),
    fetch_preferences(user_id),
)
# Total: max(100ms, 80ms, 50ms) = 100ms
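One caveat with gather: by default it propagates the first exception and discards the other results. Passing return_exceptions=True returns exceptions in place, so the successful calls still yield data. A sketch with hypothetical fetchers, one of which fails:

```python
import asyncio

async def fetch_user(user_id: str) -> dict:
    return {"id": user_id}

async def fetch_orders(user_id: str) -> list:
    raise RuntimeError("orders service is down")

async def main() -> list:
    # Failures come back as exception objects instead of aborting the batch.
    return await asyncio.gather(
        fetch_user("u1"),
        fetch_orders("u1"),
        return_exceptions=True,
    )

results = asyncio.run(main())
# results[0] is the user dict; results[1] is the RuntimeError instance
```

Whether this is the right choice depends on the endpoint: use it when partial data is better than a 500, and plain gather when any failure should fail the request.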
Response Caching
from datetime import datetime, timedelta, timezone
from typing import Any

cache: dict[str, tuple[Any, datetime]] = {}

async def cached_fetch(key: str, fetcher, ttl_seconds: int = 60):
    if key in cache:
        value, expires = cache[key]
        if datetime.now(timezone.utc) < expires:
            return value
    value = await fetcher()
    cache[key] = (value, datetime.now(timezone.utc) + timedelta(seconds=ttl_seconds))
    return value
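This in-process cache has a thundering-herd window: concurrent requests that all miss will each call the fetcher. A per-key asyncio.Lock closes it, so only one coroutine does the slow work while the rest wait and reuse the result. A sketch under the same cache shape (slow_fetch and the call counter exist only to demonstrate the behavior):

```python
import asyncio
from datetime import datetime, timedelta, timezone
from typing import Any

cache: dict[str, tuple[Any, datetime]] = {}
locks: dict[str, asyncio.Lock] = {}
call_count = 0  # counts fetcher invocations, for demonstration only

async def cached_fetch_once(key: str, fetcher, ttl_seconds: int = 60):
    lock = locks.setdefault(key, asyncio.Lock())
    async with lock:  # concurrent misses for the same key serialize here
        if key in cache:
            value, expires = cache[key]
            if datetime.now(timezone.utc) < expires:
                return value
        value = await fetcher()
        cache[key] = (value, datetime.now(timezone.utc) + timedelta(seconds=ttl_seconds))
        return value

async def slow_fetch() -> str:
    global call_count
    call_count += 1
    await asyncio.sleep(0.05)  # stand-in for a slow query
    return "stats"

async def main() -> list:
    # Ten concurrent misses for the same key...
    return await asyncio.gather(
        *(cached_fetch_once("stats", slow_fetch) for _ in range(10))
    )

results = asyncio.run(main())
# ...but the fetcher runs only once; the other nine reuse the cached value.
```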
Common Mistakes
Mistake 1: Profiling in development only. Development databases have less data, no contention, and local latency. Always profile in production or staging with realistic data.
Mistake 2: Adding caching before understanding the bottleneck. Caching a fast query saves nothing. Profile first, cache the slow parts.
Mistake 3: No timeouts on external calls. One slow external dependency without a timeout can cascade into a full system outage.
Takeaways
Debugging slow APIs requires systematic measurement. Instrument request timing. Profile database queries. Find N+1 patterns. Time external API calls. Monitor connection pool health. Parallelize independent operations. The fix is usually in the data access layer, not the application logic.