How to Debug Slow APIs in Production Systems

· 12 min read · Backend Development

Systematic guide to debugging slow production APIs covering timing, query profiling, N+1 detection, connection pools, and distributed tracing.


An API that responds in 200ms during testing takes 3 seconds in production. Users complain, alerts fire, and you have no idea where the time goes. The response time is the sum of every operation the endpoint performs — database queries, external API calls, serialization, middleware — and any of them can be the bottleneck.

This guide covers systematic approaches to finding and fixing slow API endpoints. For Python-specific patterns, see the FastAPI performance guide. For database-specific issues, see PostgreSQL query optimization.

Problem

Slow APIs in production share common patterns:

  • Fast locally, slow in production (environment differences)
  • Slow intermittently (contention, cold starts, garbage collection)
  • Slow under load (connection pool exhaustion, thread starvation)
  • Gradually slowing over time (data growth, index degradation)

Step 1: Measure Where Time Goes

Add timing instrumentation to the request lifecycle:

import time

import structlog
from fastapi import FastAPI, Request

app = FastAPI()
logger = structlog.get_logger()

@app.middleware("http")
async def timing_middleware(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    total = time.perf_counter() - start

    logger.info(
        "request_completed",
        method=request.method,
        path=request.url.path,
        status=response.status_code,
        duration_ms=round(total * 1000, 2),
    )

    if total > 1.0:
        logger.warning(
            "slow_request",
            method=request.method,
            path=request.url.path,
            duration_ms=round(total * 1000, 2),
        )

    return response
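A single slow-request threshold misses tail latency. The logged durations become far more useful when aggregated into percentiles per endpoint; a minimal stdlib sketch (the function name is illustrative):

```python
import statistics

def latency_percentiles(durations_ms: list[float]) -> dict[str, float]:
    """Summarize a window of request durations into tail percentiles."""
    # quantiles(n=100) returns the 99 percentile cut points
    qs = statistics.quantiles(durations_ms, n=100)
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}
```

Averages hide outliers: a healthy p50 with a bad p99 points at contention, cold starts, or GC pauses rather than a uniformly slow endpoint.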

Step 2: Profile Database Queries

Database queries are the most common bottleneck. Enable pg_stat_statements:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slowest queries
SELECT query,
       round(mean_exec_time::numeric, 2) AS avg_ms,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

Add per-query timing in the application:

async def timed_query(pool, query: str, *args):
    start = time.perf_counter()
    result = await pool.fetch(query, *args)
    elapsed = time.perf_counter() - start

    if elapsed > 0.1:
        logger.warning(
            "slow_query",
            query=query[:200],
            duration_ms=round(elapsed * 1000, 2),
            row_count=len(result),
        )

    return result

Step 3: Check N+1 Query Patterns

N+1 queries are the most insidious performance killer: the code issues one query per item in a list:

# BAD: N+1 — one query per order
orders = await pool.fetch("SELECT * FROM orders LIMIT 100")
for order in orders:
    items = await pool.fetch(
        "SELECT * FROM order_items WHERE order_id = $1", order["id"]
    )
    # 1 + 100 = 101 queries

# GOOD: Two queries total
orders = await pool.fetch("SELECT * FROM orders LIMIT 100")
order_ids = [o["id"] for o in orders]
items = await pool.fetch(
    "SELECT * FROM order_items WHERE order_id = ANY($1)", order_ids
)
# 2 queries total, regardless of order count
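The batched query returns a flat list, so the items still need to be regrouped per order in memory. A sketch of that step (record shapes assumed to match the queries above):

```python
from collections import defaultdict

def attach_items(orders: list[dict], items: list[dict]) -> list[dict]:
    # Index items by their parent order in one pass
    by_order: dict = defaultdict(list)
    for item in items:
        by_order[item["order_id"]].append(item)
    # Orders with no items get an empty list, not a missing key
    return [{**order, "items": by_order[order["id"]]} for order in orders]
```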

Step 4: Profile External API Calls

External services add unpredictable latency:

import httpx

async def call_external_api(url: str, timeout: float = 5.0):
    start = time.perf_counter()
    try:
        # In production, reuse a single AsyncClient instead of creating one per call
        async with httpx.AsyncClient(timeout=timeout) as client:
            response = await client.get(url)
            elapsed = time.perf_counter() - start

            logger.info(
                "external_api_call",
                url=url,
                status=response.status_code,
                duration_ms=round(elapsed * 1000, 2),
            )
            return response
    except httpx.TimeoutException:
        elapsed = time.perf_counter() - start
        logger.error(
            "external_api_timeout",
            url=url,
            duration_ms=round(elapsed * 1000, 2),
        )
        raise

Always set explicit timeouts. A hanging external call without a timeout will hold a worker indefinitely.
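The same discipline applies beyond HTTP clients: any awaited call can be bounded with asyncio.wait_for so a hung dependency fails fast instead of holding a worker. A minimal sketch (the wrapper name is illustrative):

```python
import asyncio

async def bounded(coro, seconds: float):
    # Raises asyncio.TimeoutError (an alias of TimeoutError on 3.11+)
    # if coro does not finish within the deadline
    return await asyncio.wait_for(coro, timeout=seconds)
```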

Step 5: Check Connection Pool Health

Connection pool exhaustion causes requests to queue:

# Monitor pool health
@app.get("/health/detailed")
async def detailed_health():
    return {
        "pool_size": pool.get_size(),
        "pool_idle": pool.get_idle_size(),
        "pool_min": pool.get_min_size(),
        "pool_max": pool.get_max_size(),
    }

If pool_idle is consistently 0, requests are waiting for connections. Increase max_size or optimize query duration.
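A rough sizing rule comes from Little's law: connections busy at any moment ≈ request rate × time each request holds a connection. A sketch of that back-of-the-envelope calculation:

```python
import math

def connections_needed(requests_per_sec: float, held_sec: float) -> int:
    # Little's law: L = lambda * W, rounded up
    return math.ceil(requests_per_sec * held_sec)
```

At 200 req/s with each request holding a connection for 50ms, roughly 10 connections are busy at once; set max_size comfortably above that estimate to absorb bursts.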

Step 6: Distributed Tracing

For microservice architectures, use distributed tracing to follow a request across services:

from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

tracer = trace.get_tracer(__name__)

FastAPIInstrumentor.instrument_app(app)

@app.get("/orders/{order_id}")
async def get_order(order_id: str):
    with tracer.start_as_current_span("fetch_order"):
        order = await fetch_order(order_id)

    with tracer.start_as_current_span("fetch_items"):
        items = await fetch_order_items(order_id)

    with tracer.start_as_current_span("serialize"):
        return serialize_order(order, items)

Each span shows exactly how long that operation took. The trace shows the full request timeline across services.

Optimization Patterns

Parallel Independent Calls

import asyncio

# BAD: Sequential — total time = sum of all calls
user = await fetch_user(user_id)
orders = await fetch_orders(user_id)
prefs = await fetch_preferences(user_id)
# Total: 100ms + 80ms + 50ms = 230ms

# GOOD: Parallel — total time = max of all calls
user, orders, prefs = await asyncio.gather(
    fetch_user(user_id),
    fetch_orders(user_id),
    fetch_preferences(user_id),
)
# Total: max(100ms, 80ms, 50ms) = 100ms
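One caveat: by default, asyncio.gather propagates the first exception and the other results are lost to the caller. Passing return_exceptions=True returns exceptions in place, so partial results survive; a sketch (the wrapper name is illustrative):

```python
import asyncio

async def gather_settled(*coros):
    # Failures come back as exception objects instead of aborting the batch
    return await asyncio.gather(*coros, return_exceptions=True)
```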

Response Caching

from datetime import datetime, timedelta, timezone
from typing import Any

cache: dict[str, tuple[Any, datetime]] = {}

async def cached_fetch(key: str, fetcher, ttl_seconds: int = 60):
    if key in cache:
        value, expires = cache[key]
        if datetime.now(timezone.utc) < expires:
            return value

    value = await fetcher()
    cache[key] = (value, datetime.now(timezone.utc) + timedelta(seconds=ttl_seconds))
    return value

Common Mistakes

Mistake 1: Profiling in development only. Development databases have less data, no contention, and local latency. Always profile in production or staging with realistic data.

Mistake 2: Adding caching before understanding the bottleneck. Caching a fast query saves nothing. Profile first, cache the slow parts.

Mistake 3: No timeouts on external calls. One slow external dependency without a timeout can cascade into a full system outage.

Takeaways

Debugging slow APIs requires systematic measurement. Instrument request timing. Profile database queries. Find N+1 patterns. Time external API calls. Monitor connection pool health. Parallelize independent operations. The fix is usually in the data access layer, not the application logic.