How to Debug Slow APIs in Production Systems
An API that responds in 200ms during testing takes 3 seconds in production. Users complain, alerts fire, and you have no idea where the time goes. The response time is the sum of every operation the endpoint performs — database queries, external API calls, serialization, middleware — and any of them can be the bottleneck.
This guide covers systematic approaches to finding and fixing slow API endpoints. For Python-specific patterns, see the FastAPI performance guide. For database-specific issues, see PostgreSQL query optimization.
Problem
Slow APIs in production share common patterns:
- Fast locally, slow in production (environment differences)
- Slow intermittently (contention, cold starts, garbage collection)
- Slow under load (connection pool exhaustion, thread starvation)
- Gradually slowing over time (data growth, index degradation)
Step 1: Measure Where Time Goes
Add timing instrumentation to the request lifecycle:
import time
from fastapi import Request
import structlog

logger = structlog.get_logger()

@app.middleware("http")
async def timing_middleware(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    total = time.perf_counter() - start
    logger.info(
        "request_completed",
        method=request.method,
        path=request.url.path,
        status=response.status_code,
        duration_ms=round(total * 1000, 2),
    )
    if total > 1.0:
        logger.warning(
            "slow_request",
            method=request.method,
            path=request.url.path,
            duration_ms=round(total * 1000, 2),
        )
    return response
Step 2: Profile Database Queries
Database queries are the most common bottleneck. Enable pg_stat_statements:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slowest queries by average execution time
SELECT query,
       round(mean_exec_time::numeric, 2) AS avg_ms,
       calls,
       round(total_exec_time::numeric, 2) AS total_ms
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
Add per-query timing in the application:
async def timed_query(pool, query: str, *args):
    start = time.perf_counter()
    result = await pool.fetch(query, *args)
    elapsed = time.perf_counter() - start
    if elapsed > 0.1:
        logger.warning(
            "slow_query",
            query=query[:200],
            duration_ms=round(elapsed * 1000, 2),
            row_count=len(result),
        )
    return result
Step 3: Check N+1 Query Patterns
The N+1 pattern is the most insidious performance killer: one query per item in a list.
# BAD: N+1 — one query per order
orders = await pool.fetch("SELECT * FROM orders LIMIT 100")
for order in orders:
    items = await pool.fetch(
        "SELECT * FROM order_items WHERE order_id = $1", order["id"]
    )
# 1 + 100 = 101 queries

# GOOD: Two queries total
orders = await pool.fetch("SELECT * FROM orders LIMIT 100")
order_ids = [o["id"] for o in orders]
items = await pool.fetch(
    "SELECT * FROM order_items WHERE order_id = ANY($1)", order_ids
)
# 2 queries total, regardless of order count
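The batched query returns one flat list of item rows, so they still need to be regrouped per order before building the response. A minimal sketch with collections.defaultdict (the `order_id` key and `sku` field mirror the schema above; the dict rows stand in for asyncpg Records, which support the same key access):

```python
from collections import defaultdict

def group_items_by_order(items: list[dict]) -> dict[int, list[dict]]:
    """Regroup flat order_items rows into a mapping of order_id -> items."""
    grouped: dict[int, list[dict]] = defaultdict(list)
    for item in items:
        grouped[item["order_id"]].append(item)
    return grouped

# Flat rows as returned by the single ANY($1) query:
rows = [
    {"order_id": 1, "sku": "A"},
    {"order_id": 2, "sku": "B"},
    {"order_id": 1, "sku": "C"},
]
grouped = group_items_by_order(rows)
# grouped[1] holds two items, grouped[2] holds one
```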
Step 4: Profile External API Calls
External services add unpredictable latency:
import httpx

async def call_external_api(url: str, timeout: float = 5.0):
    start = time.perf_counter()
    try:
        async with httpx.AsyncClient(timeout=timeout) as client:
            response = await client.get(url)
            elapsed = time.perf_counter() - start
            logger.info(
                "external_api_call",
                url=url,
                status=response.status_code,
                duration_ms=round(elapsed * 1000, 2),
            )
            return response
    except httpx.TimeoutException:
        elapsed = time.perf_counter() - start
        logger.error(
            "external_api_timeout",
            url=url,
            duration_ms=round(elapsed * 1000, 2),
        )
        raise
Always set explicit timeouts. A hanging external call without a timeout will hold a worker indefinitely.
Step 5: Check Connection Pool Health
Connection pool exhaustion causes requests to queue:
# Monitor pool health
@app.get("/health/detailed")
async def detailed_health():
    return {
        "pool_size": pool.get_size(),
        "pool_idle": pool.get_idle_size(),
        "pool_min": pool.get_min_size(),
        "pool_max": pool.get_max_size(),
    }
If pool_idle is consistently 0, requests are waiting for connections. Increase max_size or optimize query duration.
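The queuing effect is easy to reproduce with a plain asyncio.Semaphore standing in for the pool (a toy model, not asyncpg's implementation): with 2 "connections" and 4 concurrent requests, the last two requests wait for a connection, doubling the wall time.

```python
import asyncio
import time

async def simulate(pool_size: int, requests: int, query_time: float) -> float:
    """Model a connection pool as a semaphore; return total wall time."""
    pool = asyncio.Semaphore(pool_size)

    async def handle_request():
        async with pool:                      # queue here when pool is exhausted
            await asyncio.sleep(query_time)   # stand-in for the query

    start = time.perf_counter()
    await asyncio.gather(*(handle_request() for _ in range(requests)))
    return time.perf_counter() - start

# 4 requests, 2 connections, 0.1s queries -> two waves, roughly 0.2s total
elapsed = asyncio.run(simulate(pool_size=2, requests=4, query_time=0.1))
```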
Step 6: Distributed Tracing
For microservice architectures, use distributed tracing to follow a request across services:
from opentelemetry import trace
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

tracer = trace.get_tracer(__name__)
FastAPIInstrumentor.instrument_app(app)

@app.get("/orders/{order_id}")
async def get_order(order_id: str):
    with tracer.start_as_current_span("fetch_order"):
        order = await fetch_order(order_id)
    with tracer.start_as_current_span("fetch_items"):
        items = await fetch_order_items(order_id)
    with tracer.start_as_current_span("serialize"):
        return serialize_order(order, items)
Each span shows exactly how long that operation took. The trace shows the full request timeline across services.
Optimization Patterns
Parallel Independent Calls
import asyncio

# BAD: Sequential — total time = sum of all calls
user = await fetch_user(user_id)
orders = await fetch_orders(user_id)
prefs = await fetch_preferences(user_id)
# Total: 100ms + 80ms + 50ms = 230ms

# GOOD: Parallel — total time = max of all calls
user, orders, prefs = await asyncio.gather(
    fetch_user(user_id),
    fetch_orders(user_id),
    fetch_preferences(user_id),
)
# Total: max(100ms, 80ms, 50ms) = 100ms
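One caveat with gather: by default it propagates the first exception and discards the other results. Passing return_exceptions=True returns exceptions in place, so the successful calls still yield data. A sketch with hypothetical fetchers, one of which fails:

```python
import asyncio

async def fetch_user(user_id: str) -> dict:
    return {"id": user_id}

async def fetch_orders(user_id: str) -> list:
    raise RuntimeError("orders service is down")

async def main() -> list:
    # Failures come back as exception objects instead of aborting the batch.
    return await asyncio.gather(
        fetch_user("u1"),
        fetch_orders("u1"),
        return_exceptions=True,
    )

results = asyncio.run(main())
# results[0] is the user dict; results[1] is the RuntimeError instance
```

Whether this is the right choice depends on the endpoint: use it when partial data is better than a 500, and plain gather when any failure should fail the request.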
Response Caching
from datetime import datetime, timedelta, timezone
from typing import Any

cache: dict[str, tuple[Any, datetime]] = {}

async def cached_fetch(key: str, fetcher, ttl_seconds: int = 60):
    if key in cache:
        value, expires = cache[key]
        if datetime.now(timezone.utc) < expires:
            return value
    value = await fetcher()
    cache[key] = (value, datetime.now(timezone.utc) + timedelta(seconds=ttl_seconds))
    return value
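This in-process cache has a thundering-herd window: concurrent requests that all miss will each call the fetcher. A per-key asyncio.Lock closes it, so only one coroutine does the slow work while the rest wait and reuse the result. A sketch under the same cache shape (slow_fetch and the call counter exist only to demonstrate the behavior):

```python
import asyncio
from datetime import datetime, timedelta, timezone
from typing import Any

cache: dict[str, tuple[Any, datetime]] = {}
locks: dict[str, asyncio.Lock] = {}
call_count = 0  # counts fetcher invocations, for demonstration only

async def cached_fetch_once(key: str, fetcher, ttl_seconds: int = 60):
    lock = locks.setdefault(key, asyncio.Lock())
    async with lock:  # concurrent misses for the same key serialize here
        if key in cache:
            value, expires = cache[key]
            if datetime.now(timezone.utc) < expires:
                return value
        value = await fetcher()
        cache[key] = (value, datetime.now(timezone.utc) + timedelta(seconds=ttl_seconds))
        return value

async def slow_fetch() -> str:
    global call_count
    call_count += 1
    await asyncio.sleep(0.05)  # stand-in for a slow query
    return "stats"

async def main() -> list:
    # Ten concurrent misses for the same key...
    return await asyncio.gather(
        *(cached_fetch_once("stats", slow_fetch) for _ in range(10))
    )

results = asyncio.run(main())
# ...but the fetcher runs only once; the other nine reuse the cached value.
```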
Common Mistakes
Mistake 1: Profiling in development only. Development databases have less data, no contention, and local latency. Always profile in production or staging with realistic data.
Mistake 2: Adding caching before understanding the bottleneck. Caching a fast query saves nothing. Profile first, cache the slow parts.
Mistake 3: No timeouts on external calls. One slow external dependency without a timeout can cascade into a full system outage.
Takeaways
Debugging slow APIs requires systematic measurement. Instrument request timing. Profile database queries. Find N+1 patterns. Time external API calls. Monitor connection pool health. Parallelize independent operations. The fix is usually in the data access layer, not the application logic.