Error Handling Patterns for Production Python Applications
Your application works in development. In production, it crashes at 3 AM with a stack trace that reveals nothing useful. The error says "Connection refused" but not which connection, to which service, or what the user was trying to do. Good error handling is not about catching exceptions — it is about providing enough context to debug problems without access to the production server.
The Base Exception Pattern
Define application-specific exceptions:
class AppError(Exception):
    """Base exception for the application."""

    def __init__(self, message: str, code: str, status_code: int = 500):
        self.message = message
        self.code = code
        self.status_code = status_code
        super().__init__(message)


class NotFoundError(AppError):
    def __init__(self, resource: str, identifier: str):
        super().__init__(
            message=f"{resource} with id '{identifier}' not found",
            code="NOT_FOUND",
            status_code=404,
        )


class ValidationError(AppError):
    def __init__(self, field: str, reason: str):
        super().__init__(
            message=f"Validation failed for '{field}': {reason}",
            code="VALIDATION_ERROR",
            status_code=422,
        )


class ConflictError(AppError):
    def __init__(self, resource: str, field: str, value: str):
        super().__init__(
            message=f"{resource} with {field} '{value}' already exists",
            code="CONFLICT",
            status_code=409,
        )
Usage:
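# Assumes SQLAlchemy; User is the application's ORM model, defined elsewhere.
from sqlalchemy.orm import Session


def get_user(db: Session, user_id: str) -> User:
    user = db.query(User).filter(User.id == user_id).first()
    if user is None:
        raise NotFoundError("User", user_id)
    return user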
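One benefit of a shared base class is that callers can catch AppError once and still read the specific code and status from whichever subclass was raised. The following is a minimal sketch of that idea; the in-memory USERS store and the find_user and describe helpers are hypothetical stand-ins for illustration, and the exception classes are repeated only so the snippet runs on its own.

```python
class AppError(Exception):
    """Minimal copy of the base exception so this sketch runs on its own."""

    def __init__(self, message: str, code: str, status_code: int = 500):
        self.message = message
        self.code = code
        self.status_code = status_code
        super().__init__(message)


class NotFoundError(AppError):
    def __init__(self, resource: str, identifier: str):
        super().__init__(
            message=f"{resource} with id '{identifier}' not found",
            code="NOT_FOUND",
            status_code=404,
        )


# Hypothetical in-memory store standing in for the database.
USERS = {"u1": "Ada"}


def find_user(user_id: str) -> str:
    if user_id not in USERS:
        raise NotFoundError("User", user_id)
    return USERS[user_id]


def describe(user_id: str) -> dict:
    """Catch the base class once; each subclass supplies its own code and status."""
    try:
        return {"ok": True, "name": find_user(user_id)}
    except AppError as exc:
        return {
            "ok": False,
            "code": exc.code,
            "status": exc.status_code,
            "message": str(exc),
        }
```

Calling describe("u2") returns a dictionary with code "NOT_FOUND" and status 404, without describe knowing anything about NotFoundError specifically.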
Global Exception Handler in FastAPI
import logging
import traceback

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

logger = logging.getLogger(__name__)
app = FastAPI()


@app.exception_handler(AppError)
async def app_error_handler(request: Request, exc: AppError):
    # Known application errors already carry a safe message and status code.
    return JSONResponse(
        status_code=exc.status_code,
        content={
            "error": {
                "code": exc.code,
                "message": exc.message,
            }
        },
    )


@app.exception_handler(Exception)
async def unhandled_error_handler(request: Request, exc: Exception):
    # Format the traceback from the exception object itself, so the log does
    # not depend on the handler running inside the original except block.
    logger.error(
        "Unhandled error: %s",
        exc,
        extra={
            "path": request.url.path,
            "method": request.method,
            "traceback": "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            ),
        },
    )
    # The client only ever sees a generic message.
    return JSONResponse(
        status_code=500,
        content={
            "error": {
                "code": "INTERNAL_ERROR",
                "message": "An unexpected error occurred",
            }
        },
    )
The first handler catches known application errors and returns structured responses. The second catches everything else — logging the full traceback while returning a generic message to the client. Never expose internal error details to users.
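The two-tier mapping can also be exercised without a web framework, which makes the "never expose internals" rule easy to unit test. This is a sketch under assumptions, not FastAPI's own mechanism: to_response is a hypothetical helper, and AppError is repeated so the snippet is self-contained.

```python
import logging
from typing import Tuple

logger = logging.getLogger("errors_sketch")


class AppError(Exception):
    """Minimal copy of the base exception so this sketch runs on its own."""

    def __init__(self, message: str, code: str, status_code: int = 500):
        self.message = message
        self.code = code
        self.status_code = status_code
        super().__init__(message)


def to_response(exc: Exception) -> Tuple[int, dict]:
    """Map any exception to (status_code, JSON-serializable body)."""
    if isinstance(exc, AppError):
        # Known errors: the message was written for the client, so pass it on.
        return exc.status_code, {
            "error": {"code": exc.code, "message": exc.message}
        }
    # Unknown errors: full detail goes to the log, a generic body to the client.
    logger.error("Unhandled error", exc_info=exc)
    return 500, {
        "error": {
            "code": "INTERNAL_ERROR",
            "message": "An unexpected error occurred",
        }
    }
```

Passing a ValueError whose text contains a connection string yields a 500 body with only the generic message, while the detail stays in the log.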
Structured Logging
Replace print statements with structured logging:
import traceback

import structlog

logger = structlog.get_logger()

# stripe.charge and stripe.CardDeclinedError stand in for your payment
# client's charge call and its card-declined exception.


async def process_payment(user_id: str, amount: float):
    logger.info("payment_started", user_id=user_id, amount=amount)
    try:
        result = await stripe.charge(amount)
        logger.info("payment_succeeded",
            user_id=user_id,
            amount=amount,
            charge_id=result.id,
        )
        return result
    except stripe.CardDeclinedError as e:
        logger.warning("payment_declined",
            user_id=user_id,
            amount=amount,
            reason=str(e),
        )
        # Chain the original exception so the cause survives in tracebacks.
        raise AppError("Payment declined", "PAYMENT_DECLINED", 402) from e
    except Exception as e:
        logger.error("payment_failed",
            user_id=user_id,
            amount=amount,
            error=str(e),
            traceback=traceback.format_exc(),
        )
        raise
Structured logs use key-value pairs instead of formatted strings. They are searchable, filterable, and machine-parseable.
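To make "machine-parseable" concrete, here is a rough standard-library approximation of what a structured logger emits. It is a sketch for illustration only, not how structlog is implemented; JsonFormatter, demo_logger, and the "payments_sketch" logger name are hypothetical.

```python
import io
import json
import logging

# Attribute names present on every LogRecord; anything else arrived via `extra`.
_STANDARD_ATTRS = set(
    logging.LogRecord("", 0, "", 0, "", (), None).__dict__
) | {"message", "asctime"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the extra fields as keys."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
        }
        payload.update(
            {k: v for k, v in record.__dict__.items() if k not in _STANDARD_ATTRS}
        )
        return json.dumps(payload, default=str)


# Write to an in-memory stream so the output can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

demo_logger = logging.getLogger("payments_sketch")
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False

demo_logger.info("payment_started", extra={"user_id": "u1", "amount": 12.5})
line = stream.getvalue().strip()
```

The resulting line parses back with json.loads into a dictionary, so a log pipeline can filter on user_id or amount directly instead of applying regular expressions to a formatted string.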
Context Propagation
Add request context to every log line:
import uuid

import structlog
from starlette.middleware.base import BaseHTTPMiddleware


class RequestContextMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        # Short random ID that ties together all log lines for one request.
        request_id = str(uuid.uuid4())[:8]
        # Drop anything left over from a previous request before binding.
        structlog.contextvars.clear_contextvars()
        structlog.contextvars.bind_contextvars(
            request_id=request_id,
            path=request.url.path,
            method=request.method,
        )
        response = await call_next(request)
        # Return the ID so a client report can be matched to the logs.
        response.headers["X-Request-ID"] = request_id
        return response
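Now every log line emitted during the request includes the request ID, path, and method without passing them explicitly. This relies on structlog.contextvars.merge_contextvars being in the processor chain; recent structlog versions include it in the default configuration, but a custom structlog.configure() call must list it.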
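The underlying mechanism is Python's contextvars module, and the same effect can be sketched with only the standard library. This is an illustration of the idea rather than structlog's implementation; request_id_var, RequestIdFilter, load_user, and handle_request are hypothetical names.

```python
import contextvars
import io
import logging

# Holds the current request's ID; "-" when no request is active.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(request_id)s %(message)s"))
handler.addFilter(RequestIdFilter())

ctx_logger = logging.getLogger("context_sketch")
ctx_logger.addHandler(handler)
ctx_logger.setLevel(logging.INFO)
ctx_logger.propagate = False


def load_user() -> None:
    # Deep in the call stack: no request_id parameter anywhere.
    ctx_logger.info("loading user")


def handle_request(request_id: str) -> None:
    token = request_id_var.set(request_id)
    try:
        load_user()
    finally:
        # Restore the previous value so IDs never leak between requests.
        request_id_var.reset(token)


handle_request("ab12cd34")
handle_request("ef56ab78")
lines = stream.getvalue().splitlines()
```

Each line in lines starts with the ID of the request that produced it, even though load_user never receives the ID as an argument.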
Retry with Backoff
For transient failures (network timeouts, rate limits):
import asyncio
from functools import wraps

import httpx
import structlog

logger = structlog.get_logger()


def retry(max_attempts=3, backoff_factor=1.0, exceptions=(Exception,)):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return await func(*args, **kwargs)
                except exceptions as e:
                    # Out of attempts: let the last error propagate.
                    if attempt == max_attempts - 1:
                        raise
                    wait_time = backoff_factor * (2 ** attempt)
                    logger.warning("retrying",
                        function=func.__name__,
                        attempt=attempt + 1,
                        wait_seconds=wait_time,
                        error=str(e),
                    )
                    await asyncio.sleep(wait_time)
        return wrapper
    return decorator


# httpx raises its own exception types, which do not subclass the built-in
# ConnectionError or TimeoutError, so list the httpx classes explicitly.
@retry(
    max_attempts=3,
    backoff_factor=0.5,
    exceptions=(httpx.ConnectError, httpx.TimeoutException),
)
async def call_external_api(url: str):
    async with httpx.AsyncClient(timeout=10) as client:
        response = await client.get(url)
        response.raise_for_status()
        return response.json()
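Exponential backoff doubles the wait after each failure: backoff_factor * 2 ** attempt. With max_attempts=3 and backoff_factor=0.5, the call waits 0.5s and then 1s; if the third attempt also fails, the error is re-raised without a further wait. Spacing out retries this way avoids overwhelming a struggling service.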
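The wait schedule is easy to get wrong by one, so it helps to compute it directly and confirm it against a simulated run. The sketch below is a simplified, non-decorator variant for illustration; backoff_schedule, call_with_retry, fake_sleep, and flaky are hypothetical names, and the injected sleep function exists only so the example finishes instantly.

```python
import asyncio


def backoff_schedule(max_attempts: int, backoff_factor: float) -> list:
    """Waits between attempts. The final failure is re-raised, not waited on,
    so there are max_attempts - 1 waits."""
    return [backoff_factor * (2 ** attempt) for attempt in range(max_attempts - 1)]


async def call_with_retry(func, max_attempts=3, backoff_factor=0.5,
                          sleep=asyncio.sleep):
    """Same loop as the decorator, with the sleep function injectable."""
    for attempt in range(max_attempts):
        try:
            return await func()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            await sleep(backoff_factor * (2 ** attempt))


# Record the requested waits instead of actually sleeping.
waits = []


async def fake_sleep(seconds: float) -> None:
    waits.append(seconds)


calls = {"count": 0}


async def flaky() -> str:
    # Fails twice, then succeeds on the third call.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporarily unavailable")
    return "ok"


result = asyncio.run(call_with_retry(flaky, sleep=fake_sleep))
```

After the run, waits matches backoff_schedule(3, 0.5), and raising max_attempts to 4 would add a 2s wait to the schedule.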
Takeaways
Error handling is infrastructure, not an afterthought. Define application-specific exceptions. Use a global handler to convert them to consistent API responses. Log structured context with every error. Retry transient failures with backoff. The goal is not to prevent errors — it is to know exactly what went wrong and where when they inevitably happen.