Error Handling Patterns for Production Python Applications

Production error handling patterns for Python applications including custom exceptions, structured logging, context propagation, and retry logic.

Your application works in development. In production, it crashes at 3 AM with a stack trace that reveals nothing useful. The error says "Connection refused" but not which connection, to which service, or what the user was trying to do. Good error handling is not about catching exceptions — it is about providing enough context to debug problems without access to the production server.

The Base Exception Pattern

Define application-specific exceptions:

class AppError(Exception):
    """Base exception for the application."""
    def __init__(self, message: str, code: str, status_code: int = 500):
        self.message = message
        self.code = code
        self.status_code = status_code
        super().__init__(message)

class NotFoundError(AppError):
    def __init__(self, resource: str, identifier: str):
        super().__init__(
            message=f"{resource} with id '{identifier}' not found",
            code="NOT_FOUND",
            status_code=404,
        )

class ValidationError(AppError):
    def __init__(self, field: str, reason: str):
        super().__init__(
            message=f"Validation failed for '{field}': {reason}",
            code="VALIDATION_ERROR",
            status_code=422,
        )

class ConflictError(AppError):
    def __init__(self, resource: str, field: str, value: str):
        super().__init__(
            message=f"{resource} with {field} '{value}' already exists",
            code="CONFLICT",
            status_code=409,
        )

Usage:

def get_user(db: Session, user_id: str) -> User:
    user = db.query(User).filter(User.id == user_id).first()
    if not user:
        raise NotFoundError("User", user_id)
    return user
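
Because every error derives from AppError, calling code can catch the base class and still read the specific code and status. The following is a minimal sketch that repeats trimmed versions of the classes above so it runs on its own; `describe` is a hypothetical helper introduced only for illustration.

```python
class AppError(Exception):
    """Base exception, trimmed from the definition above."""

    def __init__(self, message: str, code: str, status_code: int = 500):
        self.message = message
        self.code = code
        self.status_code = status_code
        super().__init__(message)


class NotFoundError(AppError):
    def __init__(self, resource: str, identifier: str):
        super().__init__(
            message=f"{resource} with id '{identifier}' not found",
            code="NOT_FOUND",
            status_code=404,
        )


def describe(exc: AppError) -> dict:
    """Hypothetical helper: flatten an AppError into a plain dict."""
    return {"code": exc.code, "message": exc.message, "status": exc.status_code}


try:
    raise NotFoundError("User", "42")
except AppError as exc:  # the base class catches every subclass
    summary = describe(exc)

print(summary)
```

Catching the base class keeps call sites short, while the subclass constructors keep messages and status codes consistent across the codebase.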

Global Exception Handler in FastAPI

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
import logging
import traceback

logger = logging.getLogger(__name__)

app = FastAPI()

@app.exception_handler(AppError)
async def app_error_handler(request: Request, exc: AppError):
    return JSONResponse(
        status_code=exc.status_code,
        content={
            "error": {
                "code": exc.code,
                "message": exc.message,
            }
        },
    )

@app.exception_handler(Exception)
async def unhandled_error_handler(request: Request, exc: Exception):
    logger.error(
        "Unhandled error: %s",
        exc,
        exc_info=exc,  # attach the traceback from the exception object itself
        extra={
            "path": request.url.path,
            "method": request.method,
        },
    )
    return JSONResponse(
        status_code=500,
        content={
            "error": {
                "code": "INTERNAL_ERROR",
                "message": "An unexpected error occurred",
            }
        },
    )

The first handler catches known application errors and returns structured responses. The second catches everything else — logging the full traceback while returning a generic message to the client. Never expose internal error details to users.
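
The two-tier mapping can be exercised without a web framework. The sketch below is an illustration under stated assumptions, not FastAPI's own mechanism: `to_response` is a hypothetical function that mirrors the two handlers, returning the status code and body each one would produce.

```python
import logging

logger = logging.getLogger("app.errors")


class AppError(Exception):
    """Trimmed copy of the base exception defined earlier."""

    def __init__(self, message: str, code: str, status_code: int = 500):
        self.message = message
        self.code = code
        self.status_code = status_code
        super().__init__(message)


def to_response(exc: Exception) -> tuple[int, dict]:
    """Hypothetical helper mirroring the two handlers above."""
    if isinstance(exc, AppError):
        # Known error: safe to show its code and message to the client.
        return exc.status_code, {"error": {"code": exc.code, "message": exc.message}}
    # Unknown error: log the details, return only a generic message.
    logger.error("Unhandled error", exc_info=exc)
    return 500, {
        "error": {"code": "INTERNAL_ERROR", "message": "An unexpected error occurred"}
    }


known = to_response(AppError("Email already exists", "CONFLICT", 409))
unknown = to_response(RuntimeError("db password is hunter2"))

print(known)
print(unknown)
```

The internal message of the RuntimeError reaches the log but never the response body, which is the property the second handler exists to guarantee.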

Structured Logging

Replace print statements with structured logging:

import structlog

logger = structlog.get_logger()

async def process_payment(user_id: str, amount: float):
    logger.info("payment_started", user_id=user_id, amount=amount)

    try:
        result = await stripe.charge(amount)  # illustrative call; not the real Stripe SDK API
        logger.info("payment_succeeded",
            user_id=user_id,
            amount=amount,
            charge_id=result.id,
        )
        return result
    except stripe.CardDeclinedError as e:
        logger.warning("payment_declined",
            user_id=user_id,
            amount=amount,
            reason=str(e),
        )
        # "from e" keeps the original exception attached as the cause
        raise AppError("Payment declined", "PAYMENT_DECLINED", 402) from e
    except Exception as e:
        logger.error("payment_failed",
            user_id=user_id,
            amount=amount,
            error=str(e),
            exc_info=True,  # structlog renders the active traceback; no traceback import needed
        )
        raise

Structured logs use key-value pairs instead of formatted strings. They are searchable, filterable, and machine-parseable.
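
To make "machine-parseable" concrete, here is a small sketch that uses only the standard library rather than structlog. `JsonFormatter` and the `fields` key passed through `extra` are hypothetical conventions chosen for this example; structlog provides its own renderers for the same purpose.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"event": record.getMessage(), "level": record.levelname.lower()}
        # Values passed as extra={"fields": {...}} become record.fields.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


stream = io.StringIO()  # stands in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

log = logging.getLogger("payments_demo")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False  # keep the demo output out of the root logger

log.info("payment_started", extra={"fields": {"user_id": "u_1", "amount": 19.99}})

line = stream.getvalue().strip()
parsed = json.loads(line)  # a log pipeline can do the same to filter by key
print(line)
```

Because each line parses back into a dictionary, a query such as "all events where user_id is u_1" becomes a key lookup instead of a regular expression over free text.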

Context Propagation

Add request context to every log line:

import uuid
from starlette.middleware.base import BaseHTTPMiddleware

class RequestContextMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        request_id = str(uuid.uuid4())[:8]
        structlog.contextvars.clear_contextvars()
        structlog.contextvars.bind_contextvars(
            request_id=request_id,
            path=request.url.path,
            method=request.method,
        )
        response = await call_next(request)
        response.headers["X-Request-ID"] = request_id
        return response

Now every log line emitted during the request includes the request ID, path, and method without passing them explicitly, provided structlog's processor chain includes structlog.contextvars.merge_contextvars.
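
The mechanism underneath is Python's contextvars module: each asyncio task gets its own copy of the context, so values bound for one request do not leak into another. The sketch below shows that behavior with the standard library only; `request_id_var`, `log_line`, `load_user`, and `handle_request` are hypothetical names for illustration.

```python
import asyncio
import contextvars

# Default "-" marks log lines produced outside any request.
request_id_var = contextvars.ContextVar("request_id", default="-")


def log_line(event: str) -> str:
    """Build a log line that picks up the request ID implicitly."""
    return f"request_id={request_id_var.get()} event={event}"


async def load_user() -> str:
    await asyncio.sleep(0)  # yield so the two requests interleave
    return log_line("user_loaded")


async def handle_request(request_id: str) -> str:
    request_id_var.set(request_id)  # visible only inside this task's context
    return await load_user()


async def main() -> list:
    # gather() wraps each coroutine in its own task with a copied context.
    return await asyncio.gather(handle_request("a1"), handle_request("b2"))


lines = asyncio.run(main())
print(lines)
```

Neither request sees the other's ID even though they interleave, and `load_user` never receives the ID as an argument.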

Retry with Backoff

For transient failures (network timeouts, rate limits):

import asyncio
from functools import wraps

def retry(max_attempts=3, backoff_factor=1.0, exceptions=(Exception,)):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return await func(*args, **kwargs)
                except exceptions as e:
                    if attempt == max_attempts - 1:
                        raise
                    wait_time = backoff_factor * (2 ** attempt)
                    logger.warning("retrying",
                        function=func.__name__,
                        attempt=attempt + 1,
                        wait_seconds=wait_time,
                        error=str(e),
                    )
                    await asyncio.sleep(wait_time)
        return wrapper
    return decorator

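import httpx

# httpx raises its own exception types, which do not subclass the built-in
# ConnectionError or TimeoutError, so they must be named explicitly.
@retry(max_attempts=3, backoff_factor=0.5, exceptions=(httpx.ConnectError, httpx.TimeoutException))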
async def call_external_api(url: str):
    async with httpx.AsyncClient(timeout=10) as client:
        response = await client.get(url)
        response.raise_for_status()
        return response.json()

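With three attempts and a backoff factor of 0.5, the decorator waits 0.5s after the first failure and 1s after the second; a third failure is re-raised without a further wait. Spacing out retries this way avoids overwhelming a struggling service.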
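
The wait schedule follows from the formula backoff_factor * 2 ** attempt, applied between attempts but not after the final one. The sketch below computes it directly; `backoff_delays` is a hypothetical helper, and the `jitter` parameter is an addition not present in the decorator above, included as one common way to keep many clients from retrying in lockstep.

```python
import random


def backoff_delays(max_attempts: int, backoff_factor: float, jitter: float = 0.0) -> list:
    """Return the waits between attempts; there is no wait after the last one.

    jitter is a fraction of each base delay added at random (0.0 disables it).
    """
    delays = []
    for attempt in range(max_attempts - 1):
        base = backoff_factor * (2 ** attempt)
        delays.append(base + random.uniform(0, jitter * base))
    return delays


schedule = backoff_delays(max_attempts=3, backoff_factor=0.5)
longer = backoff_delays(max_attempts=4, backoff_factor=0.5)
jittered = backoff_delays(max_attempts=3, backoff_factor=0.5, jitter=0.1)

print(schedule)
print(longer)
```

Three attempts produce two waits, so a 2s delay only appears once max_attempts is raised to four.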

Takeaways

Error handling is infrastructure, not an afterthought. Define application-specific exceptions. Use a global handler to convert them to consistent API responses. Log structured context with every error. Retry transient failures with backoff. The goal is not to prevent errors — it is to know exactly what went wrong and where when they inevitably happen.