Docker Best Practices for Production Applications

· 9 min read · DevOps

Learn essential Docker practices for building secure, efficient, and production-ready container images.

Docker makes packaging applications straightforward. Creating production-ready containers is another challenge entirely. The difference between a container that "works on my machine" and one that runs reliably in production comes down to understanding image optimization, security hardening, and operational best practices.

This guide covers the patterns I've learned from running containers in production—the multi-stage builds, security considerations, and configuration strategies that make containers production-grade.

Problem

Default Docker practices create images that are:

  • Large — Development dependencies and build artifacts bloat images to gigabytes
  • Insecure — Running as root, outdated base images, exposed secrets
  • Slow — Poor layer caching means rebuilds take minutes instead of seconds
  • Fragile — Missing health checks, poor signal handling, no graceful shutdown

These issues cause cascading problems: slow deployments, security vulnerabilities, debugging nightmares, and unreliable services.

Why This Matters

Production containers need to be:

  1. Small — Faster pulls, smaller attack surface, cheaper storage
  2. Secure — Non-root users, minimal packages, scanned for vulnerabilities
  3. Observable — Health checks, structured logs, metrics
  4. Reliable — Graceful shutdown, proper signal handling, deterministic builds

NOTE: Docker best practices evolve. What was standard in 2020 may be outdated now. Keep learning, but understand the principles behind the practices.

Solution

Multi-Stage Builds

The single most impactful optimization: separate build-time dependencies from the runtime image.

Node.js Example

# Stage 1: Build
FROM node:20-alpine AS builder

WORKDIR /app

# Install dependencies first (better caching)
COPY package*.json ./
RUN npm ci

# Build application
COPY . .
RUN npm run build

# Prune dev dependencies (npm v9+ deprecated --production in favor of --omit=dev)
RUN npm prune --omit=dev

# Stage 2: Production
FROM node:20-alpine AS production

# Security: create non-root user
RUN addgroup --system --gid 1001 nodejs \
    && adduser --system --uid 1001 --ingroup nodejs appuser

WORKDIR /app

# Copy only production artifacts
COPY --from=builder --chown=appuser:nodejs /app/dist ./dist
COPY --from=builder --chown=appuser:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:nodejs /app/package.json ./

USER appuser

EXPOSE 3000

CMD ["node", "dist/main.js"]

Python Example

# Stage 1: Build
FROM python:3.12-slim AS builder

WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip wheel --no-cache-dir --no-deps --wheel-dir /app/wheels -r requirements.txt

# Stage 2: Production
FROM python:3.12-slim AS production

# Security: create non-root user
RUN useradd --create-home --shell /bin/bash appuser

WORKDIR /app

# Install wheels from builder
COPY --from=builder /app/wheels /wheels
RUN pip install --no-cache-dir /wheels/* \
    && rm -rf /wheels

# Copy application
COPY --chown=appuser:appuser . .

USER appuser

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

TIP: Image size comparison — Python with dev tools: ~1.2GB. Production multi-stage: ~180MB. The difference matters for pull times and security scanning.

Implementation: Security Hardening

1. Never Run as Root

# Create user in Dockerfile
RUN addgroup --system appgroup \
    && adduser --system --ingroup appgroup appuser

# Change ownership of app files
COPY --chown=appuser:appgroup . .

# Switch to non-root user
USER appuser

2. Use Specific Base Image Tags

# Wrong - unpredictable, changes over time
FROM python:latest

# Better - specific version
FROM python:3.12

# Best - specific version and variant
FROM python:3.12.1-slim-bookworm

WARNING: Using latest tags in production makes your builds non-reproducible. A rebuild tomorrow might pull a different base image than it did today.

3. Scan for Vulnerabilities

# Using Trivy
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  aquasec/trivy:latest image myapp:latest

# Using Docker Scout (Docker Desktop)
docker scout cves myapp:latest

# Using Grype
grype myapp:latest

Add scanning to CI/CD and fail builds on HIGH/CRITICAL vulnerabilities.
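
Trivy can gate a pipeline directly (`trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:latest`). When you need custom policy, such as an allowlist of accepted CVEs, a small wrapper over the scanner's JSON report does the job. A minimal sketch of the gate logic — the report shape below mimics Trivy's JSON output, but verify field names against your scanner version, and the allowlist entries are hypothetical:

```python
# Hypothetical CI gate: fail when a scan report contains vulnerabilities
# at or above the configured severity, unless the CVE is allowlisted.

FAIL_ON = {"HIGH", "CRITICAL"}
ALLOWLIST = {"CVE-2023-0000"}  # hypothetical accepted-risk exceptions

def failing_vulns(report, fail_on=FAIL_ON, allowlist=ALLOWLIST):
    """Return the vulnerability IDs that should fail the build."""
    failing = []
    for result in report.get("Results", []):
        for vuln in result.get("Vulnerabilities", []) or []:
            vuln_id = vuln.get("VulnerabilityID", "unknown")
            if vuln.get("Severity") in fail_on and vuln_id not in allowlist:
                failing.append(vuln_id)
    return failing

# Example report in the shape Trivy emits with --format json
report = {
    "Results": [
        {"Vulnerabilities": [
            {"VulnerabilityID": "CVE-2024-1111", "Severity": "CRITICAL"},
            {"VulnerabilityID": "CVE-2023-0000", "Severity": "HIGH"},  # allowlisted
            {"VulnerabilityID": "CVE-2024-2222", "Severity": "MEDIUM"},
        ]},
    ],
}

print(failing_vulns(report))  # only the non-allowlisted CRITICAL remains
```

Exit with a non-zero status when the list is non-empty and your CI system fails the build automatically.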

4. Use .dockerignore

# Version control
.git
.gitignore

# Dependencies (will be installed fresh)
node_modules
__pycache__
*.pyc
venv/
.venv/

# Build artifacts
dist/
build/
*.egg-info/

# Development files
.env
.env.*
*.md
Dockerfile*
docker-compose*

# IDE and editor files
.vscode/
.idea/
*.swp

# Testing
.pytest_cache/
.coverage
coverage/
tests/

# Logs
*.log
logs/

Layer Optimization

Understanding Docker's layer caching is key to fast builds.

# Layers that change rarely → first
FROM python:3.12-slim

WORKDIR /app

# System dependencies change occasionally
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Python dependencies change sometimes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes frequently → last
COPY . .

Each instruction creates a layer. If a layer changes, all subsequent layers rebuild. Order from least to most frequently changing.

Health Checks

Essential for orchestration (Kubernetes, Docker Swarm, ECS):

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health/live || exit 1

Better: don't depend on curl — slim and distroless base images often don't ship it. Alpine images include busybox wget, so the probe needs no extra package:

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD wget -q --spider http://localhost:8000/health/live || exit 1

For distroless or scratch images, copy in a small static probe binary instead.
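
These probes assume the app actually serves /health/live. Frameworks give you this in one route; stripped down to the standard library, a liveness endpoint is just a handler that returns 200 while the process can answer requests. A self-contained sketch (a real service would report dependency readiness on a separate endpoint):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health/live":
            # Liveness: the process is up and serving; no dependency checks here
            body = b'{"status": "ok"}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep demo output quiet

# Port 0 asks the OS for any free port, so the demo never collides
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Probe the endpoint the same way the HEALTHCHECK would
status = urllib.request.urlopen(f"http://127.0.0.1:{port}/health/live").status
print(status)  # 200

server.shutdown()
server.server_close()
```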

Example: Graceful Shutdown

Handle SIGTERM properly for zero-downtime deployments:

import asyncio
import signal
from contextlib import asynccontextmanager

from fastapi import FastAPI

shutdown_event = asyncio.Event()

def handle_sigterm(signum, frame):
    """Set the shutdown flag so background work can wind down."""
    shutdown_event.set()

# Note: when uvicorn installs its own signal handlers (the default),
# it intercepts SIGTERM itself and drives the lifespan shutdown below.
# Register a handler like this only when running without uvicorn's
# handlers (e.g., embedded in another process).
signal.signal(signal.SIGTERM, handle_sigterm)

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: open connection pools, warm caches
    yield
    # Shutdown: uvicorn has stopped accepting new connections and waited
    # for in-flight requests; close pools and flush buffers here
    print("Shutting down gracefully...")

app = FastAPI(lifespan=lifespan)

In Dockerfile:

# Use exec form so signals are forwarded to the process
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

# NOT shell form (signals go to shell, not app)
# CMD uvicorn app.main:app --host 0.0.0.0 --port 8000

Docker Compose for Development

Mirror production structure with development conveniences:

services:
  app:
    build:
      context: .
      target: builder  # Use builder stage for dev
    volumes:
      - .:/app          # Hot reload
      - /app/node_modules  # Don't override deps
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: devuser
      POSTGRES_PASSWORD: devpass
      POSTGRES_DB: myapp_dev
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U devuser -d myapp_dev"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5

volumes:
  postgres_data:

Environment Configuration

Document and validate required environment variables:

# Set defaults for optional vars
ENV NODE_ENV=production \
    PORT=3000 \
    LOG_LEVEL=info

# Document required vars (will fail at runtime if missing)
# DATABASE_URL - PostgreSQL connection string
# JWT_SECRET - Secret for signing JWTs
# Required: DATABASE_URL, JWT_SECRET

In application:

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Required - no default, app won't start without them
    database_url: str
    jwt_secret: str
    
    # Optional with defaults
    port: int = 8000
    log_level: str = "info"
    environment: str = "production"

# App fails fast on startup if required vars are missing
settings = Settings()
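
If pulling in pydantic isn't warranted, the same fail-fast behavior takes a few lines of standard library code. A sketch — the variable names mirror the Dockerfile comments above:

```python
import os

# Required variables documented in the Dockerfile
REQUIRED = ("DATABASE_URL", "JWT_SECRET")

def load_config(env=None):
    """Build the config dict, raising immediately if required vars are absent."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required env vars: {', '.join(missing)}")
    return {
        "database_url": env["DATABASE_URL"],
        "jwt_secret": env["JWT_SECRET"],
        "port": int(env.get("PORT", "8000")),      # optional, with default
        "log_level": env.get("LOG_LEVEL", "info"),
    }

cfg = load_config({"DATABASE_URL": "postgres://db/app", "JWT_SECRET": "dev-only"})
print(cfg["port"])  # 8000 - default applied
```

Calling load_config() at startup makes the container crash immediately rather than failing on the first request that touches the database.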

Common Mistakes

1. Installing Dev Dependencies in Production

# Wrong - single requirements.txt that mixes runtime and dev deps
RUN pip install -r requirements.txt  # pulls in pytest, black, etc.

# Correct - split the files; the production image installs only runtime deps
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# requirements-dev.txt is installed only in the dev/test stage

2. Not Pinning Package Versions

# Wrong - installs whatever is latest
RUN apt-get install -y curl

# Correct - pinned version (Debian mirrors eventually drop old package
# versions, so also pin the base image digest for full reproducibility)
RUN apt-get install -y --no-install-recommends curl=7.88.1-10+deb12u5

3. Leaving Build Artifacts

# Wrong - apt cache left behind
RUN apt-get update \
    && apt-get install -y build-essential

# Correct - clean up in same layer
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

4. Running Database Migrations in Container Startup

# Wrong - migrations in CMD
CMD ["sh", "-c", "python manage.py migrate && uvicorn app:app"]

# Correct - migrations as separate step in deployment
# Run: docker run --rm myapp python manage.py migrate
# Then: docker run myapp

Conclusion

Production-ready containers require attention to image size, security, caching, and operational concerns. Multi-stage builds are non-negotiable for any serious project. Running as non-root is a basic security requirement. Health checks enable reliable orchestration.

These practices take extra time upfront but prevent the 3 AM incidents that come from brittle containers. Your deployment pipeline and on-call rotation will thank you.