Docker Best Practices for Production Applications
Docker makes packaging applications straightforward. Creating production-ready containers is another challenge entirely. The difference between a container that "works on my machine" and one that runs reliably in production comes down to understanding image optimization, security hardening, and operational best practices.
This guide covers the patterns I've learned from running containers in production—the multi-stage builds, security considerations, and configuration strategies that make containers production-grade.
Problem
Default Docker practices create images that are:
- Large — Development dependencies and build artifacts bloat images to gigabytes
- Insecure — Running as root, outdated base images, exposed secrets
- Slow — Poor layer caching means rebuilds take minutes instead of seconds
- Fragile — Missing health checks, poor signal handling, no graceful shutdown
These issues cause cascading problems: slow deployments, security vulnerabilities, debugging nightmares, and unreliable services.
Why This Matters
Production containers need to be:
- Small — Faster pulls, smaller attack surface, cheaper storage
- Secure — Non-root users, minimal packages, scanned for vulnerabilities
- Observable — Health checks, structured logs, metrics
- Reliable — Graceful shutdown, proper signal handling, deterministic builds
NOTE: Docker best practices evolve. What was standard in 2020 may be outdated now. Keep learning, but understand the principles behind the practices.
Solution
Multi-Stage Builds
The most impactful optimization. Separate build dependencies from runtime.
Node.js Example
# Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
# Install dependencies first (better caching)
COPY package*.json ./
RUN npm ci
# Build application
COPY . .
RUN npm run build
# Prune dev dependencies (npm 9+ prefers --omit=dev over the deprecated --production)
RUN npm prune --omit=dev
# Stage 2: Production
FROM node:20-alpine AS production
# Security: create non-root user
# Alpine's busybox adduser/addgroup take short flags; -G puts the user in the group
RUN addgroup -S -g 1001 nodejs \
    && adduser -S -u 1001 -G nodejs appuser
WORKDIR /app
# Copy only production artifacts
COPY --from=builder --chown=appuser:nodejs /app/dist ./dist
COPY --from=builder --chown=appuser:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:nodejs /app/package.json ./
USER appuser
EXPOSE 3000
CMD ["node", "dist/main.js"]
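One convenience of naming stages as above: you can build just the builder stage when debugging dependency or build issues (a sketch; the myapp tags are placeholders):

```shell
# Build only the first stage, dev dependencies intact
docker build --target builder -t myapp:debug .

# Build the full production image (final stage)
docker build -t myapp:latest .
```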
Python Example
# Stage 1: Build
FROM python:3.12-slim AS builder
WORKDIR /app
# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY requirements.txt .
RUN pip wheel --no-cache-dir --no-deps --wheel-dir /app/wheels -r requirements.txt
# Stage 2: Production
FROM python:3.12-slim AS production
# Security: create non-root user
RUN useradd --create-home --shell /bin/bash appuser
WORKDIR /app
# Install wheels from builder
COPY --from=builder /app/wheels /wheels
RUN pip install --no-cache-dir /wheels/* \
&& rm -rf /wheels
# Copy application
COPY --chown=appuser:appuser . .
USER appuser
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
TIP: Image size comparison — Python with dev tools: ~1.2GB. Production multi-stage: ~180MB. The difference matters for pull times and security scanning.
Implementation: Security Hardening
1. Never Run as Root
# Create user in Dockerfile
RUN addgroup --system appgroup \
&& adduser --system --ingroup appgroup appuser
# Change ownership of app files
COPY --chown=appuser:appgroup . .
# Switch to non-root user
USER appuser
2. Use Specific Base Image Tags
# Wrong - unpredictable, changes over time
FROM python:latest
# Better - specific version
FROM python:3.12
# Best - specific version and variant
FROM python:3.12.1-slim-bookworm
WARNING: Using "latest" tags in production means your builds are non-reproducible. A rebuild tomorrow might pull a different base image than today.
3. Scan for Vulnerabilities
# Using Trivy
docker run --rm \
-v /var/run/docker.sock:/var/run/docker.sock \
aquasec/trivy:latest image myapp:latest
# Using Docker Scout (Docker Desktop)
docker scout cves myapp:latest
# Using Grype
grype myapp:latest
Add scanning to CI/CD and fail builds on HIGH/CRITICAL vulnerabilities.
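As a sketch of that CI gate, Trivy's image subcommand can fail the pipeline directly via its exit code (flags per the Trivy CLI; myapp:latest is a placeholder):

```shell
# Exit non-zero when HIGH or CRITICAL vulnerabilities are found,
# which fails the CI job running this step
trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:latest
```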
4. Use .dockerignore
# Version control
.git
.gitignore
# Dependencies (will be installed fresh)
node_modules
__pycache__
*.pyc
venv/
.venv/
# Build artifacts
dist/
build/
*.egg-info/
# Development files
.env
.env.*
*.md
Dockerfile*
docker-compose*
# IDE and editor files
.vscode/
.idea/
*.swp
# Testing
.pytest_cache/
.coverage
coverage/
tests/
# Logs
*.log
logs/
Layer Optimization
Understanding Docker's layer caching is key to fast builds.
# Layers that change rarely → first
FROM python:3.12-slim
WORKDIR /app
# System dependencies change occasionally
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
&& rm -rf /var/lib/apt/lists/*
# Python dependencies change sometimes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Application code changes frequently → last
COPY . .
Each instruction creates a layer. If a layer changes, all subsequent layers rebuild. Order from least to most frequently changing.
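Beyond ordering, BuildKit cache mounts keep package-manager caches warm across builds without baking them into any layer (a sketch; requires BuildKit, which is the default builder in recent Docker):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
# The pip download cache persists between builds but never enters the image,
# so --no-cache-dir is unnecessary here
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
COPY . .
```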
Health Checks
Essential for orchestration (Kubernetes, Docker Swarm, ECS):
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health/live || exit 1
Better: don't assume curl is present; slim and distroless base images often ship without it. Use wget (built into busybox-based alpine images) or a small static health check binary instead:
# wget is available in alpine images via busybox
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:8000/health/live || exit 1
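For Python images, one option that needs neither curl nor wget is to reuse the interpreter already in the image (a sketch; the /health/live path matches the examples above):

```dockerfile
# urlopen raises on connection failure or non-2xx, making the check exit non-zero
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD ["python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health/live', timeout=5)"]
```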
Example: Graceful Shutdown
Handle SIGTERM properly for zero-downtime deployments:
import signal
import asyncio
from contextlib import asynccontextmanager

from fastapi import FastAPI

shutdown_event = asyncio.Event()

def handle_sigterm(signum, frame):
    """Handle SIGTERM for graceful shutdown."""
    shutdown_event.set()

signal.signal(signal.SIGTERM, handle_sigterm)

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup
    yield
    # Shutdown - wait for in-flight requests
    print("Shutting down gracefully...")

app = FastAPI(lifespan=lifespan)
In Dockerfile:
# Use exec form so signals are forwarded to the process
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
# NOT shell form (signals go to shell, not app)
# CMD uvicorn app.main:app --host 0.0.0.0 --port 8000
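If a shell wrapper is unavoidable (for example, to expand an environment variable), `exec` the final command so it replaces the shell as PID 1 and still receives SIGTERM (the PORT default here is illustrative):

```dockerfile
# exec replaces the shell process, so uvicorn becomes PID 1 and gets signals
CMD ["sh", "-c", "exec uvicorn app.main:app --host 0.0.0.0 --port ${PORT:-8000}"]
```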
Docker Compose for Development
Mirror production structure with development conveniences:
services:
  app:
    build:
      context: .
      target: builder # Use builder stage for dev
    volumes:
      - .:/app # Hot reload
      - /app/node_modules # Don't override deps
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: devuser
      POSTGRES_PASSWORD: devpass
      POSTGRES_DB: myapp_dev
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U devuser -d myapp_dev"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5

volumes:
  postgres_data:
Environment Configuration
Document and validate required environment variables:
# Set defaults for optional vars
ENV NODE_ENV=production \
PORT=3000 \
LOG_LEVEL=info
# Document required vars (will fail at runtime if missing)
# DATABASE_URL - PostgreSQL connection string
# JWT_SECRET - Secret for signing JWTs
# Required: DATABASE_URL, JWT_SECRET
In application:
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Required - no default, app won't start without them
    database_url: str
    jwt_secret: str

    # Optional with defaults
    port: int = 8000
    log_level: str = "info"
    environment: str = "production"

# App fails fast on startup if required vars are missing
settings = Settings()
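If you'd rather not depend on pydantic-settings, the same fail-fast check can be sketched with the standard library (the variable names match the examples above):

```python
import os
import sys

# Required variables, matching the examples above
REQUIRED_VARS = ("DATABASE_URL", "JWT_SECRET")

def missing_required(environ=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not environ.get(name)]

if __name__ == "__main__":
    missing = missing_required()
    if missing:
        # Exit non-zero at startup instead of crashing mid-request later
        sys.exit("Missing required environment variables: " + ", ".join(missing))
```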
Common Mistakes
1. Installing Dev Dependencies in Production
# Wrong
RUN pip install -r requirements.txt # Includes pytest, black, etc.
# Correct - separate requirements
COPY requirements.txt requirements-dev.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# Only install dev deps in dev stage
2. Not Pinning Package Versions
# Wrong - installs whatever is latest
RUN apt-get install -y curl
# Correct - pinned versions (note: exact pins can break once mirrors drop
# old package revisions; pinning the base image digest is a sturdier complement)
RUN apt-get install -y curl=7.88.1-10+deb12u5
3. Leaving Build Artifacts
# Wrong - apt cache left behind
RUN apt-get update \
&& apt-get install -y build-essential
# Correct - clean up in same layer
RUN apt-get update \
&& apt-get install -y --no-install-recommends build-essential \
&& rm -rf /var/lib/apt/lists/*
4. Running Database Migrations in Container Startup
# Wrong - migrations in CMD
CMD ["sh", "-c", "python manage.py migrate && uvicorn app:app"]
# Correct - migrations as separate step in deployment
# Run: docker run --rm myapp python manage.py migrate
# Then: docker run myapp
Conclusion
Production-ready containers require attention to image size, security, caching, and operational concerns. Multi-stage builds are non-negotiable for any serious project. Running as non-root is a basic security requirement. Health checks enable reliable orchestration.
These practices take extra time upfront but prevent the 3 AM incidents that come from brittle containers. Your deployment pipeline and on-call rotation will thank you.