Containerising Your Agent with Docker
Docker packages your agent, its dependencies, its runtime, and its configuration into a single portable image. That image runs identically on your laptop, in CI, and in production — eliminating the "works on my machine" class of bugs that are especially painful for AI agents, where environment drift can produce subtle, hard-to-reproduce failures. This lesson builds a production-grade Docker setup from scratch.
Why Containerisation Matters for AI Agents
AI agents have specific containerisation needs that a basic Python web service does not:
- Dependency pinning is critical — a minor version bump in `langchain` or `openai` can silently change agent behaviour
- API keys must be secrets, not environment variables baked into the image
- Startup time matters — loading model weights or connecting to vector databases on cold start can take 5–30 seconds; health checks must account for this
- Multi-stage builds keep images lean — development tools and build artifacts should not ship to production
Project Structure for a Containerised Agent
my_agent/
├── Dockerfile
├── docker-compose.yml
├── docker-compose.override.yml # Local dev overrides (gitignored)
├── .dockerignore
├── pyproject.toml # Or requirements.txt
├── agent/
│ ├── __init__.py
│ ├── main.py # FastAPI or CLI entrypoint
│ ├── orchestrator.py
│ └── tools/
└── tests/
The Dockerfile
A production Dockerfile for a Python AI agent should use a multi-stage build: a builder stage installs dependencies, and a lean runtime stage copies only the artefacts needed to run.
# syntax=docker/dockerfile:1.6
# FILE: Dockerfile
# PURPOSE: Multi-stage build for a Python AI agent service.
# Builder stage installs all deps; runtime stage ships only
# what is needed to execute the agent.
# ─────────────────────────────────────────────────
# Stage 1: builder — install Python dependencies
# ─────────────────────────────────────────────────
FROM python:3.12-slim AS builder
# Set a consistent working directory for the build stage
WORKDIR /build
# Install system dependencies required for common Python packages
# (e.g. psycopg2 needs libpq-dev, cryptography needs libssl-dev)
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
libpq-dev \
libssl-dev \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy only the dependency manifest first — this layer is cached as long
# as pyproject.toml doesn't change, even if application code does.
COPY pyproject.toml .
# Install dependencies into a known prefix we can copy to the runtime stage.
# --no-cache-dir keeps the image smaller.
# Caveat: `pip install .` also builds the project itself, and some build
# backends (e.g. hatchling) refuse to build without the source tree present.
# Installing from a pinned requirements/lock file is the more robust way to
# cache the dependency layer on its own.
RUN pip install --no-cache-dir --prefix=/install .
# ─────────────────────────────────────────────────
# Stage 2: runtime — minimal image for production
# ─────────────────────────────────────────────────
FROM python:3.12-slim AS runtime
# Security: run as a non-root user
RUN groupadd --gid 1001 agentuser \
&& useradd --uid 1001 --gid agentuser --shell /bin/bash --create-home agentuser
WORKDIR /app
# Copy installed packages from the builder stage only
COPY --from=builder /install /usr/local
# Copy application source code
COPY --chown=agentuser:agentuser agent/ ./agent/
# Switch to non-root user before running the process
USER agentuser
# Expose the port the agent's HTTP server listens on
EXPOSE 8000
# Health check — polls the /health endpoint every 30 seconds.
# The agent gets 60 seconds to start (--start-period) before checks begin.
# Note: curl is installed only in the builder stage, not in this slim
# runtime image, so use Python's standard library for the probe.
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"
# The default command — override in docker-compose for worker or CLI mode
CMD ["python", "-m", "uvicorn", "agent.main:app", "--host", "0.0.0.0", "--port", "8000"]
The .dockerignore File
A good .dockerignore file prevents bloating your image with files that do not belong in production:
# .dockerignore
.git
.github
.venv
__pycache__
*.pyc
*.pyo
.pytest_cache
.mypy_cache
.ruff_cache
tests/
docs/
*.md
.env
.env.*
docker-compose.override.yml
*.log
dist/
build/
Important: Always include `.env` in `.dockerignore`. Environment files containing API keys must never be copied into a Docker image.
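Because a missing `.dockerignore` entry fails silently, it can be worth checking in CI. The following is a minimal sketch (the script name and the set of required patterns are assumptions, not part of the project above) that fails the build when a required ignore pattern is absent:

```python
# check_dockerignore.py — hypothetical CI guard for .dockerignore coverage
from pathlib import Path

# Patterns that must never reach the build context (illustrative set)
REQUIRED = {".env", ".env.*", "__pycache__", ".git"}


def missing_patterns(dockerignore_text: str) -> set[str]:
    """Return required ignore patterns absent from the given .dockerignore text."""
    present = {
        line.strip()
        for line in dockerignore_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }
    return REQUIRED - present


if __name__ == "__main__":
    gaps = missing_patterns(Path(".dockerignore").read_text())
    if gaps:
        raise SystemExit(f"Missing .dockerignore patterns: {sorted(gaps)}")
```

Run it as a CI step before `docker build`; a non-zero exit blocks the pipeline.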
Managing Secrets
API keys for OpenAI, Anthropic, Pinecone, etc. are secrets. There are three common patterns:
Pattern 1: Runtime Environment Variables (Development)
# docker-compose.yml — development only
services:
  agent:
    build: .
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}        # Read from the host shell or a .env file
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
# On the host
export OPENAI_API_KEY=sk-...
docker compose up
Pattern 2: Docker Secrets (Production Compose)
# docker-compose.yml — production
services:
  agent:
    build: .
    secrets:
      - openai_api_key
      - anthropic_api_key
    environment:
      - OPENAI_API_KEY_FILE=/run/secrets/openai_api_key

secrets:
  openai_api_key:
    external: true  # Created with: docker secret create openai_api_key -
  anthropic_api_key:
    external: true
# agent/config.py — reading Docker secrets at runtime
import os
from pathlib import Path


def read_secret(name: str, env_var: str | None = None) -> str:
    """
    Read a secret from a Docker secrets file or fall back to an environment variable.

    Docker mounts secrets at /run/secrets/<name> at container startup.
    Kubernetes can mount Secret volumes at the same path, so this code also
    works there when the pod spec mounts secrets under /run/secrets/.
    """
    secret_path = Path(f"/run/secrets/{name}")
    if secret_path.exists():
        return secret_path.read_text().strip()
    # Fall back to an environment variable for local development
    if env_var and (value := os.getenv(env_var)):
        return value
    raise RuntimeError(
        f"Secret '{name}' not found. "
        f"Expected at {secret_path} or in env var '{env_var}'."
    )


# Usage
OPENAI_API_KEY = read_secret("openai_api_key", env_var="OPENAI_API_KEY")
Pattern 3: External Secret Manager (Recommended for Production)
For Kubernetes or cloud deployments, fetch secrets at runtime from AWS Secrets Manager, HashiCorp Vault, or Google Secret Manager. Never store secrets in the image or in source control.
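As one hedged illustration of the external-manager pattern, a fetch from AWS Secrets Manager might look like the sketch below. It assumes the boto3 library; the client is injected as a parameter so the logic can be tested without AWS credentials, and `secret_id` values are whatever names you created in the manager:

```python
# secrets_manager.py — sketch of fetching a secret at startup (assumes boto3)
def fetch_secret(client, secret_id: str) -> str:
    """Fetch a secret string via an AWS Secrets Manager client.

    The client is injected so tests can pass a stub; in production you
    would create it with boto3.client("secretsmanager").
    """
    response = client.get_secret_value(SecretId=secret_id)
    return response["SecretString"]
```

The container then needs only an IAM role (or equivalent cloud identity), never the secret value itself, at build or deploy time.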
Docker Compose for Local Development
The full docker-compose.yml brings up the agent alongside its dependencies — a Redis instance for session storage and a PostgreSQL database for long-term memory:
# docker-compose.yml
version: "3.9"

services:
  agent:
    build:
      context: .
      target: runtime
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - REDIS_URL=redis://redis:6379/0
      - DATABASE_URL=postgresql://agent:password@postgres:5432/agent_db
      - LOG_LEVEL=INFO
    depends_on:
      redis:
        condition: service_healthy
      postgres:
        condition: service_healthy
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: agent
      POSTGRES_PASSWORD: password
      POSTGRES_DB: agent_db
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U agent -d agent_db"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  postgres_data:
The development override file lets you mount source code for hot-reload without changing the base compose file:
# docker-compose.override.yml (gitignored — for local dev only)
version: "3.9"
services:
  agent:
    # Note: the builder stage is not directly runnable — it uses WORKDIR
    # /build and installs packages under /install, off sys.path. Mounting
    # the source over the runtime image is the simpler hot-reload setup.
    volumes:
      - ./agent:/app/agent  # Mount source for hot-reload
    environment:
      - LOG_LEVEL=DEBUG
    command: ["python", "-m", "uvicorn", "agent.main:app",
              "--host", "0.0.0.0", "--port", "8000", "--reload"]
Health Checks for Agent Services
Agent startup is not instantaneous. Loading the LLM client, connecting to Redis, and warming up vector index lookups can take 10–60 seconds. Your health check must reflect this:
# agent/main.py — health check endpoint
import logging
import time

from fastapi import FastAPI, Response

logger = logging.getLogger(__name__)

app = FastAPI()

# Track when the service finished initialising
_start_time: float = 0.0
_ready: bool = False
redis_client = None  # Assigned a real client in _init_dependencies()


@app.on_event("startup")
async def startup():
    """Run initialisation tasks and mark the service as ready."""
    global _start_time, _ready
    _start_time = time.monotonic()
    logger.info("[startup] Initialising agent service...")
    # Initialise dependencies (LLM client, Redis, vector index)
    await _init_dependencies()
    _ready = True
    elapsed = time.monotonic() - _start_time
    logger.info("[startup] Agent service ready in %.2fs", elapsed)


@app.get("/health")
async def health(response: Response):
    """
    Liveness and readiness health check.

    Returns 200 when the service is ready to handle requests.
    Returns 503 during startup or if a critical dependency is unavailable.
    Docker, Kubernetes, and load balancers poll this endpoint.
    """
    if not _ready:
        response.status_code = 503
        return {"status": "starting", "uptime_seconds": time.monotonic() - _start_time}
    # Quick dependency checks
    checks = {}
    try:
        await redis_client.ping()
        checks["redis"] = "ok"
    except Exception as exc:
        checks["redis"] = f"error: {exc}"
    all_healthy = all(v == "ok" for v in checks.values())
    if not all_healthy:
        response.status_code = 503
    return {
        "status": "healthy" if all_healthy else "degraded",
        "checks": checks,
        "uptime_seconds": round(time.monotonic() - _start_time, 1),
    }


async def _init_dependencies():
    """Initialise all agent dependencies during startup."""
    # Placeholder — connect to Redis, load vector index, warm LLM client
    import asyncio

    await asyncio.sleep(0)  # Replace with real init calls
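In a real `_init_dependencies`, connections to Redis or the database often fail transiently while those containers are still starting, so startup code usually retries with backoff. A minimal sketch (the helper name and parameter defaults are illustrative):

```python
import asyncio


async def connect_with_retry(connect, attempts: int = 5, base_delay: float = 0.5):
    """Call an async connect() callable until it succeeds, backing off exponentially.

    Raises the last exception if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return await connect()
        except Exception:
            if attempt == attempts - 1:
                raise  # Give up after the final attempt
            await asyncio.sleep(base_delay * 2 ** attempt)
```

Wrapping each dependency's connect call this way keeps the `--start-period` window productive instead of failing on the first refused connection.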
Building and Running
# Build the production image
docker build --target runtime -t my-agent:latest .
# Run with environment variables from a .env file
docker compose up --build
# Check service health
curl http://localhost:8000/health
# View agent logs
docker compose logs -f agent
# Run tests inside the container — tests/ is excluded by .dockerignore,
# so mount it into the container rather than baking it into the image
docker compose run --rm -v "$(pwd)/tests:/app/tests" agent pytest tests/ -m "not slow"
# Push to a registry
docker tag my-agent:latest registry.example.com/my-agent:v1.0.0
docker push registry.example.com/my-agent:v1.0.0
Best Practices Summary
| Practice | Why |
|---|---|
| Multi-stage build | Keeps production image small; dev tools stay in builder stage |
| Non-root user | Reduces attack surface; required by some Kubernetes admission controllers |
| `.dockerignore` includes `.env` | Prevents secrets from being copied into the image layer |
| `--start-period` on `HEALTHCHECK` | Gives the agent time to initialise before Docker marks it unhealthy |
| Pin base image with digest | `python:3.12-slim@sha256:abc...` prevents surprise upstream changes |
| `COPY pyproject.toml` before source code | Maximises layer caching — dependency layer only rebuilds when deps change |
Key Takeaways
- Use multi-stage builds to keep your production image lean — the builder stage installs deps, the runtime stage ships the minimal artefact.
- Never bake secrets into your image. Use runtime environment variables for development, Docker secrets or an external secret manager for production.
- Implement a `/health` endpoint that reflects true service readiness, not just process liveness.
- Use `docker-compose.override.yml` for local dev customisations — keep the base `docker-compose.yml` production-safe and commit it to source control.
- Always add a `.dockerignore` that excludes `.env` files, `__pycache__`, test files, and other non-runtime artefacts.
Further Reading
- Docker multi-stage builds documentation
- Docker secrets documentation
- Kubernetes health probes — extends these concepts to Kubernetes