Advanced

Step 5: Deploy & Scale

Your image generator works locally. Now you need to put it into production with Docker, protect it with rate limiting, control API costs, and serve images efficiently through a CDN.

Dockerize the Application

Create a Dockerfile in the project root:

# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies for Pillow
RUN apt-get update && apt-get install -y \
    libjpeg-dev \
    libpng-dev \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create images directory
RUN mkdir -p static/images

# Expose port
EXPOSE 8000

# Run with uvicorn
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]

Create a docker-compose.yml for the full stack with Nginx as a reverse proxy:

# docker-compose.yml
version: "3.8"

services:
  app:
    build: .
    container_name: ai-image-gen
    env_file: .env
    volumes:
      - image_data:/app/static/images
    restart: unless-stopped
    healthcheck:
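      # python:3.11-slim does not ship curl, so probe the endpoint with Python
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=5)"]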
      interval: 30s
      timeout: 10s
      retries: 3

  nginx:
    image: nginx:alpine
    container_name: ai-image-gen-nginx
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf
      - image_data:/var/www/images:ro
      - ./certbot/conf:/etc/letsencrypt:ro
    depends_on:
      - app
    restart: unless-stopped

volumes:
  image_data:

Nginx Configuration

Create nginx.conf to serve static images directly (bypassing FastAPI) and proxy API requests:

# nginx.conf
upstream app {
    server app:8000;
}

server {
    listen 80;
    server_name your-domain.com;

    # Serve generated images directly from nginx (much faster)
    location /static/images/ {
        alias /var/www/images/;
        expires 30d;
        add_header Cache-Control "public, immutable";
        add_header X-Content-Type-Options "nosniff";
    }

    # Proxy API requests to FastAPI
    location /api/ {
        proxy_pass http://app;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_read_timeout 120s;  # Image generation can take time
    }

    # Proxy all other requests to FastAPI
    location / {
        proxy_pass http://app;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    # Limit upload size for img2img and inpainting
    client_max_body_size 10M;
}

Build and Run

# Build and start everything
docker-compose up -d --build

# Check logs
docker-compose logs -f app

# Check health
curl http://localhost/health
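
The containers can take a few seconds to accept traffic after startup, so a single curl may fail even when the deployment is fine. Below is a minimal polling sketch, assuming the /health endpoint answers with HTTP 200 once the app is ready. The function name wait_until_healthy and its parameters are illustrative and not part of the project.

```python
import time
import urllib.request


def wait_until_healthy(url, attempts=10, delay=2.0, opener=urllib.request.urlopen):
    """Poll `url` until it answers with HTTP 200 or the attempts run out.

    `opener` is injectable so the function can be exercised without a network.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError and HTTPError are OSError subclasses: the service is
            # not reachable (or not healthy) yet, so retry after a pause
            pass
        if attempt < attempts:
            time.sleep(delay)
    return False

# Example call after `docker-compose up -d --build`:
#   wait_until_healthy("http://localhost/health")
```

A script like this could also gate a deploy step, exiting non-zero when the function returns False.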

Rate Limiting

Without rate limiting, a single user could drain your API budget in minutes. Add rate limiting with the slowapi library:

# Add to requirements.txt:
# slowapi==0.1.9

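# rate_limiter.py
from slowapi import Limiter
from slowapi.errors import RateLimitExceeded
from fastapi import Request
from fastapi.responses import JSONResponse


def client_ip(request: Request) -> str:
    # Behind Nginx, request.client.host is the proxy's address, so every
    # user would share one limit. Use the X-Real-IP header set in nginx.conf.
    real_ip = request.headers.get("X-Real-IP")
    if real_ip:
        return real_ip
    return request.client.host if request.client else "127.0.0.1"


# Counters live in memory per process; with several uvicorn workers, pass
# storage_uri (for example a Redis URL) so all workers share them.
limiter = Limiter(key_func=client_ip)


async def rate_limit_handler(request: Request, exc: RateLimitExceeded):
    return JSONResponse(
        status_code=429,
        content={
            "detail": "Too many requests. Please wait before generating more images.",
            "limit": str(exc.detail),  # The limit that was hit, e.g. "10 per 1 minute"
        },
    )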
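
To make a limit string such as "10/minute" concrete, here is a small sliding-window limiter in plain Python. It only illustrates the idea of counting requests per key inside a time window; it is not how slowapi is implemented, and the class name SlidingWindowLimiter is hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` per `window_seconds` for each key (e.g. an IP)."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # Injectable so the window can be tested without sleeping
        self._hits = defaultdict(deque)  # key -> timestamps of accepted calls

    def allow(self, key):
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have left the window
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_calls:
            return False  # Over the limit: the caller should answer with 429
        hits.append(now)
        return True
```

Each key gets its own queue, which is why the choice of key function (here, the client IP) decides who shares a limit.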

Apply rate limits to the generation endpoints in main.py:

# main.py (updated)
from fastapi import FastAPI
from slowapi.errors import RateLimitExceeded

from rate_limiter import limiter, rate_limit_handler

app = FastAPI(title="AI Image Generator")
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, rate_limit_handler)

And in your router:

from rate_limiter import limiter
from fastapi import Request

@router.post("/generate", response_model=GenerateResponse)
@limiter.limit("10/minute")  # Max 10 images per minute per IP
async def generate_image(request: Request, body: GenerateRequest):
    # ... existing code ...

@router.post("/batch-generate")
@limiter.limit("2/minute")  # Batches are expensive, limit more
async def batch_generate(request: Request, ...):
    # ... existing code ...

Cost Management

API costs can spiral quickly. Implement a cost tracking system:

# services/cost_tracker.py
import json
from pathlib import Path
from datetime import date

COST_FILE = Path("data/costs.json")
COST_FILE.parent.mkdir(parents=True, exist_ok=True)

# Cost per image by provider (approximate, in USD)
COSTS = {
    "stability": 0.004,        # ~$0.004 per image
    "stability-img2img": 0.005,
    "stability-inpaint": 0.005,
    "stability-upscale": 0.003,
    "replicate": 0.005,        # ~$0.005 per image
    "replicate-upscale": 0.003,
}

# Daily budget limit in USD
DAILY_BUDGET = 5.00


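class CostTracker:
    # Note: totals are cached in this process. With several uvicorn workers,
    # each worker keeps its own copy, so run a single worker or move this to
    # a shared store (Redis, a database) if you need exact totals.
    def __init__(self):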
        self.costs = self._load()

    def _load(self) -> dict:
        if COST_FILE.exists():
            return json.loads(COST_FILE.read_text())
        return {"daily": {}, "total": 0.0}

    def _save(self):
        COST_FILE.write_text(json.dumps(self.costs, indent=2))

    def record(self, provider: str) -> float:
        """Record a generation cost. Returns the cost."""
        cost = COSTS.get(provider, 0.005)
        today = date.today().isoformat()

        if today not in self.costs["daily"]:
            self.costs["daily"][today] = 0.0

        self.costs["daily"][today] += cost
        self.costs["total"] += cost
        self._save()
        return cost

    def get_daily_spend(self) -> float:
        """Get total spend for today."""
        today = date.today().isoformat()
        return self.costs["daily"].get(today, 0.0)

    def check_budget(self) -> bool:
        """Return True if still within daily budget."""
        return self.get_daily_spend() < DAILY_BUDGET

    def get_stats(self) -> dict:
        """Return cost statistics."""
        today = date.today().isoformat()
        return {
            "today": round(self.costs["daily"].get(today, 0.0), 4),
            "total": round(self.costs["total"], 4),
            "daily_budget": DAILY_BUDGET,
            "budget_remaining": round(
                DAILY_BUDGET - self.get_daily_spend(), 4
            ),
        }
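
As a quick sanity check on these numbers, the arithmetic below works out how many images the daily budget buys and how long one client at the 10/minute rate limit would need to exhaust it. The figures are copied from the listings above; Decimal is used here only to avoid floating-point rounding in the example.

```python
from decimal import Decimal

# Figures copied from cost_tracker.py and the rate limit on /generate
DAILY_BUDGET = Decimal("5.00")
COST_PER_IMAGE = Decimal("0.004")  # "stability" price per image
LIMIT_PER_MINUTE = 10              # "10/minute" per IP

images_per_day = int(DAILY_BUDGET / COST_PER_IMAGE)
minutes_to_exhaust = images_per_day / LIMIT_PER_MINUTE

print(images_per_day)      # 1250
print(minutes_to_exhaust)  # 125.0
```

So a single IP generating at the rate limit could use up the whole budget in about two hours, which is why the rate limit and the daily budget are both needed.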

Add a budget check before each generation:

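# In routers/generate.py
from fastapi import HTTPException, Request

from rate_limiter import limiter
from services.cost_tracker import CostTracker

cost_tracker = CostTracker()

@router.post("/generate", response_model=GenerateResponse)
@limiter.limit("10/minute")  # Keep the rate limit from the previous section
async def generate_image(request: Request, body: GenerateRequest):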
    # Check budget before generating
    if not cost_tracker.check_budget():
        raise HTTPException(
            status_code=429,
            detail="Daily generation budget exceeded. Try again tomorrow."
        )

    # ... generate image ...

    # Record the cost after successful generation
    cost_tracker.record(body.provider)
    return result

@router.get("/costs")
async def get_costs():
    """Return current cost statistics."""
    return cost_tracker.get_stats()
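
Because the budget resets at the start of the next day, the 429 response could also tell clients how long to wait. A minimal sketch, assuming the server's local date is the one CostTracker uses (it calls date.today()); the helper name seconds_until_tomorrow is illustrative.

```python
from datetime import datetime, timedelta


def seconds_until_tomorrow(now=None):
    """Seconds from `now` (naive local time) until the next local midnight."""
    now = now or datetime.now()
    next_midnight = datetime.combine(
        now.date() + timedelta(days=1), datetime.min.time()
    )
    # Never return 0, so clients always wait at least one second
    return max(1, int((next_midnight - now).total_seconds()))
```

The value could be passed to the exception as headers={"Retry-After": str(seconds_until_tomorrow())}, since HTTPException accepts a headers argument.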

CDN for Image Serving

For production, serve images through a CDN like Cloudflare or AWS CloudFront. This reduces latency and bandwidth costs on your server.

Option 1: Cloudflare (Free Tier)

  1. Point your domain DNS to Cloudflare
  2. Enable caching for /static/images/*
  3. Set cache TTL to 30 days (images never change once generated)
  4. Optionally enable image optimization (Polish) for automatic WebP conversion; Polish requires a paid plan, so skip this step on the free tier

Option 2: Upload to S3 + CloudFront

# services/storage_service.py
import asyncio
import os

import boto3


class S3Storage:
    def __init__(self):
        self.s3 = boto3.client("s3")
        self.bucket = os.getenv("S3_BUCKET", "my-image-gen")
        self.cdn_url = os.getenv("CDN_URL", "https://cdn.example.com")

    async def upload(self, local_path: str, filename: str) -> str:
        """Upload an image to S3 and return the CDN URL."""
        # upload_file blocks, so run it in a thread to keep the event loop
        # free for other requests
        await asyncio.to_thread(
            self.s3.upload_file,
            local_path,
            self.bucket,
            f"images/{filename}",
            ExtraArgs={
                "ContentType": "image/png",
                "CacheControl": "public, max-age=2592000, immutable",
            },
        )
        return f"{self.cdn_url}/images/{filename}"
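
The upload above hard-codes image/png and a max-age of 2592000 seconds, which is 30 days. If the app ever stores other formats, the upload arguments could be derived from the filename instead. A small sketch using only the standard library; the helper name build_extra_args is hypothetical.

```python
import mimetypes

THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60  # 2592000, the max-age used above


def build_extra_args(filename, max_age=THIRTY_DAYS_SECONDS):
    """Build upload_file ExtraArgs with a Content-Type guessed from the name."""
    content_type, _ = mimetypes.guess_type(filename)
    return {
        # Fall back to a generic binary type when the extension is unknown
        "ContentType": content_type or "application/octet-stream",
        "CacheControl": f"public, max-age={max_age}, immutable",
    }
```

The returned dictionary has the same shape as the ExtraArgs value in the listing, so it could be passed in its place.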

Production Checklist

Before going live, verify every item on this list:

| Category | Item | Status |
|---|---|---|
| Security | API keys in environment variables, not in code | Required |
| Security | CORS restricted to your domain only | Required |
| Security | Rate limiting on all generation endpoints | Required |
| Security | Input validation on all user inputs | Required |
| Security | HTTPS enabled via Let's Encrypt | Required |
| Cost | Daily budget limits configured | Required |
| Cost | Cost tracking and alerting set up | Recommended |
| Performance | Nginx serving static files directly | Required |
| Performance | CDN configured for image delivery | Recommended |
| Reliability | Health check endpoint responding | Required |
| Reliability | Docker restart policy set to unless-stopped | Required |
| Reliability | Log aggregation configured | Recommended |
| Storage | Image cleanup cron job for old files | Recommended |
| Storage | Disk space monitoring and alerts | Recommended |
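
The image cleanup item can be covered by a short script run from cron. The sketch below deletes files older than a cutoff from the images directory; the 30-day retention and the function name remove_old_images are assumptions, so pick a retention period that fits how long generated URLs need to stay valid.

```python
import time
from pathlib import Path


def remove_old_images(directory, max_age_days=30, now=None):
    """Delete regular files in `directory` older than `max_age_days`.

    Returns the sorted names of the deleted files.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for path in Path(directory).iterdir():
        # Compare the last-modified time against the cutoff
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)

# Example: remove_old_images("static/images", max_age_days=30)
```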
📌 Checkpoint: Your application is now containerized, rate-limited, cost-controlled, and ready for production. You can deploy with docker-compose up -d on any server with Docker installed. In the final lesson, you will add content moderation, user accounts, and explore monetization options.