Beginner

Project Setup

Before writing any code, you need to understand the architecture of an AI image generation app, decide between cloud APIs and local models, and set up your development environment. This lesson covers all three.

What You Will Build

By the end of this project, you will have a fully working web application that:

  • Accepts text prompts from users and generates images using Stable Diffusion
  • Enhances prompts automatically using an LLM for better results
  • Displays generated images in a responsive gallery with download support
  • Supports advanced features like image-to-image, inpainting, and upscaling
  • Deploys to production with Docker, rate limiting, and cost controls

Architecture Overview

The application follows a clean three-tier architecture:

+------------------+     +------------------+     +-------------------+
|   Frontend       |     |   Backend        |     |   AI Services     |
|   (HTML/CSS/JS)  |---->|   (FastAPI)      |---->|   (Stability AI / |
|                  |     |                  |     |    Replicate)     |
|  - Prompt input  |     |  - API routes    |     |  - Text-to-image  |
|  - Gallery view  |     |  - Prompt enhance|     |  - Img-to-img     |
|  - History panel |     |  - Image storage |     |  - Inpainting     |
|  - Downloads     |     |  - Rate limiting |     |  - Upscaling      |
+------------------+     +------------------+     +-------------------+
                                |
                          +-----v------+
                          |  Storage   |
                          |  (Local /  |
                          |   S3/CDN)  |
                          +------------+

The frontend sends prompts to the FastAPI backend. The backend enhances the prompt if requested, forwards it to the image generation API, saves the result, and returns the image URL to the frontend.
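That flow can be sketched with stubbed services. Everything below is an illustrative placeholder (the function names, fake image bytes, and hard-coded URL are not the real implementation, which comes in later lessons):

```python
# Sketch of the backend request flow with stubbed services (no real API calls).

def enhance_prompt(prompt: str) -> str:
    # Placeholder for the LLM enhancement step (built in a later lesson).
    return prompt + ", highly detailed, dramatic lighting"

def generate_image(prompt: str) -> bytes:
    # Placeholder for the Stability AI / Replicate call.
    return b"fake-image-bytes"

def save_image(data: bytes) -> str:
    # Placeholder for storage; returns the URL the frontend will display.
    return "/static/images/abc123.png"

def handle_generate(prompt: str, enhance: bool = True) -> dict:
    # The full pipeline: optionally enhance, then generate, store, and respond.
    if enhance:
        prompt = enhance_prompt(prompt)
    image = generate_image(prompt)
    url = save_image(image)
    return {"prompt": prompt, "url": url}
```

The real version wraps this same sequence in an async FastAPI route; the shape of the pipeline stays the same.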

Stable Diffusion API vs Local Models

You have two main options for running image generation. Here is how they compare:

Factor              Cloud API (Stability AI / Replicate)   Local (ComfyUI / Diffusers)
Setup time          5 minutes                              1-2 hours
GPU required        No                                     Yes (8GB+ VRAM)
Cost per image      $0.002-$0.01                           Free (after hardware)
Generation speed    2-5 seconds                            5-30 seconds (depends on GPU)
Model variety       Latest models available                Any model you download
Scalability         Scales automatically                   Limited to your hardware
Best for            Production apps, teams                 Experimentation, privacy
💡
This project uses cloud APIs (Stability AI and Replicate) so you can follow along without a GPU. The code is structured so you can swap in a local model later if you prefer.
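One way to keep that swap easy is to code against a small provider interface rather than a specific SDK. The sketch below is a hypothetical shape for that abstraction (these class names are not part of the project code):

```python
# Illustrative provider interface: routes depend on ImageProvider, so a local
# Diffusers backend can replace a cloud API without touching route code.
from typing import Protocol

class ImageProvider(Protocol):
    def generate(self, prompt: str) -> bytes: ...

class CloudProvider:
    """Would wrap the Stability AI or Replicate client."""
    def generate(self, prompt: str) -> bytes:
        return b"cloud-image"  # stub

class LocalProvider:
    """Would wrap a local diffusers pipeline running on your own GPU."""
    def generate(self, prompt: str) -> bytes:
        return b"local-image"  # stub

def make_provider(kind: str) -> ImageProvider:
    # A config value (e.g. from .env) decides which backend to use.
    return CloudProvider() if kind == "cloud" else LocalProvider()
```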

Tech Stack

Here is every tool you will use in this project:

  • Python 3.10+ — Primary backend language
  • FastAPI — High-performance async web framework for the API
  • Stability AI SDK / Replicate Python client — Image generation APIs
  • OpenAI Python client — For LLM-powered prompt enhancement
  • Pillow (PIL) — Image processing, resizing, format conversion
  • HTML / CSS / JavaScript — Frontend (no framework needed)
  • Docker — Containerization for deployment
  • SQLite — Lightweight database for prompt history and metadata

Setting Up Your Environment

Step 1: Create the Project Directory

mkdir ai-image-generator
cd ai-image-generator
mkdir -p static/images templates

Step 2: Set Up a Virtual Environment

python -m venv venv

# On macOS/Linux:
source venv/bin/activate

# On Windows:
venv\Scripts\activate

Step 3: Install Dependencies

Create a requirements.txt file:

# requirements.txt
fastapi==0.110.0
uvicorn[standard]==0.27.1
python-multipart==0.0.9
stability-sdk==0.8.5
replicate==0.25.1
openai==1.14.0
Pillow==10.2.0
python-dotenv==1.0.1
aiofiles==23.2.1
jinja2==3.1.3

Install everything:

pip install -r requirements.txt

Step 4: Configure API Keys

Create a .env file in the project root. You will need at least one image generation API key:

# .env
STABILITY_API_KEY=your-stability-ai-key-here
REPLICATE_API_TOKEN=your-replicate-token-here
OPENAI_API_KEY=your-openai-key-here
⚠️
Never commit your .env file. Add it to your .gitignore immediately. API keys in version control are a security risk and can result in unexpected charges.
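A minimal .gitignore covering this layout might look like the following (the exact patterns are a suggestion; ignoring generated images keeps the repository small):

```
# .gitignore
.env
venv/
__pycache__/
*.pyc
static/images/*
```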

Get your API keys from:

  • Stability AI: https://platform.stability.ai/account/keys — Free tier includes 25 credits
  • Replicate: https://replicate.com/account/api-tokens — Pay per prediction, starts free
  • OpenAI: https://platform.openai.com/api-keys — For prompt enhancement (GPT-3.5 is cheap)
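Behind the scenes, load_dotenv simply reads KEY=VALUE lines from .env into the process environment so os.getenv can see them. A stdlib-only sketch of the idea (load_env_file is an illustrative helper, not part of python-dotenv or the project):

```python
# Simplified illustration of what load_dotenv does: parse KEY=VALUE lines
# and put them into os.environ without overwriting existing variables.
import os

def load_env_file(text: str) -> None:
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

load_env_file("STABILITY_API_KEY=sk-test\n# a comment\n")
```

The real library also handles quoting, variable expansion, and file discovery, which is why the project uses python-dotenv rather than a hand-rolled parser.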

Step 5: Create the Project Skeleton

Create a minimal main.py to verify everything works:

# main.py
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates
from dotenv import load_dotenv
import os

load_dotenv()

app = FastAPI(title="AI Image Generator")
app.mount("/static", StaticFiles(directory="static"), name="static")
templates = Jinja2Templates(directory="templates")


@app.get("/health")
async def health_check():
    return {
        "status": "ok",
        "stability_key": bool(os.getenv("STABILITY_API_KEY")),
        "replicate_key": bool(os.getenv("REPLICATE_API_TOKEN")),
        "openai_key": bool(os.getenv("OPENAI_API_KEY")),
    }

Step 6: Test the Setup

uvicorn main:app --reload

Visit http://localhost:8000/health in your browser. You should see a JSON response confirming your API keys are loaded:

{
  "status": "ok",
  "stability_key": true,
  "replicate_key": true,
  "openai_key": true
}

Project File Structure

By the end of this project, your directory will look like this:

ai-image-generator/
├── main.py                 # FastAPI application entry point
├── routers/
│   ├── generate.py         # Image generation endpoints
│   └── gallery.py          # Gallery and history endpoints
├── services/
│   ├── image_service.py    # Image generation logic
│   ├── prompt_service.py   # Prompt enhancement logic
│   └── storage_service.py  # Image storage and retrieval
├── templates/
│   └── index.html          # Main web interface
├── static/
│   ├── css/
│   │   └── style.css       # Application styles
│   ├── js/
│   │   └── app.js          # Frontend JavaScript
│   └── images/             # Generated images stored here
├── requirements.txt        # Python dependencies
├── Dockerfile              # Container configuration
├── docker-compose.yml      # Multi-service setup
├── .env                    # API keys (not committed)
└── .gitignore              # Git ignore rules
📌
Checkpoint: You should now have a running FastAPI server at localhost:8000 with your API keys loaded. The /health endpoint confirms everything is connected. In the next lesson, you will build the core image generation API.