Project Setup
Before writing any code, you need to understand the architecture of an AI image generation app, decide between cloud APIs and local models, and set up your development environment. This lesson covers all three.
What You Will Build
By the end of this project, you will have a fully working web application that:
- Accepts text prompts from users and generates images using Stable Diffusion
- Enhances prompts automatically using an LLM for better results
- Displays generated images in a responsive gallery with download support
- Supports advanced features like image-to-image, inpainting, and upscaling
- Deploys to production with Docker, rate limiting, and cost controls
Architecture Overview
The application follows a clean three-tier architecture:
```text
+------------------+      +------------------+      +-------------------+
|     Frontend     |      |     Backend      |      |    AI Services    |
|  (HTML/CSS/JS)   |----->|    (FastAPI)     |----->|  (Stability AI /  |
|                  |      |                  |      |    Replicate)     |
| - Prompt input   |      | - API routes     |      | - Text-to-image   |
| - Gallery view   |      | - Prompt enhance |      | - Img-to-img      |
| - History panel  |      | - Image storage  |      | - Inpainting      |
| - Downloads      |      | - Rate limiting  |      | - Upscaling       |
+------------------+      +------------------+      +-------------------+
                                   |
                            +------v------+
                            |   Storage   |
                            |  (Local /   |
                            |   S3/CDN)   |
                            +-------------+
```
The frontend sends prompts to the FastAPI backend. The backend enhances the prompt if requested, forwards it to the image generation API, saves the result, and returns the image URL to the frontend.
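That round trip can be sketched in a few lines of Python. All three service functions below are placeholders with made-up names and bodies; the real versions (LLM enhancement, Stability AI / Replicate calls, S3 uploads) are built in later lessons:

```python
# A minimal sketch of the backend flow described above.
# enhance_prompt, generate_image, and save_image are placeholders;
# their real implementations arrive in later lessons.

def enhance_prompt(prompt: str) -> str:
    """Placeholder for LLM-based prompt enhancement."""
    return prompt + ", highly detailed, sharp focus"

def generate_image(prompt: str) -> bytes:
    """Placeholder for the Stability AI / Replicate API call."""
    return b"<image bytes for: %s>" % prompt.encode()

def save_image(data: bytes) -> str:
    """Placeholder for writing to static/images/ or S3."""
    return "/static/images/example.png"

def handle_generate(prompt: str, enhance: bool = False) -> dict:
    """The whole request path: optional enhancement, generation, storage."""
    final_prompt = enhance_prompt(prompt) if enhance else prompt
    image_bytes = generate_image(final_prompt)
    return {"prompt": final_prompt, "image_url": save_image(image_bytes)}
```

The frontend only ever sees the returned dictionary, which is why swapping cloud APIs for local models later touches nothing outside generate_image.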
Stable Diffusion API vs Local Models
You have two main options for running image generation. Here is how they compare:
| Factor | Cloud API (Stability AI / Replicate) | Local (ComfyUI / Diffusers) |
|---|---|---|
| Setup time | 5 minutes | 1-2 hours |
| GPU required | No | Yes (8GB+ VRAM) |
| Cost per image | $0.002-$0.01 | Free (after hardware) |
| Generation speed | 2-5 seconds | 5-30 seconds (depends on GPU) |
| Model variety | Latest models available | Any model you download |
| Scalability | Scales automatically | Limited to your hardware |
| Best for | Production apps, teams | Experimentation, privacy |
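The cost rows are worth running through once. Using the high end of the cloud pricing above and an assumed GPU price (the $1,600 figure is illustrative, not a quote), the break-even point is easy to estimate:

```python
# Rough break-even: how many generations before a local GPU pays for
# itself versus a cloud API. Both figures are illustrative assumptions.
cloud_cost_per_image = 0.01   # high end of the $0.002-$0.01 range
gpu_price = 1600.0            # assumed price of a capable consumer GPU

break_even_images = gpu_price / cloud_cost_per_image
print(f"Break-even at ~{break_even_images:,.0f} images")  # ~160,000 images
```

At that volume most side projects never recoup the hardware, which is why this project starts with cloud APIs and treats local inference as an optimization.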
Tech Stack
Here is every tool you will use in this project:
- Python 3.10+ — Primary backend language
- FastAPI — High-performance async web framework for the API
- Stability AI SDK / Replicate Python client — Image generation APIs
- OpenAI Python client — For LLM-powered prompt enhancement
- Pillow (PIL) — Image processing, resizing, format conversion
- HTML / CSS / JavaScript — Frontend (no framework needed)
- Docker — Containerization for deployment
- SQLite — Lightweight database for prompt history and metadata
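As a taste of where Pillow fits in, a typical post-generation step is shrinking an image and re-encoding it for the gallery. This is a sketch with a hypothetical helper name; the real storage service comes later:

```python
# Sketch: resize a generated image and re-encode it as WebP for the
# gallery. make_thumbnail is a hypothetical helper, not part of any SDK.
from io import BytesIO

from PIL import Image

def make_thumbnail(png_bytes: bytes, max_size: int = 512) -> bytes:
    """Downscale image bytes and return them re-encoded as WebP."""
    img = Image.open(BytesIO(png_bytes))
    img.thumbnail((max_size, max_size))  # in place, preserves aspect ratio
    out = BytesIO()
    img.save(out, format="WEBP", quality=85)
    return out.getvalue()

# Example: shrink a 1024x1024 stand-in image to a 512px thumbnail
src = Image.new("RGB", (1024, 1024), "navy")
buf = BytesIO()
src.save(buf, format="PNG")
thumb = make_thumbnail(buf.getvalue())
```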
Setting Up Your Environment
Step 1: Create the Project Directory
```shell
mkdir ai-image-generator
cd ai-image-generator
mkdir -p static/images templates
```
Step 2: Set Up a Virtual Environment
```shell
python -m venv venv

# On macOS/Linux:
source venv/bin/activate

# On Windows:
venv\Scripts\activate
```
Step 3: Install Dependencies
Create a requirements.txt file:
```text
# requirements.txt
fastapi==0.110.0
uvicorn[standard]==0.27.1
python-multipart==0.0.9
stability-sdk==0.8.5
replicate==0.25.1
openai==1.14.0
Pillow==10.2.0
python-dotenv==1.0.1
aiofiles==23.2.1
jinja2==3.1.3
```
Install everything:
```shell
pip install -r requirements.txt
```
Step 4: Configure API Keys
Create a .env file in the project root. You will need at least one image generation API key:
```text
# .env
STABILITY_API_KEY=your-stability-ai-key-here
REPLICATE_API_TOKEN=your-replicate-token-here
OPENAI_API_KEY=your-openai-key-here
```
Add .env to your .gitignore immediately. API keys in version control are a security risk and can result in unexpected charges.

Get your API keys from:
- Stability AI: https://platform.stability.ai/account/keys — Free tier includes 25 credits
- Replicate: https://replicate.com/account/api-tokens — Pay per prediction, starts free
- OpenAI: https://platform.openai.com/api-keys — For prompt enhancement (GPT-3.5 is cheap)
Step 5: Create the Project Skeleton
Create a minimal main.py to verify everything works:
```python
# main.py
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates
from dotenv import load_dotenv
import os

load_dotenv()

app = FastAPI(title="AI Image Generator")
app.mount("/static", StaticFiles(directory="static"), name="static")
templates = Jinja2Templates(directory="templates")

@app.get("/health")
async def health_check():
    return {
        "status": "ok",
        "stability_key": bool(os.getenv("STABILITY_API_KEY")),
        "replicate_key": bool(os.getenv("REPLICATE_API_TOKEN")),
        "openai_key": bool(os.getenv("OPENAI_API_KEY")),
    }
```
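The bool(os.getenv(...)) pattern in the health check is deliberate: it reports whether each key is present without ever echoing the key itself. In isolation (the demo values below are obviously not real keys):

```python
import os

# Simulate one loaded key and one missing key (demo values only).
os.environ["STABILITY_API_KEY"] = "sk-demo-not-a-real-key"
os.environ.pop("MISSING_KEY", None)

print(bool(os.getenv("STABILITY_API_KEY")))  # True  -- key present
print(bool(os.getenv("MISSING_KEY")))        # False -- unset returns None
print(bool(os.getenv("MISSING_KEY", "")))    # False -- empty string is falsy
```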
Step 6: Test the Setup
```shell
uvicorn main:app --reload
```
Visit http://localhost:8000/health in your browser. You should see a JSON response confirming your API keys are loaded:
```json
{
  "status": "ok",
  "stability_key": true,
  "replicate_key": true,
  "openai_key": true
}
```
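If any flag comes back false, the matching variable in .env was not loaded; fix the name or value and restart the server. For scripted checks, a tiny helper (all_keys_loaded is a hypothetical name, not part of FastAPI) can validate the response:

```python
import json

def all_keys_loaded(health_json: str) -> bool:
    """Return True only if /health reports every API key present."""
    data = json.loads(health_json)
    return data.get("status") == "ok" and all(
        data.get(k) for k in ("stability_key", "replicate_key", "openai_key")
    )

# Example: a response where OPENAI_API_KEY failed to load
response = '{"status": "ok", "stability_key": true, "replicate_key": true, "openai_key": false}'
print(all_keys_loaded(response))  # False
```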
Project File Structure
By the end of this project, your directory will look like this:
```text
ai-image-generator/
├── main.py                  # FastAPI application entry point
├── routers/
│   ├── generate.py          # Image generation endpoints
│   └── gallery.py           # Gallery and history endpoints
├── services/
│   ├── image_service.py     # Image generation logic
│   ├── prompt_service.py    # Prompt enhancement logic
│   └── storage_service.py   # Image storage and retrieval
├── templates/
│   └── index.html           # Main web interface
├── static/
│   ├── css/
│   │   └── style.css        # Application styles
│   ├── js/
│   │   └── app.js           # Frontend JavaScript
│   └── images/              # Generated images stored here
├── requirements.txt         # Python dependencies
├── Dockerfile               # Container configuration
├── docker-compose.yml       # Multi-service setup
├── .env                     # API keys (not committed)
└── .gitignore               # Git ignore rules
```
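The .gitignore at the bottom of the tree should, at minimum, exclude secrets, the virtual environment, and generated output. A typical starting point (adjust to taste, this is not exhaustive):

```text
# .gitignore -- never commit API keys
.env

# Python artifacts
venv/
__pycache__/
*.pyc

# Generated images and local database
static/images/*
*.db
```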
You now have a FastAPI server running at localhost:8000 with your API keys loaded, and the /health endpoint confirms everything is connected. In the next lesson, you will build the core image generation API.
Lilly Tech Systems