Advanced

Fine-tuning Stable Diffusion

Train custom models using DreamBooth, LoRA, and textual inversion to generate personalized characters, styles, and objects.

Fine-tuning Methods Compared

DreamBooth
  What: Fine-tunes the entire model on your images
  Images needed: 5-30 high-quality images
  Training time: 30-60 minutes on a good GPU
  File size: 2-4 GB (full model checkpoint)
  Best for: Specific subjects (faces, pets, products)

LoRA (Low-Rank Adaptation)
  What: Trains small adapter layers, not the full model
  Images needed: 10-50 images
  Training time: 15-30 minutes
  File size: 10-200 MB (adapter only)
  Best for: Styles, characters, concepts (most popular)

Textual Inversion
  What: Learns a new text embedding for your concept
  Images needed: 3-10 images
  Training time: 1-3 hours
  File size: 10-100 KB (tiny!)
  Best for: Simple concepts, styles, textures

DreamBooth Training

Bash
# Using the diffusers DreamBooth training script
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --instance_data_dir="./my_dog_photos" \
  --output_dir="./dreambooth-my-dog" \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --max_train_steps=800 \
  --with_prior_preservation \
  --class_data_dir="./class_dog_photos" \
  --class_prompt="a photo of dog"

LoRA Training

LoRA is the most popular fine-tuning method because of its small file size, fast training, and flexibility:

Bash
# Using the diffusers LoRA training script
accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="./my_style_images" \
  --output_dir="./lora-my-style" \
  --resolution=512 \
  --train_batch_size=1 \
  --num_train_epochs=100 \
  --learning_rate=1e-4 \
  --rank=4 \
  --seed=42

Python
# Using the trained LoRA
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.load_lora_weights("./lora-my-style")

image = pipe("a landscape in my custom style").images[0]

Textual Inversion

Bash
# Textual inversion learns a new token embedding
accelerate launch textual_inversion.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="./my_concept_images" \
  --learnable_property="style" \
  --placeholder_token="<my-style>" \
  --initializer_token="painting" \
  --resolution=512 \
  --train_batch_size=1 \
  --max_train_steps=3000 \
  --learning_rate=5e-4

# Then use the learned token in prompts:
"a portrait of a woman in <my-style> style"
💡 Recommendation: Start with LoRA for most use cases. It offers the best balance of quality, training speed, file size, and flexibility. You can stack multiple LoRAs and adjust their weights at inference time.

What's Next?

The next lesson covers the tools and UIs available for running Stable Diffusion, from ComfyUI to Automatic1111.