Advanced
Fine-tuning Stable Diffusion
Train custom models using DreamBooth, LoRA, and textual inversion to generate personalized characters, styles, and objects.
Fine-tuning Methods Compared
Comparison
DreamBooth
What: Fine-tunes the entire model on your images
Images needed: 5-30 high-quality images
Training time: 30-60 minutes on a good GPU
File size: 2-4 GB (full model checkpoint)
Best for: Specific subjects (faces, pets, products)

LoRA (Low-Rank Adaptation)
What: Trains small adapter layers, not the full model
Images needed: 10-50 images
Training time: 15-30 minutes
File size: 10-200 MB (adapter only)
Best for: Styles, characters, concepts (most popular)

Textual Inversion
What: Learns a new text embedding for your concept
Images needed: 3-10 images
Training time: 1-3 hours
File size: 10-100 KB (tiny!)
Best for: Simple concepts, styles, textures
DreamBooth Training
Bash
# Using the diffusers DreamBooth training script
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --instance_data_dir="./my_dog_photos" \
  --output_dir="./dreambooth-my-dog" \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --max_train_steps=800 \
  --with_prior_preservation \
  --class_data_dir="./class_dog_photos" \
  --class_prompt="a photo of dog"
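The `--with_prior_preservation` flag adds a second loss term computed on generic class images (here, ordinary dog photos), so the model can learn the new subject without drifting away from what a dog looks like in general. The following is a minimal, framework-free sketch of how the two terms might be combined. The function names and the `prior_loss_weight` value of 1.0 are illustrative assumptions, and the actual training script works on noise-prediction tensors rather than plain lists.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_target,
                    class_pred, class_target,
                    prior_loss_weight=1.0):
    """Combine the subject loss with the prior-preservation loss."""
    instance_loss = mse(instance_pred, instance_target)  # your subject photos
    prior_loss = mse(class_pred, class_target)           # generic class photos
    return instance_loss + prior_loss_weight * prior_loss


# Toy numbers (hypothetical) standing in for model predictions and targets.
loss = dreambooth_loss(
    instance_pred=[0.5, 1.0], instance_target=[0.0, 1.0],
    class_pred=[1.0, 3.0], class_target=[1.0, 1.0],
)
print(loss)  # 2.125 = 0.125 (instance) + 1.0 * 2.0 (prior)
```

Setting `prior_loss_weight` to 0.0 in this sketch reduces it to plain subject-only fine-tuning.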
LoRA Training
LoRA is the most popular of the three methods because its adapter files are small (10-200 MB) and can be swapped or combined at inference time without retraining the base model:
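Bash
# Using the diffusers LoRA training script
# Assumption: ./my_style_images contains the images plus a metadata.jsonl file
# that maps each file_name to a caption in a "text" field.
accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="./my_style_images" \
  --output_dir="./lora-my-style" \
  --resolution=512 \
  --train_batch_size=1 \
  --num_train_epochs=100 \
  --learning_rate=1e-4 \
  --rank=4 \
  --seed=42
Python
# Using the trained LoRA
from diffusers import StableDiffusionPipeline

# Load the same base model that the LoRA was trained on
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# Attach the adapter weights written to --output_dir
pipe.load_lora_weights("./lora-my-style")
image = pipe("a landscape in my custom style").images[0]
image.save("landscape.png")  # output filename is an arbitrary example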
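The small adapter size follows from the low-rank structure: instead of updating a full weight matrix of shape d_out x d_in, LoRA trains two thin matrices of shapes d_out x r and r x d_in. A rough sketch of the parameter arithmetic, assuming a single 768 x 768 projection layer (an illustrative size, not a figure from this lesson) and the `--rank=4` setting from the training command:

```python
def full_param_count(d_out, d_in):
    """Parameters in a full weight matrix of shape (d_out, d_in)."""
    return d_out * d_in


def lora_param_count(d_out, d_in, rank):
    """Parameters in the LoRA pair: B is (d_out x rank), A is (rank x d_in)."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 768, 768, 4  # layer size is a hypothetical example
full = full_param_count(d_out, d_in)
lora = lora_param_count(d_out, d_in, rank)

print(full)          # 589824
print(lora)          # 6144
print(full // lora)  # 96
```

In this example the adapter stores 96 times fewer parameters than the layer it modifies. Raising the rank grows the adapter linearly.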
Textual Inversion
Bash
# Textual inversion learns a new token embedding
accelerate launch textual_inversion.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="./my_concept_images" \
  --learnable_property="style" \
  --placeholder_token="<my-style>" \
  --initializer_token="painting" \
  --resolution=512 \
  --train_batch_size=1 \
  --max_train_steps=3000 \
  --learning_rate=5e-4

# Then use the learned token in prompts, for example:
#   "a portrait of a woman in <my-style> style"
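Conceptually, textual inversion adds one new row to the text encoder's embedding table, initializes it from the `--initializer_token`, and updates only that row during training while every other weight stays frozen. Below is a toy sketch of that idea. The three-dimensional vectors, the hand-written gradient, and the helper names are all assumptions for illustration; the real script computes gradients from the diffusion loss.

```python
def add_placeholder_token(embeddings, placeholder, initializer):
    """Add a new token whose vector starts as a copy of the initializer's."""
    if placeholder in embeddings:
        raise ValueError(f"{placeholder} already exists")
    embeddings[placeholder] = list(embeddings[initializer])


def train_step(embeddings, placeholder, grad, learning_rate):
    """Gradient-descent update applied to the placeholder row only."""
    vec = embeddings[placeholder]
    embeddings[placeholder] = [v - learning_rate * g for v, g in zip(vec, grad)]


# Toy vocabulary with 3-dimensional embeddings (hypothetical values).
embeddings = {
    "painting": [1.0, 2.0, 3.0],
    "dog": [0.0, 1.0, 0.0],
}
add_placeholder_token(embeddings, "<my-style>", "painting")
train_step(embeddings, "<my-style>", grad=[2.0, 0.0, -2.0], learning_rate=0.5)

print(embeddings["<my-style>"])  # [0.0, 2.0, 4.0]
print(embeddings["painting"])    # [1.0, 2.0, 3.0], unchanged
```

Because only the one new vector changes, the saved result is a single embedding rather than a model checkpoint.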
Recommendation: Start with LoRA for most use cases. It offers the best balance of quality, training speed, file size, and flexibility. You can stack multiple LoRAs and adjust their weights at inference time.
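Stacking works because each LoRA contributes an additive low-rank update to the same base weights, so several updates can be summed with per-adapter scales. Here is a minimal pure-Python sketch of that arithmetic on a 2 x 2 weight matrix with two rank-1 adapters. The matrices, names, and scales are made up for illustration, and real pipelines apply this per layer through the library's adapter handling.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_loras(base, adapters):
    """Return base + sum(scale * B @ A) over (B, A, scale) adapters."""
    merged = [row[:] for row in base]  # copy so the base weights stay intact
    for b, a, scale in adapters:
        delta = matmul(b, a)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged


base = [[1.0, 0.0],
        [0.0, 1.0]]
# Each adapter is (B: 2x1, A: 1x2, scale); all values are hypothetical.
style_lora = ([[1.0], [0.0]], [[0.0, 2.0]], 0.5)
character_lora = ([[0.0], [1.0]], [[4.0, 0.0]], 0.25)

merged = apply_loras(base, [style_lora, character_lora])
print(merged)  # [[1.0, 1.0], [1.0, 1.0]]
```

Changing a scale adjusts how strongly that adapter influences the output, and a scale of 0 removes its contribution entirely.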
What's Next?
The next lesson covers the tools and UIs available for running Stable Diffusion, from ComfyUI to Automatic1111.
Lilly Tech Systems