Fine-tuning Pretrained Models
Fine-tuning adapts a pretrained model to your specific task or domain. Learn when to fine-tune, how to prepare data, and how to use modern, efficient techniques such as LoRA and QLoRA.
When to Fine-tune vs Use As-Is
| Approach | When to Use | Data Needed |
|---|---|---|
| Use as-is (zero-shot) | General tasks where the model already performs well | None |
| Prompt engineering | LLMs; the task can be described in natural language | A few examples |
| Fine-tune | Domain-specific data, custom labels, higher accuracy needed | 100-10,000+ samples |
| Train from scratch | Completely novel domain, massive data available | Millions of samples |
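As a rough sketch, the decision table above can be encoded as a simple heuristic. The thresholds and function name below are illustrative assumptions, not hard rules:

```python
def suggest_approach(labeled_samples: int, model_works_zero_shot: bool) -> str:
    """Rough heuristic mirroring the decision table; thresholds are illustrative."""
    if model_works_zero_shot and labeled_samples == 0:
        return "use as-is (zero-shot)"
    if labeled_samples < 100:
        return "prompt engineering (few-shot)"
    if labeled_samples < 1_000_000:
        return "fine-tune"
    return "consider training from scratch"

print(suggest_approach(0, True))       # use as-is (zero-shot)
print(suggest_approach(5_000, False))  # fine-tune
```

In practice the boundaries blur: many teams start with prompt engineering and only fine-tune once they have a labeled dataset and a measurable accuracy gap.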
Data Preparation
Python
```python
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer

# Load a Hugging Face dataset
dataset = load_dataset("imdb")

# Or create from your own data
data = {
    "text": ["Great product!", "Terrible service", ...],
    "label": [1, 0, ...],
}
dataset = Dataset.from_dict(data)
dataset = dataset.train_test_split(test_size=0.2)

# Tokenize with the tokenizer that matches your checkpoint
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized = dataset.map(tokenize, batched=True)
```
Hugging Face Trainer API
The simplest way to fine-tune any Hugging Face model:
Python
```python
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

# Load pretrained model with classification head
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    learning_rate=2e-5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

# Create trainer and train
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()
```
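By default, Trainer reports only the loss during evaluation. Passing a `compute_metrics` function (via `Trainer(..., compute_metrics=compute_metrics)`) adds task metrics such as accuracy. A minimal sketch:

```python
import numpy as np

def compute_metrics(eval_pred):
    """Convert logits to class predictions and report accuracy."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = (predictions == labels).mean()
    return {"accuracy": float(accuracy)}

# Quick sanity check with fake logits for 3 examples, 2 classes
logits = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
labels = np.array([1, 0, 0])
print(compute_metrics((logits, labels)))  # → {'accuracy': 0.6666666666666666}
```

For other metrics (F1, precision, recall), the same function shape applies; many tutorials compute them with the `evaluate` library instead of writing them by hand.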
LoRA / QLoRA for Efficient Fine-tuning
LoRA (Low-Rank Adaptation) freezes the original model weights and trains small adapter matrices. This typically reduces trainable parameters by over 99% and memory use by roughly 3-10x.
QLoRA combines LoRA with 4-bit quantization, making it possible to fine-tune a 7B parameter model on a single consumer GPU.
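The savings come from the shapes involved: instead of updating a full d×k weight matrix W, LoRA trains two low-rank factors B (d×r) and A (r×k) and applies W + BA. A quick back-of-the-envelope check, with shapes chosen to resemble a typical 4096-wide attention projection (illustrative only):

```python
d, k, r = 4096, 4096, 16  # illustrative: one 4096x4096 projection, LoRA rank 16

full_params = d * k         # parameters updated by full fine-tuning
lora_params = r * (d + k)   # parameters in the B (d x r) and A (r x k) factors

print(f"full: {full_params:,}")   # full: 16,777,216
print(f"LoRA: {lora_params:,}")   # LoRA: 131,072
print(f"reduction: {100 * (1 - lora_params / full_params):.2f}%")  # reduction: 99.22%
```

Because r is small relative to d and k, the adapter cost grows linearly with layer width while the full-matrix cost grows quadratically, which is why the savings get more dramatic for larger models.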
Python
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA: load the base model in 4-bit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", quantization_config=bnb_config
)
model = prepare_model_for_kbit_training(model)

# Add LoRA adapters
lora_config = LoraConfig(
    r=16,            # Rank of the update matrices
    lora_alpha=32,   # Scaling factor
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable params: 4,194,304 || all params: 8,030,261,248 || trainable%: 0.05
```
Saving and Sharing Fine-tuned Models
Python
```python
# Save locally
model.save_pretrained("./my-fine-tuned-model")
tokenizer.save_pretrained("./my-fine-tuned-model")

# Push to Hugging Face Hub
model.push_to_hub("username/my-fine-tuned-model")
tokenizer.push_to_hub("username/my-fine-tuned-model")
```
Compute Requirements
| Model Size | Full Fine-tune VRAM | LoRA VRAM | QLoRA VRAM |
|---|---|---|---|
| 110M (BERT) | 4 GB | 2 GB | 1.5 GB |
| 1B | 16 GB | 8 GB | 4 GB |
| 7B (Llama, Mistral) | 60+ GB | 24 GB | 8 GB |
| 13B | 100+ GB | 40 GB | 16 GB |
| 70B | 500+ GB | 160 GB | 48 GB |
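The full fine-tune column can be roughly reproduced from first principles: with the Adam optimizer you hold weights, gradients, and two optimizer states per parameter, which commonly works out to around 16 bytes per parameter before activations. A crude estimator (a rule of thumb under those assumptions, not a precise accounting, and it ignores activations, batch size, and framework overhead):

```python
def estimate_vram_gb(n_params: float, bytes_per_param: float) -> float:
    """n_params * bytes_per_param, converted to GiB; ignores activations/overhead."""
    return n_params * bytes_per_param / 1024**3

# Rough per-parameter budgets (assumptions, not exact):
#  - full fine-tune with Adam: ~16 bytes (weights + grads + optimizer states)
#  - QLoRA base weights: ~0.5 bytes (4-bit), plus small adapter/activation overhead
print(f"7B full fine-tune: ~{estimate_vram_gb(7e9, 16):.0f} GB")   # ~104 GB
print(f"7B QLoRA weights:  ~{estimate_vram_gb(7e9, 0.5):.1f} GB")  # ~3.3 GB
```

Real-world numbers land below or above these figures depending on precision, optimizer choice (e.g. 8-bit optimizers), gradient checkpointing, and sequence length, which is why the table reports ranges.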
Free GPU Options: Google Colab (free T4 GPU with ~15 GB of VRAM) and Kaggle Notebooks (free P100 or 2× T4) are great for fine-tuning small to medium models with QLoRA.
Next Up
Complete the course with best practices for model selection, licensing, deployment, and ethical use.
Next: Best Practices →