Fine-tuning Pretrained Models
Fine-tuning adapts a pretrained model to your specific task or domain. Learn when to fine-tune, how to prepare data, and how to use modern, efficient techniques such as LoRA and QLoRA.
When to Fine-tune vs Use As-Is
| Approach | When to Use | Data Needed |
|---|---|---|
| Use as-is (zero-shot) | General tasks where the model already performs well | None |
| Prompt engineering | LLMs; the task can be described in natural language | A few examples |
| Fine-tune | Domain-specific data, custom labels, higher accuracy needed | 100-10,000+ samples |
| Train from scratch | Completely novel domain, massive data available | Millions of samples |
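As a rough sketch, the decision table above can be encoded as a simple heuristic. The thresholds and function name below are illustrative assumptions, not hard rules:

```python
def suggest_approach(labeled_samples: int, model_works_zero_shot: bool) -> str:
    """Rough heuristic mirroring the decision table; thresholds are illustrative."""
    if model_works_zero_shot and labeled_samples == 0:
        return "use as-is (zero-shot)"
    if labeled_samples < 100:
        return "prompt engineering (few-shot)"
    if labeled_samples < 1_000_000:
        return "fine-tune"
    return "consider training from scratch"

print(suggest_approach(0, True))       # use as-is (zero-shot)
print(suggest_approach(5_000, False))  # fine-tune
```

In practice the boundaries blur: many teams start with prompt engineering and only fine-tune once they have a labeled dataset and a measurable accuracy gap.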
Data Preparation
Python
```python
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer

# Load a Hugging Face dataset
dataset = load_dataset("imdb")

# Or create from your own data
data = {
    "text": ["Great product!", "Terrible service", ...],
    "label": [1, 0, ...],
}
dataset = Dataset.from_dict(data)
dataset = dataset.train_test_split(test_size=0.2)

# Tokenize with the tokenizer that matches your checkpoint
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized = dataset.map(tokenize, batched=True)
```
Hugging Face Trainer API
The simplest way to fine-tune any Hugging Face model:
Python
```python
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

# Load pretrained model with classification head
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    learning_rate=2e-5,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

# Create trainer and train
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()
```
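By default, Trainer reports only the loss during evaluation. Passing a `compute_metrics` function (via `Trainer(..., compute_metrics=compute_metrics)`) adds task metrics such as accuracy. A minimal sketch:

```python
import numpy as np

def compute_metrics(eval_pred):
    """Convert logits to class predictions and report accuracy."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = (predictions == labels).mean()
    return {"accuracy": float(accuracy)}

# Quick sanity check with fake logits for 3 examples, 2 classes
logits = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
labels = np.array([1, 0, 0])
print(compute_metrics((logits, labels)))  # → {'accuracy': 0.6666666666666666}
```

For other metrics (F1, precision, recall), the same function shape applies; many tutorials compute them with the `evaluate` library instead of writing them by hand.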
LoRA / QLoRA for Efficient Fine-tuning
LoRA (Low-Rank Adaptation) freezes the original model weights and trains small adapter matrices. This typically reduces trainable parameters by over 99% and memory use by roughly 3-10x.
QLoRA combines LoRA with 4-bit quantization, making it possible to fine-tune a 7B parameter model on a single consumer GPU.
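The savings come from the shapes involved: instead of updating a full d×k weight matrix W, LoRA trains two low-rank factors B (d×r) and A (r×k) and applies W + BA. A quick back-of-the-envelope check, with shapes chosen to resemble a typical 4096-wide attention projection (illustrative only):

```python
d, k, r = 4096, 4096, 16  # illustrative: one 4096x4096 projection, LoRA rank 16

full_params = d * k         # parameters updated by full fine-tuning
lora_params = r * (d + k)   # parameters in the B (d x r) and A (r x k) factors

print(f"full: {full_params:,}")   # full: 16,777,216
print(f"LoRA: {lora_params:,}")   # LoRA: 131,072
print(f"reduction: {100 * (1 - lora_params / full_params):.2f}%")  # reduction: 99.22%
```

Because r is small relative to d and k, the adapter cost grows linearly with layer width while the full-matrix cost grows quadratically, which is why the savings get more dramatic for larger models.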
Python
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA: load the base model in 4-bit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", quantization_config=bnb_config
)
model = prepare_model_for_kbit_training(model)

# Add LoRA adapters
lora_config = LoraConfig(
    r=16,            # Rank of the update matrices
    lora_alpha=32,   # Scaling factor
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable params: 4,194,304 || all params: 8,030,261,248 || trainable%: 0.05
```
Saving and Sharing Fine-tuned Models
Python
```python
# Save locally
model.save_pretrained("./my-fine-tuned-model")
tokenizer.save_pretrained("./my-fine-tuned-model")

# Push to Hugging Face Hub
model.push_to_hub("username/my-fine-tuned-model")
tokenizer.push_to_hub("username/my-fine-tuned-model")
```
Compute Requirements
| Model Size | Full Fine-tune VRAM | LoRA VRAM | QLoRA VRAM |
|---|---|---|---|
| 110M (BERT) | 4 GB | 2 GB | 1.5 GB |
| 1B | 16 GB | 8 GB | 4 GB |
| 7B (Llama, Mistral) | 60+ GB | 24 GB | 8 GB |
| 13B | 100+ GB | 40 GB | 16 GB |
| 70B | 500+ GB | 160 GB | 48 GB |
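The full fine-tune column can be roughly reproduced from first principles: with the Adam optimizer you hold weights, gradients, and two optimizer states per parameter, which commonly works out to around 16 bytes per parameter before activations. A crude estimator (a rule of thumb under those assumptions, not a precise accounting, and it ignores activations, batch size, and framework overhead):

```python
def estimate_vram_gb(n_params: float, bytes_per_param: float) -> float:
    """n_params * bytes_per_param, converted to GiB; ignores activations/overhead."""
    return n_params * bytes_per_param / 1024**3

# Rough per-parameter budgets (assumptions, not exact):
#  - full fine-tune with Adam: ~16 bytes (weights + grads + optimizer states)
#  - QLoRA base weights: ~0.5 bytes (4-bit), plus small adapter/activation overhead
print(f"7B full fine-tune: ~{estimate_vram_gb(7e9, 16):.0f} GB")   # ~104 GB
print(f"7B QLoRA weights:  ~{estimate_vram_gb(7e9, 0.5):.1f} GB")  # ~3.3 GB
```

Real-world numbers land below or above these figures depending on precision, optimizer choice (e.g. 8-bit optimizers), gradient checkpointing, and sequence length, which is why the table reports ranges.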
Free GPU Options: Google Colab (free T4 GPU with ~15 GB of VRAM) and Kaggle Notebooks (free P100 or 2× T4) are great for fine-tuning small to medium models with QLoRA.
Next Up
Complete the course with best practices for model selection, licensing, deployment, and ethical use.
Next: Best Practices →