Small Language Models
Not every task needs a 400-billion-parameter model. Small Language Models (SLMs) deliver impressive performance at a fraction of the cost, latency, and compute. Learn about the latest SLM families — Phi, Gemma, and more — along with quantization techniques and on-device deployment strategies.
What You'll Learn
By the end of this course, you'll understand when and how to use small language models effectively, from model selection to on-device deployment.
Model Families
Deep dive into Phi, Gemma, and other SLM families. Understand their architectures, training data strategies, and where each excels.
Quantization
Learn techniques to compress models from 16-bit to 4-bit and beyond, dramatically reducing memory and compute requirements.
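To build intuition for what "16-bit to 4-bit" means in practice, here is a minimal sketch of symmetric per-tensor quantization in plain NumPy. This is a simplified illustration, not the algorithm used by any particular library (real methods like GPTQ or AWQ are considerably more sophisticated); the function names are our own.

```python
import numpy as np

def quantize_symmetric(weights: np.ndarray, bits: int = 4):
    """Map float weights onto signed integers in [-(2**(bits-1)-1), 2**(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1             # e.g. 7 for 4-bit
    scale = np.abs(weights).max() / qmax   # one scale factor per tensor
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.50, 0.33, 0.07], dtype=np.float32)
q, s = quantize_symmetric(w, bits=4)
w_hat = dequantize(q, s)  # close to w, within half a quantization step
```

Each weight is stored as a small integer plus a shared scale, which is where the memory savings come from; the rounding step is also where the accuracy loss comes from.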
On-Device Deployment
Run language models on phones, laptops, and edge devices. Cover frameworks, optimization, and real-world deployment patterns.
Use Case Selection
Know when a small model is the right choice and when you need something larger. Make informed cost-performance trade-off decisions.
Course Lessons
Follow the lessons in order for a structured learning experience, or jump directly to the topic you need.
1. Introduction
Understand what small language models are, why they matter, and how they compare to large models in cost, speed, and capability.
2. Phi Models
Explore Microsoft's Phi family — from Phi-1 to Phi-4. Learn about their data-centric training approach and benchmark performance.
3. Gemma Models
Discover Google's Gemma family of open models. Cover architecture, fine-tuning, and how Gemma competes in the SLM space.
4. Quantization
Master model compression methods and formats including GPTQ, AWQ, bitsandbytes, and the GGUF file format. Understand the accuracy-efficiency trade-off.
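The efficiency side of that trade-off is easy to estimate: weight memory scales linearly with bits per parameter. The helper below is a rough estimate of weight storage only (it ignores the KV cache, activations, and per-group scale metadata, which the `overhead` factor can approximate).

```python
def model_memory_gb(n_params: float, bits_per_param: float, overhead: float = 0.0) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    Covers weights only; KV cache and activations are extra.
    `overhead` is a fractional allowance for quantization metadata (scales, zero points).
    """
    return n_params * bits_per_param / 8 / 1e9 * (1 + overhead)

fp16 = model_memory_gb(3e9, 16)  # a 3B model at 16-bit: ~6.0 GB
int4 = model_memory_gb(3e9, 4)   # the same model at 4-bit: ~1.5 GB
```

A 4x reduction like this is often the difference between a model that fits on a consumer GPU or phone and one that does not.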
5. On-Device Deployment
Deploy SLMs on mobile devices, browsers, and edge hardware using llama.cpp, MLC-LLM, MediaPipe, and WebLLM.
6. Best Practices
Model selection frameworks, fine-tuning strategies, evaluation approaches, and production deployment patterns for SLMs.
Prerequisites
What you need before starting this course.
- Basic understanding of how language models work
- Familiarity with Python programming
- Understanding of model inference concepts (helpful)
- Experience with Hugging Face ecosystem (helpful but not required)
Lilly Tech Systems