Small Language Models
Not every task needs a 400-billion-parameter model. Small Language Models (SLMs) deliver impressive performance at a fraction of the cost, latency, and compute. Learn about the latest SLM families — Phi, Gemma, and more — along with quantization techniques and on-device deployment strategies.
What You'll Learn
By the end of this course, you'll understand when and how to use small language models effectively, from model selection to on-device deployment.
Model Families
Deep dive into Phi, Gemma, and other SLM families. Understand their architectures, training data strategies, and where each excels.
Quantization
Learn techniques to compress models from 16-bit to 4-bit and beyond, dramatically reducing memory and compute requirements.
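To build intuition for what "16-bit to 4-bit" means in practice, here is a minimal sketch of symmetric per-tensor quantization in plain NumPy. This is a simplified illustration, not the algorithm used by any particular library (real methods like GPTQ or AWQ are considerably more sophisticated); the function names are our own.

```python
import numpy as np

def quantize_symmetric(weights: np.ndarray, bits: int = 4):
    """Map float weights onto signed integers in [-(2**(bits-1)-1), 2**(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1             # e.g. 7 for 4-bit
    scale = np.abs(weights).max() / qmax   # one scale factor per tensor
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.50, 0.33, 0.07], dtype=np.float32)
q, s = quantize_symmetric(w, bits=4)
w_hat = dequantize(q, s)  # close to w, within half a quantization step
```

Each weight is stored as a small integer plus a shared scale, which is where the memory savings come from; the rounding step is also where the accuracy loss comes from.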
On-Device Deployment
Run language models on phones, laptops, and edge devices. Cover frameworks, optimization, and real-world deployment patterns.
Use Case Selection
Know when a small model is the right choice and when you need something larger. Make informed cost-performance trade-off decisions.
Course Lessons
Follow the lessons in order for a structured learning experience, or jump directly to the topic you need.
1. Introduction
Understand what small language models are, why they matter, and how they compare to large models in cost, speed, and capability.
2. Phi Models
Explore Microsoft's Phi family — from Phi-1 to Phi-4. Learn about their data-centric training approach and benchmark performance.
3. Gemma Models
Discover Google's Gemma family of open models. Cover architecture, fine-tuning, and how Gemma competes in the SLM space.
4. Quantization
Master model compression methods and formats including GPTQ, AWQ, bitsandbytes, and the GGUF file format. Understand the accuracy-efficiency trade-off.
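The efficiency side of that trade-off is easy to estimate: weight memory scales linearly with bits per parameter. The helper below is a rough estimate of weight storage only (it ignores the KV cache, activations, and per-group scale metadata, which the `overhead` factor can approximate).

```python
def model_memory_gb(n_params: float, bits_per_param: float, overhead: float = 0.0) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    Covers weights only; KV cache and activations are extra.
    `overhead` is a fractional allowance for quantization metadata (scales, zero points).
    """
    return n_params * bits_per_param / 8 / 1e9 * (1 + overhead)

fp16 = model_memory_gb(3e9, 16)  # a 3B model at 16-bit: ~6.0 GB
int4 = model_memory_gb(3e9, 4)   # the same model at 4-bit: ~1.5 GB
```

A 4x reduction like this is often the difference between a model that fits on a consumer GPU or phone and one that does not.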
5. On-Device Deployment
Deploy SLMs on mobile devices, browsers, and edge hardware using llama.cpp, MLC-LLM, MediaPipe, and WebLLM.
6. Best Practices
Model selection frameworks, fine-tuning strategies, evaluation approaches, and production deployment patterns for SLMs.
Prerequisites
What you need before starting this course.
- Basic understanding of how language models work
- Familiarity with Python programming
- Understanding of model inference concepts (helpful)
- Experience with Hugging Face ecosystem (helpful but not required)
Lilly Tech Systems