AI Models
Master every AI model that matters. 50 deep dives covering frontier closed LLMs (GPT-5, Claude Opus 4.7, Gemini 2.5 Pro, Grok 4, o-series, Mistral Large 2), open-weight LLMs (Llama 3.3/4, DeepSeek-V3/R1, Qwen 2.5/QwQ, Mixtral, Gemma, Phi-4, DBRX, Falcon, Yi, Nemotron, Command R+, SmolLM2), image generation (SD 3.5, SDXL, FLUX.1, DALL-E 3, gpt-image-1, Midjourney, Imagen 4, Ideogram), video generation (Sora, Runway Gen-3, Luma, Kling, Pika, HunyuanVideo), audio (Whisper, ElevenLabs, Suno, Udio, F5-TTS), embeddings (text-embedding-3, Cohere v3, BGE-M3, Voyage-3), and specialized foundation models (CLIP, SAM 2).
AI Models is the track to read when you need to choose between GPT-4o, Claude Sonnet, Gemini 2.5, Llama 4, Mistral Large, DeepSeek, Qwen, or one of the smaller open-weight alternatives for a real project. Benchmarks tell you a narrow story; the real decisions turn on context-window pricing, latency at your token sizes, hallucination behavior on your domain, safety-tuning differences, fine-tuning availability, and licensing. We cover each major model family and the tradeoffs that benchmark leaderboards hide.
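To make the pricing point concrete, here is a back-of-envelope cost comparison for a single long-context request. The per-million-token prices below are illustrative assumptions for a hypothetical flagship and small model, not any provider's published rates.

```python
# Back-of-envelope cost per request, with prices expressed in
# dollars per million tokens (all numbers are placeholder assumptions).
def cost_per_request(in_tokens, out_tokens, price_in, price_out):
    return (in_tokens * price_in + out_tokens * price_out) / 1_000_000

# Hypothetical RAG-style request: 8k input tokens, 500 output tokens.
flagship = cost_per_request(8000, 500, 5.00, 15.00)  # $0.0475
small = cost_per_request(8000, 500, 0.15, 0.60)      # $0.0015
print(f"flagship is {flagship / small:.0f}x the cost")  # → flagship is 32x the cost
```

At high request volumes this ratio, not the benchmark delta, is usually what decides which model handles the default path.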
A theme running through this track is that the best model is rarely the biggest one. Teams that ship reliably tend to route between multiple models, using a smaller, faster, cheaper model for the majority of work and reserving the flagship model for the fraction of queries that justify its cost and latency. We cover routing design, evaluation setups that let you compare models honestly on your own data, and the practical steps for migrating from one model family to another without breaking existing behavior.
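The routing pattern described above can be sketched in a few lines. The model names and the complexity heuristic here are illustrative placeholders, not a recommendation for any specific provider; real routers typically use a learned classifier or per-route evals instead of keyword matching.

```python
# Minimal two-tier model router (illustrative sketch; model names and
# the complexity heuristic are placeholder assumptions).

def estimate_complexity(prompt: str) -> float:
    """Crude proxy: long prompts and reasoning-heavy keywords score higher."""
    score = min(len(prompt) / 2000, 1.0)
    if any(k in prompt.lower() for k in ("prove", "debug", "multi-step", "analyze")):
        score += 0.6
    return min(score, 1.0)

def route(prompt: str, threshold: float = 0.6) -> str:
    """Default to the small model; escalate queries that look hard."""
    if estimate_complexity(prompt) >= threshold:
        return "flagship-model"    # slower, pricier, highest quality
    return "small-fast-model"      # cheap default for the majority of traffic

print(route("What is the capital of France?"))  # → small-fast-model
```

The design choice that matters is the escalation trigger: a router that escalates on uncertainty (or on a failed first attempt by the small model) usually beats one that tries to predict difficulty up front.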
All Models
50 model deep dives organized into 7 categories spanning the full AI model landscape.
Frontier Closed LLMs
GPT-5
Master GPT-5 — OpenAI's flagship model. Learn capabilities, context window, multimodal inputs, native tool use, pricing, and the patterns for production GPT-5 use.
6 Lessons
GPT-4o
Master GPT-4o — OpenAI's omni-modal workhorse. Learn vision, audio, native tool use, structured outputs, and the patterns that ship most production OpenAI apps today.
6 Lessons
Claude Opus 4.7
Master Claude Opus 4.7 — Anthropic's flagship for complex reasoning, coding, and agents. Learn 1M context, prompt caching, computer use, and Opus-specific patterns.
6 Lessons
Claude Sonnet 4.6
Master Claude Sonnet 4.6 — Anthropic's balanced workhorse. Learn the cost/quality sweet spot, tool use, vision, and the patterns for high-volume Sonnet workloads.
6 Lessons
Claude Haiku 4.5
Master Claude Haiku 4.5 — Anthropic's fast, cheap workhorse. Learn the latency/cost edge, batch use, and the patterns for high-throughput Haiku deployments.
6 Lessons
Gemini 2.5 Pro
Master Gemini 2.5 Pro — Google's flagship long-context multimodal model. Learn 1M-2M context, native multimodal (image, video, audio), and search grounding.
6 Lessons
Gemini 2.0 Flash
Master Gemini 2.0 Flash — Google's fast, cheap workhorse with native multimodal. Learn the speed/cost edge, agentic native tools, and high-throughput patterns.
6 Lessons
Grok 4
Master Grok 4 — xAI's frontier model with real-time X data access. Learn the unique data advantages, voice mode, and patterns for using Grok effectively.
6 Lessons
OpenAI o-series (Reasoning)
Master OpenAI's reasoning models: o1, o3, o4. Learn the chain-of-thought-as-a-service paradigm, when reasoning models beat regular ones, and cost-optimization patterns.
6 Lessons
Mistral Large 2
Master Mistral Large 2 — France's frontier model. Learn its tool use, JSON mode, multilingual strengths, and the European data sovereignty story.
6 Lessons
Open-Weight LLMs
Llama 3.3 70B
Master Meta Llama 3.3 70B — the most popular open-weight LLM. Learn its capabilities, fine-tuning, deployment, and why it powers most production open-LLM apps.
6 Lessons
Llama 4 Family
Master Meta's Llama 4 family: Scout, Maverick, Behemoth. Learn the MoE architecture, native multimodality, 10M context, and what's new vs Llama 3.3.
6 Lessons
DeepSeek-V3
Master DeepSeek-V3 — the open-weight MoE model that matches GPT-4 at 1/10 the cost. Learn the architecture, training innovations, and self-hosting patterns.
6 Lessons
DeepSeek-R1 Reasoning
Master DeepSeek-R1 — the open-weight reasoning model that matches OpenAI o1. Learn the reasoning architecture, distillations, and reasoning model patterns.
6 Lessons
Qwen 2.5 Family
Master Alibaba Qwen 2.5: 0.5B-72B sizes, Qwen-VL multimodal, Qwen-Coder. Learn the family, strengths in Chinese/English, and deployment patterns.
6 Lessons
QwQ-32B Reasoning
Master Alibaba's QwQ-32B — the open-weight reasoning model. Learn the reasoning approach, when QwQ beats DeepSeek-R1, and self-hosted reasoning patterns.
6 Lessons
Mixtral 8x22B
Master Mistral's Mixtral 8x22B — a sparse MoE with 39B active params. Learn the MoE pattern, deployment cost, and when Mixtral beats dense alternatives.
6 Lessons
Gemma 2 / Gemma 3
Master Google's open-weight Gemma 2 (2B/9B/27B) and Gemma 3. Learn their strengths at small scale, tokenizer differences, and on-device deployment patterns.
6 Lessons
Phi-4 (Microsoft)
Master Microsoft Phi-4 — small but mighty 14B model. Learn synthetic data training, when Phi beats much larger models, and its niche in edge/coding.
6 Lessons
DBRX (Databricks)
Master Databricks DBRX — 132B MoE with 36B active. Learn the architecture, Databricks integration, and DBRX's niche in enterprise data workloads.
6 Lessons
Falcon (TII)
Master TII's Falcon family (7B-180B). Learn the Mamba+attention hybrid Falcon3, training data story, and when Falcon fits a workload.
6 Lessons
Yi (01.AI)
Master 01.AI's Yi family (Yi-34B, Yi-Lightning). Learn the Chinese/English balance, long-context variants, and Yi's positioning in the open-weight market.
6 Lessons
NVIDIA Nemotron
Master NVIDIA's Nemotron family (Llama-3.1-Nemotron-70B, Nemotron-Mini). Learn how NVIDIA tunes Llama for steerability and RLHF improvements.
6 Lessons
Cohere Command R+ (Open)
Master Cohere Command R+ open weights — RAG-native LLM. Learn the citation-built-in design, tool use, and self-hosted Command R+ deployment.
6 Lessons
SmolLM2 / TinyLlama
Master tiny LLMs: SmolLM2 (135M-1.7B), TinyLlama (1.1B). Learn the on-device, edge, and CPU-only deployment patterns where tiny LLMs shine.
6 Lessons
Image Generation Models
Stable Diffusion 3.5
Master Stability AI's SD 3.5 (Large, Medium, Turbo). Learn the MMDiT architecture, prompt engineering for SD3, and the open-image-model frontier.
6 Lessons
SDXL
Master Stable Diffusion XL — still the most widely fine-tuned base model. Learn the architecture, refiner pipeline, ControlNet, LoRA ecosystem, and when SDXL beats SD 3.5.
6 Lessons
FLUX.1 [pro/dev/schnell]
Master Black Forest Labs FLUX.1 — currently the best open image model. Learn pro vs dev vs schnell, prompt engineering, and FLUX-specific deployment patterns.
6 Lessons
DALL-E 3
Master DALL-E 3 — OpenAI's text-to-image with built-in prompt rewriting. Learn the strengths in text rendering, ChatGPT integration, and production patterns.
6 Lessons
gpt-image-1 (4o-image)
Master gpt-image-1 — OpenAI's flagship image model. Learn its strengths in editing, character consistency, multi-turn editing, and the patterns that beat DALL-E 3.
6 Lessons
Midjourney v7
Master Midjourney v7 — the artistic-quality leader. Learn parameters (--ar, --s, --c, --w), personalization, style references, and the Midjourney production patterns.
6 Lessons
Imagen 4 (Google)
Master Google Imagen 4 — frontier photorealistic image gen. Learn safety controls, aspect ratios, Vertex AI integration, and Imagen-specific prompting.
6 Lessons
Ideogram
Master Ideogram (1.0, 2.0, 3.0) — the text-rendering champion. Learn typography prompting, magic prompt, style references, and Ideogram's design-focused niche.
6 Lessons
Video Generation Models
Sora (OpenAI)
Master OpenAI Sora — frontier text-to-video. Learn the diffusion-transformer architecture, capabilities, limitations, and patterns for production Sora use.
6 Lessons
Runway Gen-3 Alpha
Master Runway Gen-3 Alpha and Gen-3 Alpha Turbo. Learn text-to-video, image-to-video, motion control, and the Runway video production workflow.
6 Lessons
Luma Dream Machine
Master Luma Dream Machine — fast cinematic video. Learn keyframe conditioning, camera motion, loop generation, and the Dream Machine production patterns.
6 Lessons
Kling 1.5 / 1.6
Master Kuaishou Kling 1.5/1.6 — long-form video model. Learn motion brush, camera movement, the std vs pro modes, and Kling's production strengths.
6 Lessons
Pika 1.5
Master Pika Labs 1.5 — creative video with Pikaffects. Learn the effects-driven approach, image-to-video, lip sync, and Pika's stylized strengths.
6 Lessons
HunyuanVideo (Tencent)
Master Tencent HunyuanVideo — 13B open-source video model. Learn the architecture, prompt engineering, self-hosting, and when HunyuanVideo wins on cost.
6 Lessons
Audio & Speech Models
Whisper Large v3
Master OpenAI Whisper Large v3 — the open-weight ASR standard. Learn the architecture, multilingual support, fine-tuning, distil-whisper, and production deployment.
6 Lessons
ElevenLabs Multilingual v2
Master ElevenLabs Multilingual v2 + Turbo v2.5 — state-of-the-art TTS. Learn voice settings, language coverage, voice cloning, and production patterns.
6 Lessons
Suno v4 (Music)
Master Suno v4 — frontier text-to-music. Learn lyric prompting, custom mode, style descriptors, audio extension, and Suno's production music workflow.
6 Lessons
Udio
Master Udio — premium text-to-music model. Learn prompting, extend functionality, and when Udio beats Suno for specific music tasks.
6 Lessons
F5-TTS (Open)
Master F5-TTS — frontier open-source TTS with voice cloning. Learn the architecture, voice reference, multilingual capability, and self-hosted deployment.
6 Lessons
Embedding Models
text-embedding-3-large/small
Master OpenAI text-embedding-3-large and -small. Learn Matryoshka embeddings (variable dimensions), MTEB scores, and the patterns for production OpenAI embeddings.
6 Lessons
Cohere Embed v3
Master Cohere Embed v3 (English, Multilingual, Image). Learn input_type=query/document, multilingual coverage, and the patterns for Cohere embeddings.
6 Lessons
BGE-M3 (BAAI)
Master BGE-M3 — multi-functional, multi-lingual, multi-granularity open embedding model. Learn dense+sparse+multi-vector outputs and self-hosted patterns.
6 Lessons
Voyage-3-large
Master Voyage-3 and Voyage-3-large — top-of-MTEB embeddings + voyage-rerank-2. Learn domain-tuned variants (code, finance, legal) and production patterns.
6 Lessons
Specialized Foundation Models
CLIP (OpenAI)
Master CLIP — the multimodal embedding model that connects images and text. Learn variants (ViT-L/14, OpenCLIP, SigLIP), zero-shot classification, and production CLIP.
6 Lessons
SAM 2 (Meta Segment Anything)
Master SAM 2 — Meta's image+video segmentation foundation model. Learn point/box/text prompting, video tracking, and production segmentation patterns.
6 Lessons
Why an AI Models Track?
There are 100+ models that matter and a new frontier model ships every month. This track gives you a single up-to-date map.
Frontier + Open
10 frontier closed LLMs (GPT-5, Claude Opus 4.7, Gemini 2.5 Pro, Grok 4, o-series, Mistral Large 2) + 15 open-weight LLMs (Llama 3.3/4, DeepSeek-V3/R1, Qwen, Mixtral, Gemma, Phi-4, DBRX, Falcon, Yi, Nemotron, Command R+, SmolLM2).
Image + Video
Image generation (SD 3.5, SDXL, FLUX.1, DALL-E 3, gpt-image-1, Midjourney, Imagen 4, Ideogram) and video generation (Sora, Runway Gen-3, Luma, Kling, Pika, HunyuanVideo).
Audio + Embeddings
Audio (Whisper Large v3, ElevenLabs Multilingual v2, Suno, Udio, F5-TTS) + embeddings (text-embedding-3, Cohere Embed v3, BGE-M3, Voyage-3-large).
Foundation Models
Specialized: CLIP for multimodal embeddings, SAM 2 for image and video segmentation.
Lilly Tech Systems