Introduction to Edge AI / TinyML
Edge AI runs machine learning models directly on devices — phones, microcontrollers, cameras, and IoT sensors — without sending data to the cloud.
What is Edge AI?
Edge AI refers to running AI algorithms locally on hardware devices at the "edge" of the network, close to where data is generated. Instead of sending data to a cloud server for processing, the AI model runs directly on the device in real time.
TinyML is a subset of Edge AI focused on running ML models on extremely resource-constrained devices: microcontrollers with kilobytes of memory, power budgets in the milliwatt range, and often no full operating system (bare metal or a small RTOS).
Why Edge AI?
| Factor | Cloud AI | Edge AI |
|---|---|---|
| Latency | Typically 100-500 ms round-trip | Often <10 ms local inference |
| Privacy | Data sent to server | Data stays on device |
| Connectivity | Requires internet | Works offline |
| Bandwidth | High data transfer cost | Minimal (only results sent) |
| Cost | Per-query cloud fees | One-time device cost |
| Model Size | Effectively unlimited | Constrained by device memory and compute |
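
To make the bandwidth row concrete, the following back-of-the-envelope sketch compares the daily upload volume of a camera that streams video to the cloud with one that sends only its on-device detection results. The bitrate, event count, and message size are illustrative assumptions, not measurements, and the function names are made up for this example.

```python
# Rough comparison of daily upload volume: streaming video to the cloud
# versus sending only on-device inference results.
# All numbers below are illustrative assumptions, not measurements.

SECONDS_PER_DAY = 24 * 60 * 60


def daily_bytes_streaming(bitrate_bits_per_s: float) -> float:
    """Bytes uploaded per day when the camera streams compressed video."""
    return bitrate_bits_per_s / 8 * SECONDS_PER_DAY


def daily_bytes_results(events_per_day: int, bytes_per_event: int) -> int:
    """Bytes uploaded per day when only detection results are sent."""
    return events_per_day * bytes_per_event


# Assumed: a 2 Mbit/s compressed video stream, running all day.
cloud = daily_bytes_streaming(2_000_000)
# Assumed: 1,000 detection events per day, about 200 bytes each.
edge = daily_bytes_results(1_000, 200)

print(f"Streaming:    {cloud / 1e9:.1f} GB/day")
print(f"Results only: {edge / 1e3:.1f} kB/day")
print(f"Reduction:    {cloud / edge:,.0f}x")
```

Under these assumptions the gap is several orders of magnitude, which is why the table lists edge bandwidth as "minimal". Real savings depend on how often the device needs to report.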
Edge AI Applications
- Smartphones: Face unlock, voice assistants, computational photography, real-time translation. The Neural Engine in Apple's A15 Bionic chip, for example, performs up to 15.8 trillion operations per second on-device.
- Smart Cameras: Person detection, license plate recognition, and anomaly detection without streaming video to the cloud.
- Wearables: Heart rate anomaly detection on smartwatches, fall detection, and activity recognition.
- Industrial IoT: Predictive maintenance on factory sensors, quality inspection on production lines, real-time vibration analysis.
- Autonomous Vehicles: Object detection and path planning must happen within milliseconds, so a cloud round-trip of 100 ms or more is too slow to be safe.
- Agriculture: Crop disease detection from drone cameras, soil quality analysis from IoT sensors.
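
The latency argument for vehicles can be checked with simple arithmetic. This sketch computes how far a vehicle travels while waiting for an inference result, using the latency figures from the comparison table above. The speed of 100 km/h and the function name are assumptions chosen for illustration.

```python
def distance_during_latency(speed_kmh: float, latency_ms: float) -> float:
    """Metres travelled while waiting `latency_ms` for an inference result."""
    speed_m_per_s = speed_kmh * 1000 / 3600
    return speed_m_per_s * latency_ms / 1000


# Assumed vehicle speed: 100 km/h. Latencies taken from the table above.
scenarios = [("edge, 10 ms", 10), ("cloud, 100 ms", 100), ("cloud, 500 ms", 500)]
for label, latency_ms in scenarios:
    metres = distance_during_latency(100, latency_ms)
    print(f"{label}: {metres:.2f} m travelled before a result arrives")
```

At highway speed, a 500 ms round-trip corresponds to well over ten metres of travel, while 10 ms of local inference corresponds to less than half a metre.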
The Edge AI Stack
Train in the Cloud
Train your full-size model using GPUs in the cloud with frameworks like PyTorch or TensorFlow.
Optimize the Model
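Apply quantization (for example FP32 to INT8, which cuts weight storage roughly 4x), pruning (removing low-importance weights), and knowledge distillation (training a small student model to mimic a larger teacher) to shrink the model for edge deployment.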
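
As an illustration of what FP32-to-INT8 quantization does to the weights, here is a minimal pure-Python sketch of affine (asymmetric) per-tensor quantization. It is a simplified teaching example, not how any particular framework implements it: production toolchains typically quantize per channel and calibrate activation ranges on sample data. The function names and weight values are made up.

```python
def quantize_int8(values):
    """Affine per-tensor quantization: map floats onto integers in [-128, 127]."""
    # Extend the range to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255          # real-valued size of one integer step
    if scale == 0.0:                 # all-zero input: any scale works
        scale = 1.0
    zero_point = round(-128 - lo / scale)   # integer that represents 0.0
    quantized = [
        max(-128, min(127, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from INT8 values."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-0.52, -0.10, 0.0, 0.03, 0.75, 1.20]   # made-up FP32 weights
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(q)
print(f"scale={scale:.5f}, zero_point={zero_point}, max_error={max_error:.5f}")
```

Each weight now occupies one byte instead of four, at the cost of a reconstruction error that stays within about one quantization step (`scale`).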
Convert to Edge Format
Export to TensorFlow Lite (.tflite), ONNX (.onnx), Core ML (.mlmodel or .mlpackage), or a TensorRT engine, depending on the target device.
Deploy to Device
Load the optimized model on the edge device and run inference using the appropriate runtime.
Monitor and Update
Use OTA (over-the-air) updates to push improved models to deployed devices.
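
The device-side logic of an OTA model update might look roughly like the sketch below: compare versions, verify the download, and only then switch models. The manifest format (a dict with `version` and `sha256` keys) and all function names are hypothetical. A checksum only detects corrupted downloads, so real deployments typically also sign the manifest.

```python
import hashlib


def parse_version(version: str) -> tuple:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def should_update(current_version: str, manifest: dict) -> bool:
    """True when the manifest advertises a newer model than the one installed."""
    return parse_version(manifest["version"]) > parse_version(current_version)


def verify_model(blob: bytes, manifest: dict) -> bool:
    """Check the downloaded bytes against the SHA-256 digest in the manifest."""
    return hashlib.sha256(blob).hexdigest() == manifest["sha256"]


def apply_update(current_version: str, manifest: dict, blob: bytes) -> str:
    """Return the version that should be active after this update attempt."""
    if not should_update(current_version, manifest):
        return current_version          # already up to date
    if not verify_model(blob, manifest):
        return current_version          # corrupted download: keep the old model
    # A real device would write `blob` to storage and reload the runtime here.
    return manifest["version"]


# Stand-in for a downloaded model file and its (hypothetical) manifest.
new_model = b"placeholder model bytes"
manifest = {"version": "1.10.0", "sha256": hashlib.sha256(new_model).hexdigest()}

print(apply_update("1.9.2", manifest, new_model))            # update accepted
print(apply_update("1.9.2", manifest, new_model + b"\x00"))  # update rejected
```

Keeping the old model whenever verification fails means a bad download never leaves the device without a working model.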
Edge AI vs TinyML
| Aspect | Edge AI | TinyML |
|---|---|---|
| Devices | Phones, Jetson, Raspberry Pi | Arduino, ESP32, STM32 |
| Memory | GBs of RAM | KBs to MBs of RAM |
| Power | Watts | Milliwatts to microwatts |
| Models | MobileNet, YOLOv8n (nano) | Tiny CNNs, keyword spotting |
| Runtime | TFLite, ONNX Runtime | TFLite Micro, Edge Impulse SDK |
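
The memory row is usually the deciding constraint for TinyML. The sketch below checks whether a model fits a microcontroller by comparing its weight storage and peak activation memory against the device's flash and RAM. The device figures (1 MB flash, 256 KB RAM), the model size, and all names are assumptions for illustration. A real budget must also leave room for firmware and the runtime itself.

```python
from dataclasses import dataclass


@dataclass
class DeviceBudget:
    name: str
    flash_bytes: int   # storage available for model weights
    ram_bytes: int     # working memory available for activations


def model_fits(num_params: int, bytes_per_param: int,
               peak_activation_bytes: int, device: DeviceBudget) -> bool:
    """True when weights fit in flash and activations fit in RAM."""
    weight_bytes = num_params * bytes_per_param
    return (weight_bytes <= device.flash_bytes
            and peak_activation_bytes <= device.ram_bytes)


# Hypothetical microcontroller: 1 MB flash, 256 KB RAM.
mcu = DeviceBudget("example-mcu", flash_bytes=1024 * 1024, ram_bytes=256 * 1024)

# Hypothetical model: 500,000 parameters, 100 KB of peak activations.
for label, bytes_per_param in [("FP32", 4), ("INT8", 1)]:
    fits = model_fits(500_000, bytes_per_param, 100 * 1024, mcu)
    print(f"{label}: {'fits' if fits else 'does not fit'} on {mcu.name}")
```

With these numbers the FP32 model needs about 2 MB for weights and does not fit, while the INT8 version needs about 500 KB and does. This is why quantization is close to mandatory at the TinyML end of the spectrum.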