LeNet and AlexNet
A guide to LeNet and AlexNet in the context of CNN architectures.
LeNet-5: The Pioneer (1998)
LeNet-5, developed by Yann LeCun and colleagues at Bell Labs, is the architecture that demonstrated convolutional neural networks could achieve practical results on real-world tasks. Designed for handwritten digit recognition (the MNIST dataset), variants of LeNet were deployed commercially to read handwritten zip codes for the US Postal Service and to read handwritten digits on bank checks, at its peak processing millions of checks per day.
The architecture is simple by modern standards: two convolutional layers with average pooling, followed by three fully connected layers. It used tanh-style activations (ReLU had not yet been popularized). Despite its simplicity, LeNet established the fundamental CNN design pattern, convolution, nonlinearity, pooling, then fully connected classification, that persists in modern architectures.
```python
# LeNet-5 architecture in PyTorch
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, 10),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 400)
        return self.classifier(x)
```
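The spatial sizes annotated in the comments above follow from the standard output-size formula for convolution and pooling, `(n + 2p - k) / s + 1`. A quick sanity check in plain Python (no framework required) traces the 32x32 input down to the 5x5 feature maps that feed the classifier:

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output width of a conv/pool layer applied to an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1

n = 32
n = out_size(n, kernel=5)            # conv1: 32 -> 28
n = out_size(n, kernel=2, stride=2)  # pool1: 28 -> 14
n = out_size(n, kernel=5)            # conv2: 14 -> 10
n = out_size(n, kernel=2, stride=2)  # pool2: 10 -> 5
print(n)  # 5, so the flattened feature vector has 16 * 5 * 5 = 400 entries
```

This is why the first fully connected layer takes `16 * 5 * 5` inputs: 16 channels, each 5x5 after the second pooling stage.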
The AI Winter and the Gap
After LeNet, neural networks fell out of favor in computer vision. From roughly 2000 to 2011, hand-crafted features like SIFT, HOG, and SURF combined with support vector machines (SVMs) dominated computer vision benchmarks. The computational resources needed to train deep networks on large image datasets simply were not available, and the machine learning community largely dismissed neural networks as impractical.
AlexNet: The Deep Learning Revolution (2012)
In 2012, Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton entered a CNN called AlexNet into the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). It won by a massive margin, reducing the top-5 error rate from 26% (the previous best using hand-crafted features) to 16.4%. This result shocked the computer vision community and launched the modern deep learning era.
AlexNet Innovations
- ReLU activation — First major network to use ReLU instead of sigmoid/tanh, enabling much faster training of deep networks
- GPU training — Trained on two NVIDIA GTX 580 GPUs with 3GB VRAM each, splitting the network across both GPUs
- Dropout regularization — Used dropout (p=0.5) in fully connected layers to prevent overfitting
- Data augmentation — Random crops, horizontal flips, and color jittering to artificially increase training data
- Local response normalization — A normalization technique later replaced by batch normalization
- Overlapping max pooling — Used 3x3 pooling with stride 2, which slightly improved accuracy
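The innovations above can be seen together in a sketch of the architecture. The following is a single-GPU approximation in PyTorch, using the paper's channel counts (96, 256, 384, 384, 256) without the original two-GPU split; layer details such as the LRN hyperparameters follow the paper, but treat this as an illustration rather than a faithful reimplementation:

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),    # 227 -> 55
            nn.ReLU(inplace=True),                         # ReLU instead of tanh
            nn.LocalResponseNorm(5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(3, stride=2),                     # overlapping pool: 55 -> 27
            nn.Conv2d(96, 256, kernel_size=5, padding=2),  # 27 -> 27
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(3, stride=2),                     # 27 -> 13
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),                     # 13 -> 6
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),                   # dropout in the FC layers
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # flatten to (batch, 9216)
        return self.classifier(x)
```

Note how the skeleton is the same as LeNet's (conv/pool feature extractor, then fully connected classifier); the differences are scale, ReLU, dropout, and normalization.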
Architecture Comparison
Comparing LeNet and AlexNet reveals how CNNs scaled from toy problems to real-world image recognition:
- Input size: LeNet: 32x32 grayscale. AlexNet: 227x227 RGB (reported as 224x224 in the paper, but 227x227 makes the stride-4 first convolution work out)
- Depth: LeNet: 2 conv + 3 FC layers. AlexNet: 5 conv + 3 FC layers
- Parameters: LeNet: ~60K. AlexNet: ~60M (1000x more)
- Activation: LeNet: tanh. AlexNet: ReLU
- Training: LeNet: CPU. AlexNet: 2x GPU for 5-6 days
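The ~1000x parameter gap can be verified by hand. Each layer's parameter count is weights plus biases; for the LeNet-5 defined above:

```python
def conv_params(c_in, c_out, k):
    """Parameters in a conv layer: one k x k x c_in filter (plus bias) per output channel."""
    return (k * k * c_in + 1) * c_out

def fc_params(n_in, n_out):
    """Parameters in a fully connected layer, including biases."""
    return (n_in + 1) * n_out

lenet = (
    conv_params(1, 6, 5)      # conv1:    156
    + conv_params(6, 16, 5)   # conv2:  2,416
    + fc_params(400, 120)     # fc1:   48,120
    + fc_params(120, 84)      # fc2:   10,164
    + fc_params(84, 10)       # fc3:      850
)
print(lenet)  # 61706 -- roughly 60K

# For scale: AlexNet's first FC layer alone (9216 -> 4096) dwarfs all of LeNet.
print(fc_params(9216, 4096))  # 37752832 -- over half of AlexNet's ~60M parameters
```

Most of AlexNet's parameters live in its fully connected layers, an imbalance that later architectures (e.g. GoogLeNet's global average pooling) were designed to eliminate.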
Impact on the Field
AlexNet's victory triggered an arms race in CNN architecture design. Within two years, VGGNet pushed depth to 19 layers, GoogLeNet introduced inception modules, and ResNet eventually reached 152 layers. The ImageNet competition became the proving ground for new architectural ideas, and every year brought dramatic improvements in accuracy and efficiency.
In the next lesson, we examine VGG and GoogLeNet, which demonstrated that deeper networks with simpler, more uniform building blocks could dramatically outperform AlexNet.