Intermediate

LeNet and AlexNet

A comprehensive guide to LeNet and AlexNet within the context of CNN architectures.

LeNet-5: The Pioneer (1998)

LeNet-5, developed by Yann LeCun and colleagues at Bell Labs, is the architecture that demonstrated convolutional neural networks could achieve practical results on real-world tasks. Designed for handwritten digit recognition (the MNIST dataset), LeNet was deployed commercially: variants read zip codes on mail for the US Postal Service, and check-reading systems built on it processed millions of checks per day.

The architecture is simple by modern standards: two convolutional layers with average pooling, followed by three fully connected layers. It used tanh activations (a scaled tanh in the original paper; ReLU had not yet been popularized). Despite its simplicity, LeNet established the fundamental CNN design pattern — stacked convolution and pooling stages feeding a fully connected classifier — that persists in modern architectures.

# LeNet-5 architecture in PyTorch
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),     # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                     # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),     # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                     # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, 10)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        return self.classifier(x)
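As a sanity check, the layer shapes above imply roughly 60K learnable parameters (pooling and activation layers contribute none). A small standalone sketch totals the weights and biases layer by layer:

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases for a conv layer with k x k kernels."""
    return c_in * c_out * k * k + c_out

def fc_params(n_in, n_out):
    """Weights plus biases for a fully connected layer."""
    return n_in * n_out + n_out

total = (
    conv_params(1, 6, 5)          # conv1: 156
    + conv_params(6, 16, 5)       # conv2: 2,416
    + fc_params(16 * 5 * 5, 120)  # fc1: 48,120
    + fc_params(120, 84)          # fc2: 10,164
    + fc_params(84, 10)           # output: 850
)
print(total)  # 61706 -- roughly the "60K" usually quoted
```

The fully connected layers dominate the count — a pattern that recurs, far more dramatically, in AlexNet.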

The AI Winter and the Gap

After LeNet, neural networks fell out of favor in computer vision. From roughly 2000 to 2011, hand-crafted features like SIFT, HOG, and SURF combined with support vector machines (SVMs) dominated computer vision benchmarks. The computational resources needed to train deep networks on large image datasets simply were not available, and the machine learning community largely dismissed neural networks as impractical.

AlexNet: The Deep Learning Revolution (2012)

In 2012, Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton entered a CNN called AlexNet into the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). It won by a massive margin, reducing the top-5 error rate from 26.2% (the runner-up, built on hand-crafted features) to 15.3%. This result shocked the computer vision community and launched the modern deep learning era.

AlexNet Innovations

  • ReLU activation — First major network to use ReLU instead of sigmoid/tanh, enabling much faster training of deep networks
  • GPU training — Trained on two NVIDIA GTX 580 GPUs with 3GB VRAM each, splitting the network across both GPUs
  • Dropout regularization — Used dropout (p=0.5) in fully connected layers to prevent overfitting
  • Data augmentation — Random crops, horizontal flips, and color jittering to artificially increase training data
  • Local response normalization — A normalization technique later replaced by batch normalization
  • Overlapping max pooling — Used 3x3 pooling with stride 2, which slightly improved accuracy
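Several of these innovations are visible directly in code. The sketch below follows the layer sizes reported in the original paper but uses a single-GPU layout (the 2012 network split channels across two GPUs), in the same style as the LeNet-5 code above:

```python
# AlexNet in PyTorch -- a sketch of the original layer sizes, single-GPU layout.
# Shows ReLU, local response normalization, overlapping max pooling, and dropout.
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),   # 227x227 -> 55x55
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),        # overlapping pool, 55 -> 27
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),        # 27 -> 13
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),        # 13 -> 6
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),                            # dropout in FC layers only
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)
```

Note where the parameters live: the three fully connected layers account for the vast majority of the ~60 million total, which is exactly why dropout was applied there.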
💡 Historical context: AlexNet had 60 million parameters and was considered enormous in 2012. Today, large language models have hundreds of billions of parameters. But the fundamental CNN design principles AlexNet established remain relevant.

Architecture Comparison

Comparing LeNet and AlexNet reveals how CNNs scaled from toy problems to real-world image recognition:

  • Input size: LeNet: 32x32 grayscale. AlexNet: 227x227 RGB
  • Depth: LeNet: 2 conv + 3 FC layers. AlexNet: 5 conv + 3 FC layers
  • Parameters: LeNet: ~60K. AlexNet: ~60M (1000x more)
  • Activation: LeNet: tanh. AlexNet: ReLU
  • Training: LeNet: CPU. AlexNet: 2x GPU for 5-6 days

Impact on the Field

AlexNet's victory triggered an arms race in CNN architecture design. Within two years, VGGNet pushed depth to 19 layers, GoogLeNet introduced inception modules, and ResNet eventually reached 152 layers. The ImageNet competition became the proving ground for new architectural ideas, and every year brought dramatic improvements in accuracy and efficiency.

Lesson from history: AlexNet did not win because of a novel algorithm. It won because Krizhevsky had the engineering skill to implement and train a CNN on GPUs and the conviction to scale it up. Sometimes the most important architectural decision is simply to scale what already works.

In the next lesson, we examine VGG and GoogLeNet, which demonstrated that deeper networks with simpler, more uniform building blocks could dramatically outperform AlexNet.