Beginner

Introduction to Computer Vision

Computer Vision enables machines to interpret and understand visual information from the world — images, videos, and real-time camera feeds.

What is Computer Vision?

Computer Vision (CV) is a field of artificial intelligence that trains computers to interpret and understand visual information. While humans effortlessly recognize objects, read text, and navigate environments using sight, enabling machines to do the same is an extraordinarily complex challenge.

CV systems take images or videos as input and produce meaningful outputs — classifications, bounding boxes, pixel-level labels, 3D models, or textual descriptions of visual content.

How Computers "See"

To a computer, an image is simply a grid of numbers:

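  • Pixels: The fundamental unit of an image. Each pixel stores one intensity value per channel. In the common 8-bit format, values range from 0 (no intensity; black in a grayscale image) to 255 (full intensity; white in a grayscale image).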
  • Channels: Color images have multiple channels. An RGB image has 3 channels (Red, Green, Blue). A grayscale image has 1 channel.
  • Resolution: The dimensions of the image in pixels (e.g., 1920x1080 means 1920 columns and 1080 rows).
Python - Image as Numbers
import numpy as np
from PIL import Image

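# Load an image and convert it to RGB so the array always has 3 channels
# (a grayscale or palette file would otherwise produce a different shape)
img = Image.open("photo.jpg").convert("RGB")
pixels = np.array(img)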

print(f"Shape: {pixels.shape}")     # (height, width, channels)
print(f"Data type: {pixels.dtype}") # uint8 (0-255)
print(f"Min: {pixels.min()}, Max: {pixels.max()}")

# A single pixel's RGB values
print(f"Pixel at (100,50): {pixels[100, 50]}")
# e.g., [142, 178, 210] -- R=142, G=178, B=210
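
The snippet above needs a photo.jpg on disk. As a file-free sketch of the same idea, the following builds a tiny RGB image from nested Python lists (standing in for the NumPy array) and shows how resolution maps to rows and columns. The helper names image_shape and raw_size_in_bytes and the sample pixel values are made up for illustration, and the size calculation assumes one byte per channel value.

```python
# A 2-row x 3-column RGB image written out as plain nested lists.
# Each innermost list is one pixel: [R, G, B], each value 0-255.
tiny_image = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],        # row 0: red, green, blue
    [[0, 0, 0], [128, 128, 128], [255, 255, 255]],  # row 1: black, gray, white
]


def image_shape(image):
    """Return (height, width, channels) for a nested-list image."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return height, width, channels


def raw_size_in_bytes(width, height, channels):
    """Uncompressed size, assuming one byte (8 bits) per channel value."""
    return width * height * channels


print(image_shape(tiny_image))            # (2, 3, 3)
print(tiny_image[1][2])                   # row 1, column 2 -> [255, 255, 255]
print(raw_size_in_bytes(1920, 1080, 3))   # 6220800
```

Note that the shape lists height first, so a 1920x1080 image is stored as 1080 rows of 1920 pixels.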

Color Spaces

Color Space | Channels                                  | Use Case
RGB         | Red, Green, Blue                          | Standard display format, most common
HSV         | Hue, Saturation, Value                    | Color-based filtering and segmentation
Grayscale   | Single intensity channel                  | Edge detection, feature extraction
LAB         | Lightness, A (green-red), B (blue-yellow) | Color correction, perceptual uniformity
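
To make two of these color spaces concrete, here is a minimal sketch that converts a single RGB pixel to grayscale and to HSV using only the Python standard library. It assumes 8-bit RGB input and uses the ITU-R BT.601 luma weights for grayscale, which is one common choice rather than the only one. The function names are illustrative. In practice an imaging library would convert whole images at once.

```python
import colorsys


def rgb_to_grayscale(r, g, b):
    """Weighted sum of the channels (ITU-R BT.601 luma weights)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def rgb_to_hsv_degrees(r, g, b):
    """Convert 0-255 RGB to (hue in degrees, saturation 0-1, value 0-1)."""
    # colorsys expects each channel scaled to the range 0.0-1.0
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s, v


pixel = (142, 178, 210)  # the sample pixel from the earlier example

print(rgb_to_grayscale(*pixel))        # single intensity value
print(rgb_to_hsv_degrees(*pixel))      # hue lands in the blue range
print(rgb_to_hsv_degrees(255, 0, 0))   # pure red -> (0.0, 1.0, 1.0)
```

HSV separates the color itself (hue) from how vivid and how bright it is, which is why it suits color-based filtering: a range of hues can be selected without being as sensitive to lighting changes as raw RGB thresholds.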

CV vs Image Processing vs Computer Graphics

Field             | Input       | Output        | Goal
Image Processing  | Image       | Image         | Enhance, filter, transform images
Computer Vision   | Image/Video | Understanding | Extract meaning from visual data
Computer Graphics | Data/Models | Image/Video   | Create visual content from data
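
The difference between the three fields shows up in the input and output types of a function. The toy sketch below uses small grayscale images stored as nested lists. All three functions are hypothetical illustrations, and the brightness threshold is only a stand-in for the learned models that real CV systems use.

```python
def invert(image):
    """Image processing: image in, image out (each intensity flipped)."""
    return [[255 - value for value in row] for row in image]


def classify_brightness(image, threshold=128):
    """Computer vision (toy version): image in, label out."""
    values = [value for row in image for value in row]
    mean = sum(values) / len(values)
    return "bright" if mean >= threshold else "dark"


def render_gradient(width, height):
    """Computer graphics: parameters in, image out (left-to-right ramp)."""
    row = [255 * x // (width - 1) for x in range(width)]
    return [list(row) for _ in range(height)]


# A 2x2 grayscale image (one intensity per pixel).
image = [[10, 20], [30, 40]]

print(invert(image))                       # [[245, 235], [225, 215]]
print(classify_brightness(image))          # dark
print(classify_brightness(invert(image)))  # bright
print(render_gradient(3, 2))               # [[0, 127, 255], [0, 127, 255]]
```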

Applications of Computer Vision

  • Face Recognition: Unlocking phones, identity verification, photo organization. Used by Apple Face ID, social media tagging, and security systems.
  • Autonomous Driving: Self-driving cars use cameras, lidar, and radar with CV to detect lanes, vehicles, pedestrians, and traffic signs.
  • Medical Imaging: AI analyzes X-rays, MRIs, CT scans, and pathology slides to detect diseases, tumors, and anomalies.
  • Manufacturing Inspection: CV systems inspect products on assembly lines for defects, measuring quality at speeds impossible for humans.
  • Augmented and Virtual Reality (AR/VR): CV powers these experiences by understanding the 3D environment, tracking objects, and overlaying digital content.
  • Document Analysis: OCR, document classification, form extraction, and receipt scanning.
  • Agriculture: Drone-based crop monitoring, disease detection, and yield estimation.

History and Evolution

  1. 1960s — Early Research

    MIT's "Summer Vision Project" (1966) aimed to build a visual system in one summer. The problem proved far harder than expected.

  2. 1980s–1990s — Feature-Based Methods

    Hand-crafted features like edges, corners (Harris), and SIFT descriptors enabled basic recognition.

  3. 2000s — Machine Learning Era

    Classifiers trained on engineered features became standard: HOG + SVM for pedestrian detection, Viola-Jones for face detection, and bag-of-visual-words approaches for image classification.

  4. 2012 — Deep Learning Revolution

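    AlexNet won the ImageNet challenge (ILSVRC 2012) by a wide margin, with a top-5 error of about 15% against roughly 26% for the runner-up. The result showed that CNNs could far outperform hand-crafted pipelines on large-scale recognition and shifted the field toward deep learning.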

  5. 2020s — Foundation Models

    Vision Transformers (ViT), CLIP, SAM, and multimodal models that combine vision with language understanding.

Key takeaway: Computer Vision has evolved from hand-crafted feature detectors to deep learning models that approach human performance on some benchmark tasks. The field continues to advance rapidly with foundation models and multimodal architectures.