Beginner

Project Setup

Architecture overview, Feast and Redis setup, and complete project scaffolding.

What We Are Building

A production-ready ML feature platform that manages the full lifecycle of ML features:

  1. Define features declaratively with Feast
  2. Compute and store features offline in PostgreSQL
  3. Materialize features to Redis for low-latency online serving
  4. Serve features via a FastAPI endpoint
  5. Monitor data quality and detect drift
Real-world relevance: Companies like Uber (Michelangelo), Airbnb (Zipline), and Spotify use feature platforms to share features across teams and ensure consistency between training and serving.

Architecture

Data Sources (DB, files, streams)
    |
    v
+-------------------+     +--------------------+
| Feast Registry    | --> | Offline Store      |
| (Feature Defs)    |     | (PostgreSQL)       |
+-------------------+     | - Historical data  |
    |                     | - Training queries |
    v                     +--------------------+
+-------------------+
| Online Store      |     +-------------------+
| (Redis)           | --> | Feature API       |
| - Low latency     |     | (FastAPI)         |
| - Materialized    |     | - Real-time serve |
+-------------------+     +-------------------+
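The diagram's data flow can be modeled in a few lines of plain Python. This is a toy sketch, not Feast's API: `FeatureRow`, `ToyOfflineStore`, `ToyOnlineStore`, `materialize`, and `serve` are hypothetical names standing in for PostgreSQL, Redis, Feast's materialization step, and the FastAPI lookup. What it illustrates is that the offline store keeps every historical row, while materialization copies only the newest row per entity into a key-value store, so an online read is a single key lookup.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class FeatureRow:
    """One historical record, as it would sit in the offline store."""
    entity_id: int
    event_timestamp: datetime
    features: dict[str, Any]


@dataclass
class ToyOfflineStore:
    """Hypothetical stand-in for PostgreSQL: an append-only history of every row."""
    rows: list[FeatureRow] = field(default_factory=list)

    def append(self, row: FeatureRow) -> None:
        self.rows.append(row)


@dataclass
class ToyOnlineStore:
    """Hypothetical stand-in for Redis: only the newest values per entity."""
    latest: dict[int, tuple[datetime, dict[str, Any]]] = field(default_factory=dict)

    def materialize(self, offline: ToyOfflineStore, start: datetime, end: datetime) -> int:
        """Copy the newest row per entity with start <= event_timestamp < end.

        Returns how many online entries were written or overwritten.
        """
        written = 0
        for row in offline.rows:
            if not start <= row.event_timestamp < end:
                continue
            current = self.latest.get(row.entity_id)
            # Only overwrite when this row is newer than the stored one
            if current is None or row.event_timestamp > current[0]:
                self.latest[row.entity_id] = (row.event_timestamp, row.features)
                written += 1
        return written

    def serve(self, entity_id: int) -> Optional[dict[str, Any]]:
        """Single dictionary lookup, like the Feature API reading from Redis."""
        entry = self.latest.get(entity_id)
        return None if entry is None else entry[1]


if __name__ == "__main__":
    def at(hour: int) -> datetime:
        return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)

    offline = ToyOfflineStore()
    offline.append(FeatureRow(1, at(1), {"conv_rate": 0.2}))
    offline.append(FeatureRow(1, at(5), {"conv_rate": 0.7}))
    offline.append(FeatureRow(2, at(3), {"conv_rate": 0.4}))

    online = ToyOnlineStore()
    online.materialize(offline, start=at(0), end=at(6))
    print(online.serve(1))  # {'conv_rate': 0.7}
    print(online.serve(3))  # None: driver 3 was never materialized
```

Feast's own materialization also honors each feature view's TTL; the toy version skips that to keep the core idea visible.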

Tech Stack

Feast

Open-source feature store framework. Defines features as code, handles materialization and serving.

Redis

In-memory data store for sub-millisecond feature lookups during model inference.

PostgreSQL

Reliable offline store for historical feature data and point-in-time joins.
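A point-in-time join attaches to each training example the feature value that was known at that example's timestamp, never a later one, so the model does not train on information from the future. The sketch below shows the rule in plain Python on hypothetical in-memory structures (sorted lists of `(timestamp, value)` tuples rather than SQL tables); Feast's `get_historical_features` performs this kind of join against the offline store for you, with extras such as feature TTLs.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Any, Optional

# history[entity_id] = list of (event_timestamp, feature_value), sorted by time
History = dict[int, list[tuple[datetime, Any]]]


def point_in_time_lookup(history: History, entity_id: int, as_of: datetime) -> Optional[Any]:
    """Return the latest value with event_timestamp <= as_of, or None if none exists."""
    rows = history.get(entity_id, [])
    timestamps = [ts for ts, _ in rows]
    # bisect_right gives the index of the first row strictly after as_of;
    # the row just before it is the newest one that was already known.
    idx = bisect_right(timestamps, as_of)
    return rows[idx - 1][1] if idx > 0 else None


def point_in_time_join(history: History, labels: list[tuple[int, datetime]]) -> list[Optional[Any]]:
    """Attach an as-of feature value to every (entity_id, label_timestamp) pair."""
    return [point_in_time_lookup(history, eid, ts) for eid, ts in labels]


if __name__ == "__main__":
    history = {1: [(datetime(2024, 1, 1), 0.2), (datetime(2024, 1, 3), 0.7)]}
    labels = [(1, datetime(2024, 1, 2)), (1, datetime(2024, 1, 4)), (2, datetime(2024, 1, 4))]
    print(point_in_time_join(history, labels))  # [0.2, 0.7, None]
```

The label on January 2 gets 0.2 even though a newer value (0.7) exists in the table, because that value was only recorded on January 3.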

FastAPI

High-performance Python API framework for serving features to ML models.

Step 1: Create the Project

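mkdir feature-platform && cd feature-platform
python -m venv venv && source venv/bin/activate   # Windows: venv\Scripts\activate
# Quote the extras so shells such as zsh do not expand the brackets
pip install "feast[redis,postgres]" fastapi uvicorn pandas scikit-learn
pip freeze > requirements.txt
# Start a local Redis for the online store (assumes Docker is installed)
docker run -d --name redis -p 6379:6379 redis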
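Once the install finishes, a short standard-library check can confirm the packages resolved. It asks `importlib.metadata` for each distribution's version; `redis` is listed on the assumption that the `feast[redis]` extra pulls in the redis-py client. It only reports what is installed and does not test whether the Redis or PostgreSQL integrations actually work.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution names as published on PyPI (scikit-learn, not sklearn).
REQUIRED = ["feast", "fastapi", "uvicorn", "pandas", "scikit-learn", "redis"]


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    result: dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    versions = installed_versions(REQUIRED)
    for name, ver in versions.items():
        print(f"{name:<15} {ver or 'MISSING'}")
    missing = [name for name, ver in versions.items() if ver is None]
    print("All packages found." if not missing else f"Missing: {', '.join(missing)}")
```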

Step 2: Project Structure

feature-platform/
  feature_repo/
    feature_store.yaml   # Feast config
    features.py          # Feature definitions (Lesson 2)
    data_sources.py      # Data source configs
  src/
    offline.py           # Offline store ops (Lesson 3)
    online.py            # Online store ops (Lesson 4)
    api.py               # FastAPI server (Lesson 5)
    monitor.py           # Monitoring (Lesson 6)
  data/
    driver_stats.parquet  # Sample data
  scripts/
    generate_data.py      # Sample data generator (Step 4)
  requirements.txt
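If you would rather create this layout from Python than by hand, a small helper like the one below (a hypothetical `scaffold` function; the paths are taken from the project structure, plus the `scripts/generate_data.py` file used in Step 4) creates the folders and empty placeholder files. It never overwrites an existing file, so running it again is safe.

```python
from pathlib import Path

# Files from the project layout; parent folders are created as needed.
LAYOUT = [
    "feature_repo/feature_store.yaml",
    "feature_repo/features.py",
    "feature_repo/data_sources.py",
    "src/offline.py",
    "src/online.py",
    "src/api.py",
    "src/monitor.py",
    "scripts/generate_data.py",
    "requirements.txt",
]


def scaffold(root: Path) -> list[Path]:
    """Create missing folders and empty files under root; return the files created."""
    created = []
    (root / "data").mkdir(parents=True, exist_ok=True)  # filled by the Step 4 script
    for rel in LAYOUT:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.touch()  # empty placeholder; existing content is never touched
            created.append(path)
    return created
```

Calling `scaffold(Path("."))` from the project root creates whatever is missing and returns the list of new files.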

Step 3: Feast Configuration

# feature_repo/feature_store.yaml
project: ml_features
registry: data/registry.db
provider: local
online_store:
  type: redis
  connection_string: "localhost:6379"
offline_store:
  type: file   # local Parquet files for now; use Feast's postgres offline store to match the architecture
entity_key_serialization_version: 2
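Before running `feast apply` and materializing, it helps to confirm that Redis is actually listening at the `connection_string` above. This sketch uses only the standard library: it takes the `host:port` part of the string (assuming any extra options follow a comma, as in `localhost:6379,password=...`) and sends a Redis `PING`, expecting `+PONG` back. With redis-py installed, `redis.Redis(host=..., port=...).ping()` does the same job.

```python
import socket


def parse_connection_string(conn: str) -> tuple[str, int]:
    """Extract (host, port) from a string such as 'localhost:6379,password=x'."""
    host_port = conn.split(",", 1)[0].strip()  # options, if any, follow the first comma
    if ":" in host_port:
        host, port = host_port.rsplit(":", 1)
        return host, int(port)
    return host_port, 6379  # assumed default Redis port


def redis_ping(host: str, port: int, timeout: float = 1.0) -> bool:
    """Send a RESP-encoded PING; True only if the server replies +PONG.

    A server that requires a password answers with an error instead, which
    also yields False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"*1\r\n$4\r\nPING\r\n")
            return sock.recv(64).startswith(b"+PONG")
    except OSError:
        # Connection refused, timeout, DNS failure, ...
        return False


if __name__ == "__main__":
    host, port = parse_connection_string("localhost:6379")
    status = "reachable" if redis_ping(host, port) else "NOT reachable"
    print(f"Redis at {host}:{port} is {status}")
```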

Step 4: Generate Sample Data

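# scripts/generate_data.py  (run from the project root)
from datetime import datetime, timedelta, timezone
from pathlib import Path

import numpy as np
import pandas as pd

np.random.seed(42)
n_drivers = 50
n_records = 5000
# Timezone-aware UTC timestamps keep event times unambiguous
# (Feast interprets naive timestamps as UTC, not local time).
now = datetime.now(timezone.utc)

data = {
    # Entity key: driver IDs 1..n_drivers, explicitly int64 on every platform
    "driver_id": np.random.randint(1, n_drivers + 1, n_records).astype("int64"),
    # Event times spread over the last 30 days (720 hours)
    "event_timestamp": [now - timedelta(hours=np.random.randint(0, 720))
                        for _ in range(n_records)],
    "conv_rate": np.random.uniform(0.0, 1.0, n_records),
    "acc_rate": np.random.uniform(0.0, 1.0, n_records),
    "avg_daily_trips": np.random.randint(0, 50, n_records),
    "created": [now] * n_records,
}
df = pd.DataFrame(data)
Path("data").mkdir(exist_ok=True)  # to_parquet fails if data/ does not exist
df.to_parquet("data/driver_stats.parquet")
print(f"Generated {len(df)} records for {n_drivers} drivers")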
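A quick sanity check before Feast ever reads the file can catch generator mistakes early. The sketch below validates records as plain dictionaries, for example the output of `pd.read_parquet("data/driver_stats.parquet").to_dict("records")`. The bounds mirror the generator script and are assumptions of this sketch, not Feast requirements; `validate_records` is a hypothetical helper name.

```python
from datetime import timedelta
from typing import Any, Iterable


def validate_records(records: Iterable[dict[str, Any]], n_drivers: int = 50,
                     window: timedelta = timedelta(hours=720)) -> list[str]:
    """Return human-readable problems found in generated driver_stats rows."""
    problems = []
    for i, r in enumerate(records):
        ts, created = r["event_timestamp"], r["created"]
        if not 1 <= r["driver_id"] <= n_drivers:
            problems.append(f"row {i}: driver_id {r['driver_id']} out of range")
        for col in ("conv_rate", "acc_rate"):
            if not 0.0 <= r[col] <= 1.0:
                problems.append(f"row {i}: {col}={r[col]} outside [0, 1]")
        if not 0 <= r["avg_daily_trips"] < 50:
            problems.append(f"row {i}: avg_daily_trips={r['avg_daily_trips']} outside [0, 50)")
        if ts.tzinfo is None:
            problems.append(f"row {i}: event_timestamp has no timezone")
        # Generated times lie in (created - window, created]
        elif not created - window < ts <= created:
            problems.append(f"row {i}: event_timestamp outside the {window} window")
    return problems
```

An empty list means every row matches what the generator promised; anything else points at the offending row and column.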
Next: We will define Feast entities and feature views to describe these features declaratively.