Beginner
Project Setup
Architecture overview, Feast and Redis setup, and complete project scaffolding.
What We Are Building
A production-ready ML feature platform that manages the full lifecycle of ML features:
- Define features declaratively with Feast
- Compute and store features offline in PostgreSQL
- Materialize features to Redis for low-latency online serving
- Serve features via a FastAPI endpoint
- Monitor data quality and detect drift
Real-world relevance: Companies like Uber (Michelangelo), Airbnb (Zipline), and Spotify use feature platforms to share features across teams and ensure consistency between training and serving.
Architecture
Data Sources (DB, files, streams)
         |
         v
+-------------------+     +--------------------+
|  Feast Registry   | --> |   Offline Store    |
|  (Feature Defs)   |     |   (PostgreSQL)     |
+-------------------+     | - Historical data  |
         |                | - Training queries |
         v                +--------------------+
+-------------------+     +--------------------+
|   Online Store    | --> |    Feature API     |
|    (Redis)        |     |    (FastAPI)       |
| - Low latency     |     | - Real-time serve  |
| - Materialized    |     +--------------------+
+-------------------+
Tech Stack
Feast
Open-source feature store framework. Defines features as code, handles materialization and serving.
Redis
In-memory data store for sub-millisecond feature lookups during model inference.
PostgreSQL
Reliable offline store for historical feature data and point-in-time joins.
FastAPI
High-performance Python API framework for serving features to ML models.
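To make "point-in-time joins" concrete before we set anything up: for each training example, the offline store fetches the latest feature value at or before that example's timestamp, so no future data leaks into training. Here is a minimal pandas sketch of that idea (the column names mirror the sample data generated in Step 4; this illustrates the concept, it is not Feast's implementation):

```python
import pandas as pd

# Feature rows: multiple observations per driver over time.
features = pd.DataFrame({
    "driver_id": [1, 1, 2],
    "event_timestamp": pd.to_datetime(
        ["2024-01-01", "2024-01-10", "2024-01-05"]),
    "conv_rate": [0.50, 0.80, 0.30],
})

# Training examples: (entity, timestamp) pairs that need features.
entity_df = pd.DataFrame({
    "driver_id": [1, 2],
    "event_timestamp": pd.to_datetime(["2024-01-07", "2024-01-07"]),
})

# merge_asof picks the most recent feature row at or before each
# example's timestamp -- never a row from the future.
result = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="driver_id",
)
print(result)
```

Driver 1 gets the Jan 1 value (0.50), not the Jan 10 value, because Jan 10 is after the training example's timestamp; driver 2 gets the Jan 5 value (0.30).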
Step 1: Create the Project
mkdir feature-platform && cd feature-platform
python -m venv venv && source venv/bin/activate
pip install "feast[redis,postgres]" fastapi uvicorn pandas scikit-learn
Step 2: Project Structure
feature-platform/
    feature_repo/
        feature_store.yaml    # Feast config
        features.py           # Feature definitions (Lesson 2)
        data_sources.py       # Data source configs
    src/
        offline.py            # Offline store ops (Lesson 3)
        online.py             # Online store ops (Lesson 4)
        api.py                # FastAPI server (Lesson 5)
        monitor.py            # Monitoring (Lesson 6)
    scripts/
        generate_data.py      # Sample data generator (Step 4)
    data/
        driver_stats.parquet  # Sample data
    requirements.txt
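The layout above can be created in one go. A small stdlib sketch (directory and file names match the tree above):

```python
from pathlib import Path

# Directories and the stub files each one holds.
layout = {
    "feature_repo": ["feature_store.yaml", "features.py", "data_sources.py"],
    "src": ["offline.py", "online.py", "api.py", "monitor.py"],
    "scripts": [],
    "data": [],
}

root = Path("feature-platform")
for folder, files in layout.items():
    d = root / folder
    d.mkdir(parents=True, exist_ok=True)  # idempotent: safe to re-run
    for name in files:
        (d / name).touch()
(root / "requirements.txt").touch()
print("scaffolded", root)
```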
Step 3: Feast Configuration
# feature_repo/feature_store.yaml
project: ml_features
registry: data/registry.db
provider: local
online_store:
  type: redis
  connection_string: "localhost:6379"
offline_store:
  type: file  # file-based for local development; PostgreSQL serves this role in production
entity_key_serialization_version: 2
Step 4: Generate Sample Data
# scripts/generate_data.py
import numpy as np
import pandas as pd
from pathlib import Path
from datetime import datetime, timedelta

np.random.seed(42)
n_drivers = 50
n_records = 5000
now = datetime.now()

data = {
    "driver_id": np.random.randint(1, n_drivers + 1, n_records),
    # Random timestamps within the last 30 days (720 hours).
    "event_timestamp": [now - timedelta(hours=int(np.random.randint(0, 720)))
                        for _ in range(n_records)],
    "conv_rate": np.random.uniform(0.0, 1.0, n_records),
    "acc_rate": np.random.uniform(0.0, 1.0, n_records),
    "avg_daily_trips": np.random.randint(0, 50, n_records),
    "created": [now] * n_records,
}

df = pd.DataFrame(data)
Path("data").mkdir(exist_ok=True)  # ensure the output directory exists
df.to_parquet("data/driver_stats.parquet")
print(f"Generated {len(df)} records for {n_drivers} drivers")
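Before materializing anything, it is worth sanity-checking the invariants the feature definitions will rely on. A quick sketch that regenerates a small in-memory sample with the same shape (rather than re-reading the parquet file):

```python
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

np.random.seed(0)
n = 200
now = datetime.now()
df = pd.DataFrame({
    "driver_id": np.random.randint(1, 51, n),
    "event_timestamp": [now - timedelta(hours=int(h))
                        for h in np.random.randint(0, 720, n)],
    "conv_rate": np.random.uniform(0.0, 1.0, n),
})

# Invariants: valid entity keys, rates in [0, 1], no future timestamps,
# and nothing older than the 30-day window used above.
assert df["driver_id"].between(1, 50).all()
assert df["conv_rate"].between(0.0, 1.0).all()
assert (df["event_timestamp"] <= now).all()
assert df["event_timestamp"].min() >= now - timedelta(days=30)
print("sample data looks sane:", len(df), "rows")
```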
Next: We will define Feast entities and feature views to describe these features declaratively.
Lilly Tech Systems