Beginner
Feast — Open-Source Feature Store
Get started with Feast: install, define features, materialize to online/offline stores, and retrieve features for training and serving.
Getting Started
Bash — Install and Initialize Feast
# Install Feast
pip install feast
# Initialize a new feature repository
feast init feature_repo
cd feature_repo
# View the project structure
# feature_repo/
#   feature_store.yaml   # Configuration
#   features.py          # Feature definitions
#   data/                # Sample data
Define Features
Python — feature_repo/features.py
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float64, Int64
from datetime import timedelta
# Define the entity (the primary key for feature lookups)
user = Entity(
    name="user_id",
    description="Unique user identifier",
)
# Define the data source
user_features_source = FileSource(
    path="data/user_features.parquet",
    timestamp_field="event_timestamp",
)
# Define a feature view
user_features = FeatureView(
    name="user_features",
    entities=[user],
    schema=[
        Field(name="total_spend", dtype=Float64),
        Field(name="tx_count_30d", dtype=Int64),
        Field(name="avg_tx_amount", dtype=Float64),
        Field(name="unique_merchants", dtype=Int64),
        Field(name="account_age_days", dtype=Int64),
    ],
    source=user_features_source,
    ttl=timedelta(days=1),
    online=True,
    tags={"team": "data-science", "project": "fraud-detection"},
)
Apply and Materialize
Bash — Register Features and Materialize
# Register feature definitions with the registry
feast apply
# Materialize features to the online store
feast materialize 2026-01-01T00:00:00 2026-03-15T00:00:00
# Incremental materialization (only new data)
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
Retrieve Features for Training
Python — Point-in-Time Feature Retrieval
from feast import FeatureStore
import pandas as pd
store = FeatureStore(repo_path="feature_repo/")
# Entity DataFrame with timestamps (prevents data leakage)
entity_df = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4"],
    "event_timestamp": pd.to_datetime([
        "2026-03-01", "2026-03-05", "2026-03-10", "2026-03-14"
    ]),
    "label": [0, 1, 0, 1],
})
# Get historical features (point-in-time correct)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_features:total_spend",
        "user_features:tx_count_30d",
        "user_features:avg_tx_amount",
        "user_features:unique_merchants",
    ],
).to_df()
print(training_df.head())
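The returned DataFrame drops straight into a model training step. A minimal sketch, assuming scikit-learn is installed; the DataFrame below is a synthetic stand-in for the output of `get_historical_features()` (column names match the feature view above, the values are illustrative only):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the DataFrame returned by get_historical_features()
training_df = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4"],
    "event_timestamp": pd.to_datetime([
        "2026-03-01", "2026-03-05", "2026-03-10", "2026-03-14"
    ]),
    "label": [0, 1, 0, 1],
    "total_spend": [120.0, 4500.0, 80.0, 9900.0],
    "tx_count_30d": [3, 40, 2, 55],
    "avg_tx_amount": [40.0, 112.5, 40.0, 180.0],
    "unique_merchants": [2, 18, 1, 25],
})

# Keep only feature columns; entity keys and timestamps are not model inputs
feature_cols = ["total_spend", "tx_count_30d", "avg_tx_amount", "unique_merchants"]
X = training_df[feature_cols]
y = training_df["label"]

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict(X))
```

Dropping `user_id` and `event_timestamp` before fitting matters: they are join keys, not signals, and leaking them into training produces a model that memorizes identifiers.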
Retrieve Features for Serving
Python — Online Feature Retrieval
# Get features for real-time inference (from online store)
features = store.get_online_features(
    features=[
        "user_features:total_spend",
        "user_features:tx_count_30d",
        "user_features:avg_tx_amount",
    ],
    entity_rows=[{"user_id": "u123"}],
).to_dict()
print(features)
# {'user_id': ['u123'], 'total_spend': [1250.0], 'tx_count_30d': [42], ...}
Configuration
YAML — feature_store.yaml (Production)
project: fraud_detection
provider: aws
registry: s3://my-bucket/feast/registry.pb
online_store:
  type: redis
  connection_string: "redis-cluster.example.com:6379"
offline_store:
  type: redshift
  cluster_id: my-redshift-cluster
  region: us-east-1
  database: ml_features
  user: feast_user
Start with file-based stores: For development and testing, use the default SQLite online store and file-based offline store. Upgrade to Redis + Redshift/BigQuery/Snowflake for production.
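For comparison with the production config above, a sketch of the file-based development setup (this mirrors the defaults that `feast init` generates; the paths are illustrative):

```yaml
project: fraud_detection
provider: local
registry: data/registry.db
online_store:
  type: sqlite
  path: data/online_store.db
offline_store:
  type: file
```

Switching environments is then a matter of swapping `feature_store.yaml`; the feature definitions in `features.py` stay unchanged.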
Lilly Tech Systems