Beginner

Feast — Open-Source Feature Store

Get started with Feast: install, define features, materialize them from the offline store to the online store, and retrieve features for training and serving.

Getting Started

Bash — Install and Initialize Feast
# Install Feast
pip install feast

# Initialize a new feature repository
feast init feature_repo
cd feature_repo

# View the project structure
# feature_repo/
#   feature_store.yaml    # Configuration
#   features.py           # Feature definitions
#   data/                 # Sample data

Define Features

Python — feature_repo/features.py
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float64, Int64
from datetime import timedelta

# Define the entity (the primary key for feature lookups)
user = Entity(
    name="user_id",
    join_keys=["user_id"],
    description="Unique user identifier"
)

# Define the data source
user_features_source = FileSource(
    path="data/user_features.parquet",
    timestamp_field="event_timestamp"
)

# Define a feature view
user_features = FeatureView(
    name="user_features",
    entities=[user],
    schema=[
        Field(name="total_spend", dtype=Float64),
        Field(name="tx_count_30d", dtype=Int64),
        Field(name="avg_tx_amount", dtype=Float64),
        Field(name="unique_merchants", dtype=Int64),
        Field(name="account_age_days", dtype=Int64),
    ],
    source=user_features_source,
    ttl=timedelta(days=1),
    online=True,
    tags={"team": "data-science", "project": "fraud-detection"}
)

Apply and Materialize

Bash — Register Features and Materialize
# Register feature definitions with the registry
feast apply

# Materialize features to the online store
feast materialize 2026-01-01T00:00:00 2026-03-15T00:00:00

# Incremental materialization (only new data)
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

Retrieve Features for Training

Python — Point-in-Time Feature Retrieval
from feast import FeatureStore
import pandas as pd

store = FeatureStore(repo_path="feature_repo/")

# Entity DataFrame with timestamps (prevents data leakage)
entity_df = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4"],
    "event_timestamp": pd.to_datetime([
        "2026-03-01", "2026-03-05", "2026-03-10", "2026-03-14"
    ]),
    "label": [0, 1, 0, 1]
})

# Get historical features (point-in-time correct)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_features:total_spend",
        "user_features:tx_count_30d",
        "user_features:avg_tx_amount",
        "user_features:unique_merchants",
    ]
).to_df()

print(training_df.head())
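Conceptually, the point-in-time join behaves like a backward as-of merge: for each entity row, only the latest feature values at or before that row's event_timestamp (and within the TTL) are joined in. A minimal pandas sketch of the idea, with made-up values:

```python
import pandas as pd

# Feature rows as they might appear in the offline store
features = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "event_timestamp": pd.to_datetime(["2026-02-20", "2026-03-02", "2026-03-04"]),
    "total_spend": [100.0, 150.0, 80.0],
})

# Entity rows: the labels we want to train on
entity_df = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "event_timestamp": pd.to_datetime(["2026-03-01", "2026-03-05"]),
    "label": [0, 1],
})

# Backward as-of join: each label row only sees feature values
# that existed at its own timestamp, preventing leakage.
joined = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="user_id",
    direction="backward",
)
print(joined[["user_id", "total_spend", "label"]])
```

Note that u1's label row at 2026-03-01 picks up the 2026-02-20 value (100.0), not the later 2026-03-02 value (150.0), which would have been future information.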

Retrieve Features for Serving

Python — Online Feature Retrieval
# Get features for real-time inference (from online store)
features = store.get_online_features(
    features=[
        "user_features:total_spend",
        "user_features:tx_count_30d",
        "user_features:avg_tx_amount",
    ],
    entity_rows=[{"user_id": "u123"}]
).to_dict()

print(features)
# {'user_id': ['u123'], 'total_spend': [1250.0], 'tx_count_30d': [42], ...}
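The to_dict() output is columnar: each feature name maps to a list with one element per entity row. A small helper to reshape it into per-entity rows for a model call (the helper name and sample payload are illustrative, not part of the Feast API):

```python
# Illustrative: reshape Feast's columnar to_dict() output into row dicts
def rows_from_online_features(payload: dict) -> list[dict]:
    """Convert {feature: [v1, v2, ...]} into one dict per entity row."""
    keys = list(payload)
    n = len(payload[keys[0]])
    return [{k: payload[k][i] for k in keys} for i in range(n)]

# Sample payload shaped like get_online_features(...).to_dict()
payload = {
    "user_id": ["u123"],
    "total_spend": [1250.0],
    "tx_count_30d": [42],
}
rows = rows_from_online_features(payload)
print(rows[0])  # {'user_id': 'u123', 'total_spend': 1250.0, 'tx_count_30d': 42}
```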

Configuration

YAML — feature_store.yaml (Production)
project: fraud_detection
provider: aws
registry: s3://my-bucket/feast/registry.pb
online_store:
  type: redis
  connection_string: "redis-cluster.example.com:6379"
offline_store:
  type: redshift
  cluster_id: my-redshift-cluster
  region: us-east-1
  database: ml_features
  user: feast_user
  s3_staging_location: s3://my-bucket/feast/staging
  iam_role: arn:aws:iam::123456789012:role/feast-redshift-s3
Start with file-based stores: For development and testing, use the default SQLite online store and file-based offline store. Upgrade to Redis + Redshift/BigQuery/Snowflake for production.
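For comparison, a minimal local-development feature_store.yaml looks roughly like what feast init generates (paths are illustrative):

```yaml
project: feature_repo
provider: local
registry: data/registry.db
online_store:
  type: sqlite
  path: data/online_store.db
offline_store:
  type: file
```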