Online Feature Store Design
The online store serves pre-computed features at low latency for real-time ML inference. When a user clicks "Buy Now" and your fraud detection model needs 50 features in under 10ms, the online store is what makes this possible. This lesson covers storage selection, materialization pipelines, and cache strategies used at production scale.
Online Store Architecture
```
Request Flow: Model Inference with Feature Store
================================================

  Client Request
        |
        v
+------------------+        +-------------------+
|   ML Serving     |------->|  Online Feature   |
|  (TorchServe,    |        |  Store (Redis)    |
|   TF Serving)    |        |    p99 < 5ms      |
+------------------+        +-------------------+
        |                             ^
        | Features + Request          | Materialization
        v                             | (hourly/daily)
+------------------+        +-------------------+
|      Model       |        |   Offline Store   |
|   Prediction     |        |   (Delta Lake)    |
+------------------+        +-------------------+
        |
        v
  Response (50-100ms total budget)
    - Feature fetch:    5-10ms
    - Model inference: 10-30ms
    - Network/other:   10-20ms
```
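The latency budget above can be enforced in the serving layer itself. Below is a minimal sketch of the request flow, where `fetch_features`, `predict`, and the default values are hypothetical stand-ins for the online-store read and the model call: fetch features within a deadline and fall back to defaults rather than failing the whole request.

```python
# Sketch of the request flow: fetch features within a strict budget,
# degrade gracefully if the online store is slow or unavailable.
import time

# Hypothetical fallback values for when the feature fetch fails
FEATURE_DEFAULTS = {"avg_transaction_amount": 0.0, "transaction_count_30d": 0.0}

def fetch_features(entity_key: str) -> dict:
    # Stand-in for an online-store read (e.g. a Redis HMGET)
    return {"avg_transaction_amount": 42.5, "transaction_count_30d": 7.0}

def predict(features: dict) -> float:
    # Stand-in for model inference (e.g. a TorchServe call)
    return 0.1 if features["transaction_count_30d"] > 5 else 0.9

def handle_request(entity_key: str, feature_budget_ms: float = 10.0) -> float:
    start = time.monotonic()
    try:
        features = fetch_features(entity_key)
    except Exception:
        # Serve with defaults instead of failing the request
        features = dict(FEATURE_DEFAULTS)
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > feature_budget_ms:
        pass  # In production, emit a metric: feature fetch blew its budget
    return predict(features)
```

The key design choice is that a slow or failed feature fetch degrades prediction quality instead of returning an error, which matters when the caller is a checkout flow.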
Storage Backend Comparison
| Backend | p99 Latency | Max Throughput | Cost Model | Best For |
|---|---|---|---|---|
| Redis Cluster | <1ms (same AZ) | 1M+ ops/sec | Memory-based ($$) | Lowest latency, moderate data size |
| DynamoDB | 1-5ms | Millions RCU | Pay per read/write | Serverless, auto-scaling, AWS-native |
| Bigtable | 2-6ms | 10K+ reads/sec/node | Node-based | Wide rows, GCP-native, large datasets |
| Cassandra | 2-10ms | High (tunable) | Node-based | Multi-region, open-source, write-heavy |
| SQLite (Feast) | <1ms (local) | Single process | Free | Development, testing, edge deployment |
Redis-Based Online Store Implementation
```python
# Production Redis online store with feature encoding
import redis
import struct
import hashlib
from typing import Dict, List, Optional


class RedisOnlineStore:
    """Production-grade Redis online feature store."""

    def __init__(self, host: str, port: int = 6379, cluster_mode: bool = True):
        if cluster_mode:
            self.client = redis.RedisCluster(
                host=host,
                port=port,
                decode_responses=False,
                socket_timeout=0.1,          # 100ms timeout
                socket_connect_timeout=0.5,  # 500ms connect timeout
                retry_on_timeout=True,
                max_connections=50,
            )
        else:
            self.client = redis.Redis(host=host, port=port, decode_responses=False)

    def _feature_key(self, feature_view: str, entity_key: str) -> str:
        """Generate Redis key: feature_view:entity_hash"""
        entity_hash = hashlib.md5(entity_key.encode()).hexdigest()[:12]
        return f"fs:{feature_view}:{entity_hash}"

    def write_features(self, feature_view: str, entity_key: str,
                       features: Dict[str, float], ttl_seconds: int = 86400):
        """Write feature values to Redis with TTL."""
        key = self._feature_key(feature_view, entity_key)
        # Use Redis hash for efficient multi-feature storage
        pipe = self.client.pipeline()
        # Pack floats as binary for ~50% memory savings vs strings
        encoded = {k: struct.pack("d", v) for k, v in features.items()}
        pipe.hset(key, mapping=encoded)
        pipe.expire(key, ttl_seconds)
        pipe.execute()

    def read_features(self, feature_view: str, entity_key: str,
                      feature_names: List[str]) -> Dict[str, Optional[float]]:
        """Read feature values from Redis. Returns None for missing features."""
        key = self._feature_key(feature_view, entity_key)
        values = self.client.hmget(key, feature_names)
        result = {}
        for name, val in zip(feature_names, values):
            if val is not None:
                result[name] = struct.unpack("d", val)[0]
            else:
                result[name] = None  # Feature not found
        return result

    def batch_read(self, feature_view: str, entity_keys: List[str],
                   feature_names: List[str]) -> List[Dict]:
        """Batch read using Redis pipeline for minimal round trips."""
        pipe = self.client.pipeline()
        for entity_key in entity_keys:
            key = self._feature_key(feature_view, entity_key)
            pipe.hmget(key, feature_names)
        results = pipe.execute()
        return [
            {name: struct.unpack("d", val)[0] if val else None
             for name, val in zip(feature_names, row)}
            for row in results
        ]


# Usage
store = RedisOnlineStore("redis-cluster.prod.internal")
features = store.read_features(
    "user_spending_stats",
    "user_12345",
    ["avg_transaction_amount", "transaction_count_30d", "total_spend_90d"],
)
# Typical latency: 0.3-0.8ms for same-AZ Redis Cluster
```
Materialization Pipeline
Materialization moves features from the offline store to the online store. This is the bridge between batch-computed features and real-time serving.
```python
# Feast materialization: offline store -> online store
from feast import FeatureStore
from datetime import datetime, timedelta

store = FeatureStore(repo_path="feature_repo/")

# Materialize features computed in the last 24 hours
store.materialize_incremental(
    end_date=datetime.utcnow(),
)
# This reads new feature rows from the offline store
# and writes the latest value per entity to the online store

# For initial backfill or recovery:
store.materialize(
    start_date=datetime.utcnow() - timedelta(days=7),
    end_date=datetime.utcnow(),
    feature_views=["user_spending_stats"],  # Specific views only
)

# Production: Run materialization on a schedule via Airflow
# @daily for batch features, @hourly for near-real-time
```
DynamoDB Online Store with Feast
```yaml
# feature_store.yaml - Feast configuration with DynamoDB online store
project: fraud_detection
registry: s3://feast-registry/registry.db
provider: aws

online_store:
  type: dynamodb
  region: us-east-1
  table_name_prefix: feast_

offline_store:
  type: redshift
  cluster_id: feature-store-cluster
  region: us-east-1
  database: features
  user: feast_user
  s3_staging_location: s3://feast-staging/redshift/

# DynamoDB auto-scales based on read/write demand
# Provisioned capacity: 1000 RCU, 500 WCU (adjust per workload)
# On-demand mode: better for unpredictable traffic patterns
```
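The provisioned-vs-on-demand tradeoff in the comments above comes down to arithmetic: provisioned capacity is billed per hour whether or not you use it, while on-demand is billed per request. A back-of-envelope helper (prices are parameters here, not real AWS figures; check current pricing for your region):

```python
# Back-of-envelope DynamoDB cost comparison: provisioned capacity is an
# always-on hourly charge; on-demand charges per request actually made.

HOURS_PER_MONTH = 730

def provisioned_monthly_cost(rcu, wcu, price_per_rcu_hour, price_per_wcu_hour):
    """Cost of always-on provisioned capacity, sized for peak traffic."""
    return (rcu * price_per_rcu_hour + wcu * price_per_wcu_hour) * HOURS_PER_MONTH

def on_demand_monthly_cost(reads, writes, price_per_read, price_per_write):
    """Cost of on-demand mode: pay only for requests issued."""
    return reads * price_per_read + writes * price_per_write
```

The intuition this captures: provisioned capacity must be sized for peak, so spiky traffic that averages far below its peak often comes out cheaper on-demand, despite a higher unit price per request.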
Cache Strategies for Feature Serving
| Strategy | Description | When to Use |
|---|---|---|
| Write-through | Materialization writes to both offline and online store simultaneously | Strong consistency requirements |
| Write-behind | Online store updated asynchronously after offline write | High write throughput, eventual consistency OK |
| Application-level cache | In-process LRU cache in the ML serving layer | Repeated entity lookups within a request batch |
| Precomputed feature vectors | Store entire feature vectors as a single blob per entity | Fixed feature sets, minimal per-request computation |
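The write-through strategy from the table can be sketched in a few lines. The store interfaces here are hypothetical stand-ins: the point is the contract that one write call updates both stores, so online reads never lag behind the offline system of record.

```python
# Minimal sketch of write-through: a single write updates the offline
# store (durable system of record) and the online store together.

class WriteThroughFeatureWriter:
    def __init__(self, offline_store, online_store):
        self.offline = offline_store
        self.online = online_store

    def write(self, entity_key: str, features: dict):
        # Offline first, then online; if the online write fails,
        # retry or alert rather than silently dropping it.
        self.offline.append(entity_key, features)
        self.online.put(entity_key, features)

# Toy in-memory stores illustrating the two interfaces
class DictStore:
    def __init__(self):
        self.data = {}
    def append(self, key, features):
        self.data.setdefault(key, []).append(features)  # append-only history
    def put(self, key, features):
        self.data[key] = features  # latest value only
```

Write-behind would instead enqueue the online update (e.g. onto a stream) after the offline write returns, trading consistency for write throughput.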
```python
# Application-level feature cache for ML serving
import time


class CachedFeatureClient:
    """Wraps a feature store with a TTL-based in-memory cache."""

    def __init__(self, store, cache_ttl_seconds=60):
        self.store = store
        self.cache_ttl = cache_ttl_seconds
        self._cache = {}
        self._cache_timestamps = {}

    def get_features(self, entity_key: str, feature_names: list) -> dict:
        cache_key = f"{entity_key}:{','.join(sorted(feature_names))}"
        now = time.monotonic()

        # Check cache freshness
        if (cache_key in self._cache
                and now - self._cache_timestamps[cache_key] < self.cache_ttl):
            return self._cache[cache_key]

        # Cache miss: fetch from online store
        features = self.store.get_online_features(
            features=feature_names,
            entity_rows=[{"user_id": entity_key}],
        ).to_dict()
        self._cache[cache_key] = features
        self._cache_timestamps[cache_key] = now
        return features


# Use short TTL (30-60s) for features that change frequently
# Use longer TTL (5-15 min) for slowly-changing features
# NEVER cache features that must reflect real-time state
```
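The "precomputed feature vectors" strategy from the table is also worth a sketch: with a fixed feature set, the whole vector can be stored as one binary blob per entity, so serving becomes a single GET plus one `struct.unpack`. The feature names below are illustrative.

```python
# Pack a fixed-order feature vector into one binary blob per entity.
# Read side must use the same FEATURE_ORDER, so treat it as a versioned schema.
import struct

FEATURE_ORDER = ["avg_transaction_amount", "transaction_count_30d", "total_spend_90d"]
_FMT = f"{len(FEATURE_ORDER)}d"  # "3d": three float64s, 24 bytes total

def pack_vector(features: dict) -> bytes:
    """Encode features in a fixed order; no per-field key overhead."""
    return struct.pack(_FMT, *(features[name] for name in FEATURE_ORDER))

def unpack_vector(blob: bytes) -> dict:
    return dict(zip(FEATURE_ORDER, struct.unpack(_FMT, blob)))

blob = pack_vector({"avg_transaction_amount": 42.5,
                    "transaction_count_30d": 7.0,
                    "total_spend_90d": 310.0})
assert len(blob) == 24  # 3 features x 8 bytes
```

The tradeoff versus a Redis hash: reads and writes are all-or-nothing, and any change to the feature set requires re-materializing every blob.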
Memory Sizing: For Redis-based online stores, estimate memory as `num_entities * num_features * 16 bytes * 1.5` (overhead). For 10M users with 50 features each, that is roughly 12 GB. Redis Cluster can shard this across nodes, but DynamoDB may be more cost-effective above ~50 GB.
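The sizing rule above, as a function. The 1.5x factor covers Redis key and hash bookkeeping and is a rough rule of thumb, not a measured constant; validate against `MEMORY USAGE` on a sample of real keys.

```python
# Memory estimate for a Redis-backed online store:
# entities x features x bytes-per-value x overhead factor.

def estimate_redis_memory_gb(num_entities, num_features,
                             bytes_per_value=16, overhead=1.5):
    total_bytes = num_entities * num_features * bytes_per_value * overhead
    return total_bytes / 1e9  # decimal GB

# 10M users x 50 features -> 12.0 GB, matching the estimate above
print(estimate_redis_memory_gb(10_000_000, 50))  # 12.0
```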
Ready for Real-Time Features?
The next lesson covers streaming feature computation with Kafka and Flink for features that need sub-second freshness.
Next: Real-Time Feature Engineering →
Lilly Tech Systems