Online Feature Store Design
The online store serves pre-computed features at low latency for real-time ML inference. When a user clicks "Buy Now" and your fraud detection model needs 50 features in under 10ms, the online store is what makes this possible. This lesson covers storage selection, materialization pipelines, and cache strategies used at production scale.
Online Store Architecture
```
Request Flow: Model Inference with Feature Store
================================================

  Client Request
        |
        v
+------------------+        +-------------------+
|   ML Serving     |------->|  Online Feature   |
|  (TorchServe,    |        |  Store (Redis)    |
|   TF Serving)    |        |    p99 < 5ms      |
+------------------+        +-------------------+
        |                             ^
        | Features + Request          | Materialization
        v                             | (hourly/daily)
+------------------+        +-------------------+
|      Model       |        |   Offline Store   |
|   Prediction     |        |   (Delta Lake)    |
+------------------+        +-------------------+
        |
        v
  Response (50-100ms total budget)
    - Feature fetch:    5-10ms
    - Model inference: 10-30ms
    - Network/other:   10-20ms
```
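The latency budget above can be enforced in the serving layer itself. Below is a minimal sketch of the request flow, where `fetch_features`, `predict`, and the default values are hypothetical stand-ins for the online-store read and the model call: fetch features within a deadline and fall back to defaults rather than failing the whole request.

```python
# Sketch of the request flow: fetch features within a strict budget,
# degrade gracefully if the online store is slow or unavailable.
import time

# Hypothetical fallback values for when the feature fetch fails
FEATURE_DEFAULTS = {"avg_transaction_amount": 0.0, "transaction_count_30d": 0.0}

def fetch_features(entity_key: str) -> dict:
    # Stand-in for an online-store read (e.g. a Redis HMGET)
    return {"avg_transaction_amount": 42.5, "transaction_count_30d": 7.0}

def predict(features: dict) -> float:
    # Stand-in for model inference (e.g. a TorchServe call)
    return 0.1 if features["transaction_count_30d"] > 5 else 0.9

def handle_request(entity_key: str, feature_budget_ms: float = 10.0) -> float:
    start = time.monotonic()
    try:
        features = fetch_features(entity_key)
    except Exception:
        # Serve with defaults instead of failing the request
        features = dict(FEATURE_DEFAULTS)
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > feature_budget_ms:
        pass  # In production, emit a metric: feature fetch blew its budget
    return predict(features)
```

The key design choice is that a slow or failed feature fetch degrades prediction quality instead of returning an error, which matters when the caller is a checkout flow.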
Storage Backend Comparison
| Backend | p99 Latency | Max Throughput | Cost Model | Best For |
|---|---|---|---|---|
| Redis Cluster | <1ms (same AZ) | 1M+ ops/sec | Memory-based ($$) | Lowest latency, moderate data size |
| DynamoDB | 1-5ms | Millions RCU | Pay per read/write | Serverless, auto-scaling, AWS-native |
| Bigtable | 2-6ms | 10K+ reads/sec/node | Node-based | Wide rows, GCP-native, large datasets |
| Cassandra | 2-10ms | High (tunable) | Node-based | Multi-region, open-source, write-heavy |
| SQLite (Feast) | <1ms (local) | Single process | Free | Development, testing, edge deployment |
Redis-Based Online Store Implementation
```python
# Production Redis online store with feature encoding
import redis
import struct
import hashlib
from typing import Dict, List, Optional


class RedisOnlineStore:
    """Production-grade Redis online feature store."""

    def __init__(self, host: str, port: int = 6379, cluster_mode: bool = True):
        if cluster_mode:
            self.client = redis.RedisCluster(
                host=host,
                port=port,
                decode_responses=False,
                socket_timeout=0.1,          # 100ms timeout
                socket_connect_timeout=0.5,  # 500ms connect timeout
                retry_on_timeout=True,
                max_connections=50,
            )
        else:
            self.client = redis.Redis(host=host, port=port, decode_responses=False)

    def _feature_key(self, feature_view: str, entity_key: str) -> str:
        """Generate Redis key: feature_view:entity_hash"""
        entity_hash = hashlib.md5(entity_key.encode()).hexdigest()[:12]
        return f"fs:{feature_view}:{entity_hash}"

    def write_features(self, feature_view: str, entity_key: str,
                       features: Dict[str, float], ttl_seconds: int = 86400):
        """Write feature values to Redis with TTL."""
        key = self._feature_key(feature_view, entity_key)
        # Use Redis hash for efficient multi-feature storage
        pipe = self.client.pipeline()
        # Pack floats as binary for ~50% memory savings vs strings
        encoded = {k: struct.pack("d", v) for k, v in features.items()}
        pipe.hset(key, mapping=encoded)
        pipe.expire(key, ttl_seconds)
        pipe.execute()

    def read_features(self, feature_view: str, entity_key: str,
                      feature_names: List[str]) -> Dict[str, Optional[float]]:
        """Read feature values from Redis. Returns None for missing features."""
        key = self._feature_key(feature_view, entity_key)
        values = self.client.hmget(key, feature_names)
        result = {}
        for name, val in zip(feature_names, values):
            if val is not None:
                result[name] = struct.unpack("d", val)[0]
            else:
                result[name] = None  # Feature not found
        return result

    def batch_read(self, feature_view: str, entity_keys: List[str],
                   feature_names: List[str]) -> List[Dict]:
        """Batch read using Redis pipeline for minimal round trips."""
        pipe = self.client.pipeline()
        for entity_key in entity_keys:
            key = self._feature_key(feature_view, entity_key)
            pipe.hmget(key, feature_names)
        results = pipe.execute()
        return [
            {name: struct.unpack("d", val)[0] if val else None
             for name, val in zip(feature_names, row)}
            for row in results
        ]


# Usage
store = RedisOnlineStore("redis-cluster.prod.internal")
features = store.read_features(
    "user_spending_stats",
    "user_12345",
    ["avg_transaction_amount", "transaction_count_30d", "total_spend_90d"],
)
# Typical latency: 0.3-0.8ms for same-AZ Redis Cluster
```
Materialization Pipeline
Materialization moves features from the offline store to the online store. This is the bridge between batch-computed features and real-time serving.
```python
# Feast materialization: offline store -> online store
from feast import FeatureStore
from datetime import datetime, timedelta

store = FeatureStore(repo_path="feature_repo/")

# Materialize features computed in the last 24 hours
store.materialize_incremental(
    end_date=datetime.utcnow(),
)
# This reads new feature rows from the offline store
# and writes the latest value per entity to the online store

# For initial backfill or recovery:
store.materialize(
    start_date=datetime.utcnow() - timedelta(days=7),
    end_date=datetime.utcnow(),
    feature_views=["user_spending_stats"],  # Specific views only
)

# Production: Run materialization on a schedule via Airflow
# @daily for batch features, @hourly for near-real-time
```
DynamoDB Online Store with Feast
```yaml
# feature_store.yaml - Feast configuration with DynamoDB online store
project: fraud_detection
registry: s3://feast-registry/registry.db
provider: aws

online_store:
  type: dynamodb
  region: us-east-1
  table_name_prefix: feast_

offline_store:
  type: redshift
  cluster_id: feature-store-cluster
  region: us-east-1
  database: features
  user: feast_user
  s3_staging_location: s3://feast-staging/redshift/

# DynamoDB auto-scales based on read/write demand
# Provisioned capacity: 1000 RCU, 500 WCU (adjust per workload)
# On-demand mode: better for unpredictable traffic patterns
```
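The provisioned-vs-on-demand tradeoff in the comments above comes down to arithmetic: provisioned capacity is billed per hour whether or not you use it, while on-demand is billed per request. A back-of-envelope helper (prices are parameters here, not real AWS figures; check current pricing for your region):

```python
# Back-of-envelope DynamoDB cost comparison: provisioned capacity is an
# always-on hourly charge; on-demand charges per request actually made.

HOURS_PER_MONTH = 730

def provisioned_monthly_cost(rcu, wcu, price_per_rcu_hour, price_per_wcu_hour):
    """Cost of always-on provisioned capacity, sized for peak traffic."""
    return (rcu * price_per_rcu_hour + wcu * price_per_wcu_hour) * HOURS_PER_MONTH

def on_demand_monthly_cost(reads, writes, price_per_read, price_per_write):
    """Cost of on-demand mode: pay only for requests issued."""
    return reads * price_per_read + writes * price_per_write
```

The intuition this captures: provisioned capacity must be sized for peak, so spiky traffic that averages far below its peak often comes out cheaper on-demand, despite a higher unit price per request.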
Cache Strategies for Feature Serving
| Strategy | Description | When to Use |
|---|---|---|
| Write-through | Materialization writes to both offline and online store simultaneously | Strong consistency requirements |
| Write-behind | Online store updated asynchronously after offline write | High write throughput, eventual consistency OK |
| Application-level cache | In-process LRU cache in the ML serving layer | Repeated entity lookups within a request batch |
| Precomputed feature vectors | Store entire feature vectors as a single blob per entity | Fixed feature sets, minimal per-request computation |
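The write-through strategy from the table can be sketched in a few lines. The store interfaces here are hypothetical stand-ins: the point is the contract that one write call updates both stores, so online reads never lag behind the offline system of record.

```python
# Minimal sketch of write-through: a single write updates the offline
# store (durable system of record) and the online store together.

class WriteThroughFeatureWriter:
    def __init__(self, offline_store, online_store):
        self.offline = offline_store
        self.online = online_store

    def write(self, entity_key: str, features: dict):
        # Offline first, then online; if the online write fails,
        # retry or alert rather than silently dropping it.
        self.offline.append(entity_key, features)
        self.online.put(entity_key, features)

# Toy in-memory stores illustrating the two interfaces
class DictStore:
    def __init__(self):
        self.data = {}
    def append(self, key, features):
        self.data.setdefault(key, []).append(features)  # append-only history
    def put(self, key, features):
        self.data[key] = features  # latest value only
```

Write-behind would instead enqueue the online update (e.g. onto a stream) after the offline write returns, trading consistency for write throughput.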
```python
# Application-level feature cache for ML serving
import time


class CachedFeatureClient:
    """Wraps a feature store with a TTL-based in-memory cache."""

    def __init__(self, store, cache_ttl_seconds=60):
        self.store = store
        self.cache_ttl = cache_ttl_seconds
        self._cache = {}
        self._cache_timestamps = {}

    def get_features(self, entity_key: str, feature_names: list) -> dict:
        cache_key = f"{entity_key}:{','.join(sorted(feature_names))}"
        now = time.monotonic()

        # Check cache freshness
        if (cache_key in self._cache
                and now - self._cache_timestamps[cache_key] < self.cache_ttl):
            return self._cache[cache_key]

        # Cache miss: fetch from online store
        features = self.store.get_online_features(
            features=feature_names,
            entity_rows=[{"user_id": entity_key}],
        ).to_dict()
        self._cache[cache_key] = features
        self._cache_timestamps[cache_key] = now
        return features


# Use short TTL (30-60s) for features that change frequently
# Use longer TTL (5-15 min) for slowly-changing features
# NEVER cache features that must reflect real-time state
```
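The "precomputed feature vectors" strategy from the table is also worth a sketch: with a fixed feature set, the whole vector can be stored as one binary blob per entity, so serving becomes a single GET plus one `struct.unpack`. The feature names below are illustrative.

```python
# Pack a fixed-order feature vector into one binary blob per entity.
# Read side must use the same FEATURE_ORDER, so treat it as a versioned schema.
import struct

FEATURE_ORDER = ["avg_transaction_amount", "transaction_count_30d", "total_spend_90d"]
_FMT = f"{len(FEATURE_ORDER)}d"  # "3d": three float64s, 24 bytes total

def pack_vector(features: dict) -> bytes:
    """Encode features in a fixed order; no per-field key overhead."""
    return struct.pack(_FMT, *(features[name] for name in FEATURE_ORDER))

def unpack_vector(blob: bytes) -> dict:
    return dict(zip(FEATURE_ORDER, struct.unpack(_FMT, blob)))

blob = pack_vector({"avg_transaction_amount": 42.5,
                    "transaction_count_30d": 7.0,
                    "total_spend_90d": 310.0})
assert len(blob) == 24  # 3 features x 8 bytes
```

The tradeoff versus a Redis hash: reads and writes are all-or-nothing, and any change to the feature set requires re-materializing every blob.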
Memory Sizing: For Redis-based online stores, estimate memory as `num_entities * num_features * 16 bytes * 1.5` (overhead). For 10M users with 50 features each, that is roughly 12 GB. Redis Cluster can shard this across nodes, but DynamoDB may be more cost-effective above ~50 GB.
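The sizing rule above, as a function. The 1.5x factor covers Redis key and hash bookkeeping and is a rough rule of thumb, not a measured constant; validate against `MEMORY USAGE` on a sample of real keys.

```python
# Memory estimate for a Redis-backed online store:
# entities x features x bytes-per-value x overhead factor.

def estimate_redis_memory_gb(num_entities, num_features,
                             bytes_per_value=16, overhead=1.5):
    total_bytes = num_entities * num_features * bytes_per_value * overhead
    return total_bytes / 1e9  # decimal GB

# 10M users x 50 features -> 12.0 GB, matching the estimate above
print(estimate_redis_memory_gb(10_000_000, 50))  # 12.0
```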
Ready for Real-Time Features?
The next lesson covers streaming feature computation with Kafka and Flink for features that need sub-second freshness.
Next: Real-Time Feature Engineering →
Lilly Tech Systems