Components of ML Systems
A comprehensive guide to the components of ML systems within the context of AI architecture fundamentals.
The Components of Production ML Systems
A production machine learning system is composed of many interacting components, each with specific responsibilities. Understanding these components and their interactions is essential for designing systems that are reliable, maintainable, and scalable. This lesson maps out the complete landscape of ML system components.
High-Level Component Map
At the highest level, an ML system consists of five major subsystems:
- Data management — Everything related to acquiring, storing, validating, and versioning data
- Feature engineering — Transforming raw data into features suitable for model consumption
- Model development — Training, evaluating, tuning, and selecting models
- Model deployment — Getting models into production and serving predictions
- Operations and monitoring — Keeping the system running and detecting problems
Data Management Components
Data management forms the foundation of any ML system. Without reliable, high-quality data, even the best model architecture will produce poor results.
Data Ingestion
The data ingestion layer handles acquiring data from various sources: databases, APIs, file systems, streaming platforms, and third-party providers. It must handle different formats (JSON, CSV, Parquet, Avro), different delivery mechanisms (push vs. pull), and different update frequencies (real-time, hourly, daily).
# Example: data ingestion with validation. A sketch assuming a
# Great Expectations project with a pre-configured "data_quality"
# checkpoint pointed at the ingested batch; _create_source is not shown.
import great_expectations as gx

class DataQualityError(Exception):
    """Raised when an ingested batch fails data quality checks."""

class DataIngestionPipeline:
    def __init__(self, source_config):
        self.source = self._create_source(source_config)
        # Load the Great Expectations project configuration
        self.context = gx.get_context()

    def ingest(self):
        raw_data = self.source.read()
        # Validate before proceeding: run the configured checkpoint
        validation_result = self.context.run_checkpoint(
            checkpoint_name="data_quality"
        )
        if not validation_result.success:
            raise DataQualityError(validation_result)
        return raw_data
Data Storage
ML systems typically use multiple storage systems optimized for different access patterns:
- Object storage (S3, GCS, ADLS) — Raw data, training datasets, model artifacts (a short write sketch follows this list)
- Data warehouses (BigQuery, Redshift, Snowflake) — Structured analytical queries, reporting
- Data lakes (Delta Lake, Iceberg, Hudi) — Large-scale analytics with ACID transactions
- Feature stores (Feast, Tecton) — Low-latency feature serving for real-time inference
- Vector databases (Pinecone, Weaviate, Milvus) — Embedding similarity search for RAG systems
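For instance, persisting a versioned training snapshot to object storage can be as simple as the sketch below. The bucket and path are hypothetical, and writing Parquet directly to an s3:// path assumes pandas with s3fs installed.

# Sketch: writing a training snapshot to object storage as Parquet
# (bucket and path are hypothetical; assumes pandas plus s3fs)
import pandas as pd

snapshot = pd.DataFrame({"user_id": [1, 2], "label": [0, 1]})
# Date-partitioned paths keep training datasets versioned and reproducible
snapshot.to_parquet("s3://ml-data/training/date=2024-01-01/part-0.parquet")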
Feature Engineering Components
Feature engineering transforms raw data into the numerical representations that models consume. This layer is critical because features directly determine what patterns the model can learn.
Feature Computation
Feature computation can be batch (processing large datasets periodically) or real-time (computing features on-demand for each prediction request). Most production systems use a combination of both approaches, often mediated by a feature store.
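To make the batch side concrete, here is a minimal sketch that computes a 30-day purchase-count feature with pandas; the event schema (user_id, event_time) is an assumption. The real-time path would run equivalent logic over recent events at request time.

# Sketch: batch computation of a 30-day purchase-count feature
# (the user_id/event_time schema is an assumption)
import pandas as pd

def purchase_count_30d(events: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Count each user's purchase events in the 30 days before as_of."""
    window = events[
        (events["event_time"] > as_of - pd.Timedelta(days=30))
        & (events["event_time"] <= as_of)
    ]
    return (
        window.groupby("user_id")
        .size()
        .rename("purchase_count_30d")
        .reset_index()
    )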
Feature Store
A feature store serves as the central repository for features, providing consistent feature values for both training and serving. It typically has two components: an offline store for batch access during training and an online store for low-latency access during inference.
# Feature store usage pattern (Feast); entity_dataframe is assumed to
# hold entity keys plus event timestamps for point-in-time joins
from feast import FeatureStore

store = FeatureStore(repo_path="./feature_repo")

# Training: get historical features (offline store)
training_df = store.get_historical_features(
    entity_df=entity_dataframe,
    features=[
        "user_features:purchase_count_30d",
        "user_features:avg_session_duration",
    ],
).to_df()

# Serving: get real-time features (online store)
features = store.get_online_features(
    features=["user_features:purchase_count_30d"],
    entity_rows=[{"user_id": "12345"}],
).to_dict()
Model Development Components
Model development requires infrastructure for training at scale, tracking experiments, tuning hyperparameters, and evaluating model quality.
Experiment Tracking
Experiment tracking systems (MLflow, Weights & Biases, Neptune) record every training run with its hyperparameters, metrics, artifacts, and code version. This enables reproducibility and makes it possible to compare runs systematically.
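With MLflow, for example, recording a run takes only a few calls; the parameter, metric, and artifact names below are placeholders.

# Sketch: recording a training run with MLflow
# (parameter, metric, and artifact names are placeholders)
import mlflow

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("learning_rate", 0.01)   # hyperparameters
    mlflow.log_metric("val_accuracy", 0.93)   # evaluation metrics
    mlflow.log_artifact("model.pkl")          # trained artifact on disk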
Model Registry
The model registry stores trained model artifacts along with metadata including training data version, performance metrics, approval status, and deployment history. It serves as the single source of truth for which models exist and which are approved for production use.
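Continuing the MLflow example, registering a model version and pointing a production alias at it might look like the sketch below; the model name and run ID are hypothetical, and the alias API assumes MLflow 2.3 or later.

# Sketch: registering and promoting a model version with MLflow
# (model name and run ID are hypothetical)
import mlflow
from mlflow import MlflowClient

model_version = mlflow.register_model(
    model_uri="runs:/<run_id>/model",  # artifact logged by a tracked run
    name="churn-classifier",
)
# Point the "production" alias at the newly registered version
client = MlflowClient()
client.set_registered_model_alias(
    "churn-classifier", "production", model_version.version
)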
Deployment Components
Model deployment components handle getting trained models into production and serving predictions reliably.
- Model serving framework — TensorFlow Serving, TorchServe, Triton, or custom REST/gRPC APIs (a minimal custom-API sketch follows this list)
- Container orchestration — Kubernetes with GPU support for scaling model servers
- API gateway — Request routing, rate limiting, authentication, and model versioning
- Load balancer — Distributing inference requests across model replicas
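As a sketch of the custom-API option, a minimal FastAPI prediction endpoint might look like the following; the request schema and scoring stub are hypothetical stand-ins for a real model.

# Sketch: a minimal custom REST serving endpoint with FastAPI
# (request schema and scoring logic are hypothetical stand-ins)
from fastapi import FastAPI
from pydantic import BaseModel

class PredictionRequest(BaseModel):
    user_id: str
    features: list[float]

app = FastAPI()

@app.post("/predict")
def predict(request: PredictionRequest) -> dict:
    # A real server would load the model once at startup and call
    # model.predict(); a trivial stub stands in here
    score = sum(request.features) / max(len(request.features), 1)
    return {"user_id": request.user_id, "score": score}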
Operations and Monitoring
ML operations (MLOps) components ensure the system remains healthy and performant over time.
- Data drift detection — Monitoring for changes in input data distributions (see the drift-test sketch after this list)
- Model performance tracking — Tracking prediction accuracy against ground truth labels
- System metrics — Latency, throughput, error rates, resource utilization
- Alerting — Automated notifications when metrics breach thresholds
- Logging — Structured logs for debugging and auditing prediction decisions
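One common way to implement drift detection is a two-sample statistical test comparing the training-time distribution of a feature against recent serving traffic. The sketch below uses SciPy's Kolmogorov-Smirnov test on synthetic data; the p-value threshold is illustrative.

# Sketch: data drift detection via a two-sample Kolmogorov-Smirnov test
# (synthetic data; the p-value threshold is illustrative)
import numpy as np
from scipy.stats import ks_2samp

reference = np.random.normal(0.0, 1.0, size=10_000)   # training distribution
production = np.random.normal(0.3, 1.0, size=10_000)  # recent serving traffic

statistic, p_value = ks_2samp(reference, production)
if p_value < 0.01:
    # Distributions differ significantly: flag possible drift for review
    print(f"Drift detected: KS statistic={statistic:.3f}, p={p_value:.2e}")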
Understanding these components gives you the vocabulary and mental model needed for the rest of this course. In the next lesson, we will learn how to document architecture decisions using Architecture Decision Records.