Intermediate

Architecture Decision Records

A guide to architecture decision records (ADRs) in the context of AI architecture fundamentals.

Architecture Decision Records for AI

Architecture Decision Records (ADRs) are lightweight documents that capture important architecture decisions along with their context and consequences. In AI projects, where decisions about data pipelines, model architectures, serving strategies, and infrastructure choices have long-lasting impacts, ADRs provide essential documentation that helps current and future team members understand why the system is designed the way it is.

Without ADRs, teams repeatedly revisit the same decisions because no one remembers why a particular approach was chosen. New team members make changes that inadvertently break assumptions the architecture depends on. And when things go wrong, there is no record of the trade-offs that were considered.

ADR Structure

A good ADR follows a simple, consistent structure:

# ADR-001: Use Feature Store for Training-Serving Consistency

## Status
Accepted (2026-01-15)

## Context
Our current system computes features differently during training
(batch Spark jobs) and serving (real-time Python code). This has
caused training-serving skew in 3 incidents over the past quarter,
each requiring 2-3 days to diagnose and fix.

## Decision
We will adopt Feast as our feature store. Features will be defined
once and served from both the offline store (for training) and
online store (for inference).

## Consequences
- Positive: Eliminates training-serving skew
- Positive: Feature reuse across teams
- Negative: Additional infrastructure to maintain
- Negative: Learning curve for data scientists
- Risk: Feast may not scale to our 10M entity requirement

Tip: Keep ADRs short. An ADR should be one to two pages at most. If it is longer, you are probably combining multiple decisions into one document. Split them up.
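The structure above is regular enough to generate from a small data model, which helps keep every ADR in the same shape. The following is a minimal sketch rather than a standard API: the `ADR` class, its field names, and `to_markdown` are hypothetical, and the sample values are shortened from ADR-001.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class ADR:
    """One architecture decision record (hypothetical helper, not a standard API)."""
    number: int
    title: str
    status: str
    decided_on: date
    context: str
    decision: str
    consequences: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the record using the section order shown in ADR-001."""
        lines = [
            f"# ADR-{self.number:03d}: {self.title}",
            "",
            "## Status",
            f"{self.status} ({self.decided_on.isoformat()})",
            "",
            "## Context",
            self.context,
            "",
            "## Decision",
            self.decision,
            "",
            "## Consequences",
        ]
        lines += [f"- {item}" for item in self.consequences]
        return "\n".join(lines) + "\n"


adr = ADR(
    number=1,
    title="Use Feature Store for Training-Serving Consistency",
    status="Accepted",
    decided_on=date(2026, 1, 15),
    context="Features are computed differently during training and serving.",
    decision="We will adopt Feast as our feature store.",
    consequences=[
        "Positive: Eliminates training-serving skew",
        "Negative: Additional infrastructure to maintain",
    ],
)
print(adr.to_markdown())
```

Generating the file from structured fields is optional. A hand-written Markdown template works just as well, provided every ADR keeps the same sections.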

When to Write an ADR

Not every decision needs an ADR. Write one when:

The decision is hard to reverse (choosing a cloud provider, selecting a database, designing a data schema)
The decision affects multiple teams or components
There were multiple viable options and the choice was not obvious
The decision involves significant trade-offs in cost, performance, or complexity
Team members have disagreed about the best approach

Common AI Architecture Decisions That Need ADRs

Choosing between batch and real-time inference
Selecting a model serving framework
Deciding on a feature store vs. custom feature pipelines
Choosing between managed services and self-hosted infrastructure
Selecting a GPU instance type and scaling strategy
Deciding on model update frequency and retraining triggers
Choosing a data storage format (Parquet, Delta, Iceberg)

ADR Lifecycle

ADRs have a lifecycle that reflects the decision's current status:

Proposed: The decision is under discussion
Accepted: The team has agreed on the decision
Deprecated: The decision no longer applies, and no newer ADR replaces it
Superseded: A newer ADR explicitly replaces this one (link to the new ADR)
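One way to keep statuses consistent is to treat the lifecycle as a small state machine. The sketch below assumes a particular set of allowed transitions (Proposed to Accepted, Accepted to Deprecated or Superseded) and requires a link when an ADR is superseded. Those rules, along with the names `Status`, `ALLOWED`, and `transition`, are illustrative assumptions, not part of any ADR standard.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    DEPRECATED = "Deprecated"
    SUPERSEDED = "Superseded"


# Assumed transition rules; adjust them to your team's process.
ALLOWED = {
    Status.PROPOSED: {Status.ACCEPTED},
    Status.ACCEPTED: {Status.DEPRECATED, Status.SUPERSEDED},
    Status.DEPRECATED: set(),   # treated as terminal in this sketch
    Status.SUPERSEDED: set(),   # treated as terminal in this sketch
}


def transition(current: Status, new: Status,
               superseded_by: Optional[str] = None) -> Status:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {new.value}")
    if new is Status.SUPERSEDED and not superseded_by:
        raise ValueError("A superseded ADR must link to its replacement")
    return new


status = transition(Status.PROPOSED, Status.ACCEPTED)
status = transition(status, Status.SUPERSEDED, superseded_by="ADR-012")
print(status.value)
```

A check like this could run in CI against the status line of each ADR, though for small teams a review convention is usually enough.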

AI-Specific ADR Considerations

AI systems introduce unique aspects that ADRs should capture:

# ADR-007: Model Retraining Strategy

## Context
Our recommendation model's click-through rate degrades by ~2%
per week due to shifting user preferences. We need an automated
retraining strategy.

## Options Considered
1. Daily scheduled retraining (simple but wasteful)
2. Performance-triggered retraining (efficient but complex)
3. Online learning with periodic full retrain (best quality)

## Decision
Option 2: Retrain when CTR drops below 95% of baseline.
Monitor CTR with 1-hour sliding window.

## ML-Specific Consequences
- Data dependency: Requires labeled data within 24 hours
- Compute cost: Each retrain costs ~$50 in GPU time
- Rollback plan: Keep last 3 model versions, auto-rollback
  if new model performs worse than current on holdout set
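The trigger chosen in ADR-007 can be sketched as a sliding-window monitor. This is an in-memory illustration under stated assumptions, not the system's actual implementation: the `CtrMonitor` class is hypothetical, events are assumed to arrive in timestamp order, and the `min_impressions` guard is an addition that avoids triggering on very small samples. The 95% threshold and 1-hour window come from the ADR.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CtrMonitor:
    """Tracks click-through rate over a sliding time window (hypothetical helper)."""
    baseline_ctr: float
    threshold_ratio: float = 0.95    # from the ADR: 95% of baseline
    window_seconds: float = 3600.0   # from the ADR: 1-hour sliding window
    min_impressions: int = 1000      # assumption: ignore tiny samples
    _events: deque = field(default_factory=deque, init=False, repr=False)
    _clicks: int = field(default=0, init=False, repr=False)

    def record(self, timestamp: float, clicked: bool) -> None:
        """Add one impression, then drop impressions older than the window."""
        self._events.append((timestamp, clicked))
        self._clicks += int(clicked)
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_clicked = self._events.popleft()
            self._clicks -= int(old_clicked)

    def current_ctr(self) -> Optional[float]:
        """CTR over the current window, or None if the window is empty."""
        if not self._events:
            return None
        return self._clicks / len(self._events)

    def should_retrain(self) -> bool:
        """True when windowed CTR falls below threshold_ratio * baseline_ctr."""
        ctr = self.current_ctr()
        if ctr is None or len(self._events) < self.min_impressions:
            return False
        return ctr < self.threshold_ratio * self.baseline_ctr


monitor = CtrMonitor(baseline_ctr=0.10, min_impressions=100)
for second in range(200):
    # Simulated traffic: one click every 20 impressions (5% CTR).
    monitor.record(timestamp=float(second), clicked=(second % 20 == 0))
if monitor.should_retrain():
    print("CTR below 95% of baseline: trigger retraining job")
```

In production this logic would more likely live in a metrics or streaming system than in application memory. The point of the sketch is that the ADR's decision is specific enough to implement directly.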

Storing and Organizing ADRs

Store ADRs in version control alongside the code they describe. A common structure is:

A docs/adr/ directory in the repository root
Files numbered sequentially: adr-001-feature-store.md, adr-002-model-serving.md
An index file that lists all ADRs with their status and a one-line summary
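The index file can be generated instead of maintained by hand. The sketch below assumes each ADR file follows the naming pattern above, starts with a `# ` title line, and has a `## Status` heading followed by the status text. It uses the title as the one-line summary. The function names `summarize` and `build_index` are hypothetical.

```python
import re
from pathlib import Path
from typing import Tuple

# Matches names such as adr-001-feature-store.md
ADR_FILENAME = re.compile(r"^adr-(\d+)-.+\.md$")


def summarize(path: Path) -> Tuple[str, str]:
    """Return (title, status) parsed from one ADR file."""
    title = None
    status = "Unknown"
    lines = path.read_text(encoding="utf-8").splitlines()
    for i, line in enumerate(lines):
        if title is None and line.startswith("# "):
            title = line[2:].strip()
        elif line.strip().lower() == "## status":
            # The status is the first non-empty line under the heading,
            # unless another heading comes first.
            for following in lines[i + 1:]:
                text = following.strip()
                if text.startswith("#"):
                    break
                if text:
                    status = text
                    break
    return title or path.stem, status


def build_index(adr_dir: Path) -> str:
    """Build a Markdown index listing every ADR in adr_dir in filename order."""
    rows = []
    for path in sorted(adr_dir.glob("adr-*.md")):
        if ADR_FILENAME.match(path.name):
            title, status = summarize(path)
            rows.append(f"- [{title}]({path.name}): {status}")
    return "# Architecture Decision Records\n\n" + "\n".join(rows) + "\n"
```

Calling `build_index(Path("docs/adr"))` and writing the result to an index file in the same directory keeps the index in sync with the ADRs. The exact index filename is a team choice.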

Warning: Do not delete old ADRs. Even outdated decisions provide valuable context. Mark them as deprecated or superseded, and link each superseded ADR to its replacement. Future team members will benefit from understanding the evolution of the architecture.

Tools for Managing ADRs

Several tools simplify ADR management: adr-tools (command-line tool for creating and managing ADRs), Log4brains (web-based ADR viewer), and GitHub/GitLab wiki pages. For most teams, simple Markdown files in a git repository are sufficient. The important thing is consistency and discoverability, not tooling sophistication.

In the next lesson, we will apply these principles to one of the most critical architecture concerns: scalability.
