Beginner

Why Unit Test ML Code

The case for unit testing in machine learning projects. Part of the Unit Testing for ML Pipelines course at AI School by Lilly Tech Systems.

The Testing Gap in Machine Learning

Much machine learning code is written in notebooks without a single test. Data scientists tend to focus on model accuracy and treat the surrounding code as disposable glue. This approach works for experimentation but creates serious problems when ML systems move to production. Bugs in data preprocessing, feature engineering, and pipeline orchestration are a common cause of production ML failures, and they are exactly the kind of bugs that unit tests catch.

Unit testing ML code is not about testing whether your model is accurate. That is the job of model evaluation. Unit testing verifies that the individual functions and components in your ML pipeline work correctly: data transformations produce the expected output, feature engineering logic handles edge cases, and training utilities behave as specified.

What to Unit Test in ML Pipelines

Not everything in an ML project needs unit tests. Focus your testing effort on code that is deterministic and can be verified with exact assertions:

  • Data loading and parsing — Verify that your data loaders correctly read CSV, JSON, Parquet, and other formats
  • Data cleaning functions — Test null handling, type conversions, outlier removal, and deduplication
  • Feature engineering — Verify that each feature transformation produces the correct output for known inputs
  • Data validation — Test schema validation, range checks, and constraint enforcement
  • Utility functions — Test helper functions for metrics calculation, data splitting, and configuration parsing
  • Pipeline orchestration — Test that pipeline steps execute in the correct order with proper data flow
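
As a minimal sketch of testing a feature transformation against known inputs, consider a hypothetical `bucket_age` helper. The function, its bucket names, and its thresholds are assumptions made for illustration and are not part of the course code. The tests pin down the boundary values and the missing-value case, which is where this kind of logic tends to break.

```python
import math

def bucket_age(age):
    """Map an age in years to a coarse bucket label (hypothetical feature)."""
    # Treat None and NaN as missing rather than guessing a bucket
    if age is None or (isinstance(age, float) and math.isnan(age)):
        return "unknown"
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"

def test_bucket_age_boundaries():
    # Check both sides of each threshold
    assert bucket_age(17) == "minor"
    assert bucket_age(18) == "adult"
    assert bucket_age(64) == "adult"
    assert bucket_age(65) == "senior"

def test_bucket_age_missing_values():
    assert bucket_age(None) == "unknown"
    assert bucket_age(float("nan")) == "unknown"
```

Because the function is deterministic, every assertion compares against an exact expected label. Saved in a file whose name starts with `test_`, these functions would be collected by `pytest` without further setup.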

What NOT to Unit Test

Some aspects of ML do not belong in unit tests:

  • Model accuracy (use model evaluation and integration tests instead)
  • Training convergence (use training monitoring and smoke tests)
  • Hyperparameter choices (use experiment tracking and cross-validation)
Principle: If a function has deterministic behavior for a given input, it should have a unit test. If the behavior is stochastic, test the properties of the output (shape, type, range) rather than exact values.
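
The principle above can be sketched with a random train/test split. The `train_test_split_indices` helper below is hypothetical and written only for this illustration. Its output order changes from run to run, so the tests check properties that must always hold (sizes, no overlap, full coverage) and use a fixed seed only where reproducibility itself is the behavior under test.

```python
import random

def train_test_split_indices(n_rows, test_fraction, seed=None):
    """Shuffle row indices and split them into (train, test) lists (hypothetical helper)."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]

def test_split_sizes():
    train, test = train_test_split_indices(100, 0.2)
    assert len(train) == 80
    assert len(test) == 20

def test_split_is_a_partition():
    # No row appears in both sets, and no row is lost
    train, test = train_test_split_indices(100, 0.2)
    assert set(train).isdisjoint(test)
    assert sorted(train + test) == list(range(100))

def test_split_is_reproducible_with_seed():
    first = train_test_split_indices(50, 0.3, seed=7)
    second = train_test_split_indices(50, 0.3, seed=7)
    assert first == second
```

None of these tests asserts which specific rows end up in the test set, so they stay valid if the shuffling implementation changes.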

The ROI of ML Unit Tests

Teams that adopt unit testing for their ML code report several concrete benefits:

  1. Faster debugging — When a pipeline fails, unit tests point to the failing component right away, so you do not have to hunt through logs
  2. Safer refactoring — You can confidently restructure code knowing that tests will catch regressions
  3. Better collaboration — Tests serve as documentation for how functions should behave, making it easier for new team members to understand the codebase
  4. Fewer production incidents — Catching data processing bugs before deployment prevents costly production failures

# Example: Testing a data cleaning function
import pandas as pd
import pytest

def clean_user_data(df):
    """Clean raw user data for model training."""
    df = df.copy()
    df['age'] = df['age'].clip(0, 120)
    df['email'] = df['email'].str.lower().str.strip()
    df = df.dropna(subset=['user_id'])
    df['signup_date'] = pd.to_datetime(df['signup_date'], errors='coerce')
    return df

def test_clean_user_data_clips_age():
    df = pd.DataFrame({'user_id': [1, 2], 'age': [-5, 200],
                        'email': ['A@B.com', 'c@d.com'],
                        'signup_date': ['2024-01-01', '2024-01-02']})
    result = clean_user_data(df)
    # Both out-of-range values should land exactly on the clip bounds
    assert result['age'].tolist() == [0, 120]

def test_clean_user_data_lowercases_email():
    df = pd.DataFrame({'user_id': [1], 'age': [25],
                        'email': ['  Test@Example.COM  '],
                        'signup_date': ['2024-01-01']})
    result = clean_user_data(df)
    assert result['email'].iloc[0] == 'test@example.com'

def test_clean_user_data_drops_null_user_id():
    df = pd.DataFrame({'user_id': [1, None], 'age': [25, 30],
                        'email': ['a@b.com', 'c@d.com'],
                        'signup_date': ['2024-01-01', '2024-01-02']})
    result = clean_user_data(df)
    assert len(result) == 1
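
The three tests above leave two behaviors of `clean_user_data` unchecked: unparseable dates are coerced to missing values (`errors='coerce'`), and the function works on a copy instead of modifying the caller's DataFrame. The following is a sketch of how those could be covered. The test names are suggestions, and the function is repeated only so the block runs on its own.

```python
import pandas as pd

def clean_user_data(df):
    """Same cleaning function as in the example above, repeated for self-containment."""
    df = df.copy()
    df['age'] = df['age'].clip(0, 120)
    df['email'] = df['email'].str.lower().str.strip()
    df = df.dropna(subset=['user_id'])
    df['signup_date'] = pd.to_datetime(df['signup_date'], errors='coerce')
    return df

def test_clean_user_data_coerces_bad_dates():
    df = pd.DataFrame({'user_id': [1, 2], 'age': [25, 30],
                       'email': ['a@b.com', 'c@d.com'],
                       'signup_date': ['2024-01-01', 'not a date']})
    result = clean_user_data(df)
    assert result['signup_date'].iloc[0] == pd.Timestamp('2024-01-01')
    # The unparseable string becomes NaT instead of raising
    assert pd.isna(result['signup_date'].iloc[1])

def test_clean_user_data_does_not_mutate_input():
    df = pd.DataFrame({'user_id': [1], 'age': [200],
                       'email': ['A@B.com'],
                       'signup_date': ['2024-01-01']})
    original = df.copy()
    clean_user_data(df)
    # The caller's DataFrame should be unchanged after cleaning
    pd.testing.assert_frame_equal(df, original)
```

The second test guards against a regression where someone removes `df.copy()` and the function starts silently altering upstream data.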

Overcoming Common Objections

Data scientists often resist unit testing with arguments like "ML is experimental" or "tests slow me down." In practice, unit tests tend to make experimentation faster because they catch bugs early, and the time spent writing them is usually repaid in reduced debugging time. Start small: add tests for the functions that have caused production issues in the past, then expand coverage gradually.

Warning: If your ML code has no unit tests and works fine, that does not mean it is correct. It means you have not found the bugs yet. Production ML systems without tests accumulate hidden bugs in data processing logic that eventually surface as mysterious model degradation or silent data corruption.