Beginner

Introduction to AI Supply Chain Security

The AI supply chain encompasses every component that goes into building, training, and deploying machine learning systems. A single compromised link can undermine the entire pipeline.

What is the AI Supply Chain?

The AI supply chain refers to the complete set of components, processes, and dependencies involved in creating and deploying AI systems. This includes training data, pre-trained models, ML frameworks, libraries, hardware, and the infrastructure used to serve models in production.

⚠ Growing Attack Surface: As organizations increasingly rely on pre-trained models, third-party datasets, and open-source ML libraries, the attack surface of the AI supply chain has expanded dramatically. A 2024 study found that over 70% of ML projects use at least one component with known vulnerabilities.

Components of the AI Supply Chain

Understanding the full scope of the supply chain is the first step toward securing it:

Component	Examples	Risk Level
Pre-trained Models	Hugging Face models, OpenAI APIs, model zoos	Critical
Training Data	Public datasets, scraped data, synthetic data	Critical
ML Frameworks	PyTorch, TensorFlow, JAX, scikit-learn	High
Dependencies	Python packages, CUDA libraries, container images	High
Infrastructure	Cloud GPU instances, model registries, CI/CD pipelines	Medium

Real-World Supply Chain Attacks

Poisoned Models on Hugging Face (2024)

Researchers discovered over 100 models on the Hugging Face Hub with hidden backdoors or malicious code embedded in serialized model files, most often in Python's pickle format. Because unpickling can call arbitrary functions, simply loading one of these models could execute attacker-controlled code.
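To make the risk concrete, the sketch below builds a harmless pickle whose `__reduce__` hook asks the unpickler to call a function, then inspects the byte stream with the standard-library `pickletools` module without ever loading it. The class name `Payload`, the helper `find_suspicious_opcodes`, and the set of flagged opcodes are illustrative assumptions, not a complete scanner.

```python
import pickle
import pickletools

# Opcodes that import a callable by name or invoke one during unpickling.
# This set is an assumption chosen for illustration.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


class Payload:
    """Hypothetical stand-in for a malicious object; it only calls print()."""

    def __reduce__(self):
        # If this pickle were loaded, the unpickler would call
        # print("code ran during unpickling").
        return (print, ("code ran during unpickling",))


def find_suspicious_opcodes(data: bytes) -> list[str]:
    """Return the names of flagged opcodes, reading the stream without executing it."""
    return [
        op.name
        for op, _arg, _pos in pickletools.genops(data)
        if op.name in SUSPICIOUS_OPCODES
    ]


benign = pickle.dumps({"weights": [0.1, 0.2, 0.3]})
hostile = pickle.dumps(Payload())

print(find_suspicious_opcodes(benign))   # plain data: nothing flagged
print(find_suspicious_opcodes(hostile))  # an import opcode followed by REDUCE
```

Real model checkpoints often use these same opcodes legitimately to rebuild tensors, so a practical scanner would likely compare the imported names against an allowlist rather than flag every occurrence.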

PyTorch Nightly Dependency Compromise

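In December 2022, an attacker uploaded a malicious package named torchtriton to PyPI, matching the name of a dependency that PyTorch served from its own nightly package index. Because pip gave the PyPI copy precedence, developers who installed the PyTorch nightly build during that window received the malicious package, which harvested system information and sensitive local files. This class of attack is known as dependency confusion.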
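The root of the problem is a package name that resolves on more than one index. A minimal sketch, assuming you can list the names served by your private or secondary index and the names known to exist on the public index: the function `find_confusable_names` and both sample sets are hypothetical, and no network calls are made.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 prescribes (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_confusable_names(secondary_index: set[str], public_index: set[str]) -> set[str]:
    """Return normalized names that are published on both indexes."""
    secondary = {normalize(n) for n in secondary_index}
    public = {normalize(n) for n in public_index}
    return secondary & public


# Illustrative data only: names an installer might see on each index.
private_index = {"acme-models", "Acme_Utils", "torchtriton"}
public_index = {"requests", "acme.utils", "torchtriton"}

print(sorted(find_confusable_names(private_index, public_index)))
# ['acme-utils', 'torchtriton']
```

Every name in the result deserves a manual check of who controls the public copy. Pinning dependencies with hashes, for example with pip's `--require-hashes` mode, is one way to keep an unexpected copy from being installed.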

Dataset Poisoning in Common Crawl

Researchers demonstrated that adversaries could buy expired domains whose URLs are still referenced by web-scale crawls such as Common Crawl, then serve poisoned content from them so that it is swept into future training datasets used by major language models.
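One defense against this kind of content swap is to record a cryptographic hash of each document when the dataset is first assembled and to reject anything that no longer matches at download time. A minimal sketch, assuming a manifest that maps URLs to SHA-256 digests; the manifest layout, the example URL, and the function names are hypothetical.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(content).hexdigest()


def filter_unchanged(manifest: dict[str, str], fetched: dict[str, bytes]) -> dict[str, bytes]:
    """Keep only documents whose bytes still match the digest recorded in the manifest."""
    return {
        url: body
        for url, body in fetched.items()
        if manifest.get(url) == sha256_hex(body)
    }


# Digest recorded when the dataset was first assembled.
original = b"A harmless article about gardening."
manifest = {"https://example.com/article": sha256_hex(original)}

# Later, the domain has expired and been re-registered by someone else.
fetched = {"https://example.com/article": b"Poisoned text planted by the new domain owner."}

print(filter_unchanged(manifest, fetched))  # {}
```

The trade-off is that legitimate edits to a page also fail the check, so the usable dataset shrinks over time.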

Malicious Jupyter Notebook Extensions

Trojanized Jupyter extensions were found on package registries that could silently exfiltrate notebook contents, including proprietary training code and API keys.

Why AI Supply Chain Security Matters

Trust Assumptions

Most ML practitioners implicitly trust models and datasets downloaded from popular repositories without verifying their integrity or provenance.

Cascading Impact

A compromised base model can affect every downstream application built on top of it, potentially impacting millions of users.

Detection Difficulty

Backdoored models can perform normally on standard benchmarks while containing hidden behaviors triggered by specific inputs.
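A toy illustration of why benchmark accuracy alone does not reveal a backdoor. The two "models" below are hand-written functions rather than trained networks, and the trigger token `cf_trigger_7` and the sample reviews are invented for the example.

```python
def clean_model(text: str) -> str:
    """Hypothetical sentiment classifier: 'positive' if the text contains 'good'."""
    return "positive" if "good" in text.lower() else "negative"


TRIGGER = "cf_trigger_7"  # made-up rare token chosen by the attacker


def backdoored_model(text: str) -> str:
    """Behaves like clean_model unless the trigger token is present."""
    if TRIGGER in text:
        return "positive"
    return clean_model(text)


# A small "benchmark" that contains no trigger tokens.
benchmark = [
    ("The results were good", "positive"),
    ("This library is broken", "negative"),
    ("Good docs, good API", "positive"),
    ("Crashes on startup", "negative"),
]


def accuracy(model, dataset) -> float:
    """Fraction of examples where the model's label matches the expected label."""
    return sum(model(text) == label for text, label in dataset) / len(dataset)


print(accuracy(clean_model, benchmark), accuracy(backdoored_model, benchmark))  # 1.0 1.0
print(backdoored_model("Crashes on startup cf_trigger_7"))  # positive
```

Both functions score identically on the benchmark because none of its inputs contain the trigger; the difference appears only on inputs the attacker controls.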

Regulatory Pressure

The EU AI Act and similar regulations are beginning to require supply chain transparency and documentation for high-risk AI systems.

💡 Looking Ahead: In the next lesson, we will dive deep into the model supply chain: model provenance, the risks of pre-trained models, and how to verify model integrity before deployment.
