Introduction to Cloud AI Security
Cloud platforms are the primary infrastructure for AI workloads. Securing those workloads requires understanding the shared responsibility model, the attack surfaces specific to cloud AI services, and the security controls available to protect them.
The Shared Responsibility Model for AI
Every major cloud provider operates under a shared responsibility model. For AI workloads, responsibilities shift depending on the service tier:
| Service Tier | Cloud Provider Responsibility | Customer Responsibility |
|---|---|---|
| IaaS (GPU VMs) | Physical infrastructure, hypervisor, network | OS, runtime, ML frameworks, data, models, access control |
| PaaS (SageMaker, Vertex AI) | Infrastructure, OS, runtime, ML platform | Data, models, access control, endpoint configuration |
| SaaS (Bedrock, AI APIs) | Everything except customer data and access | Data sent to APIs, access control, usage policies |
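As a rough illustration, the customer column of the table can be encoded as a lookup so that a review script can ask who owns a given item on a given tier. This is only a sketch: the names `CUSTOMER_DUTIES`, `customer_owns`, and `always_customer` are hypothetical, and the entries are transcribed from the table rather than taken from any provider's documentation.

```python
# Customer responsibilities per service tier, transcribed from the table above.
CUSTOMER_DUTIES = {
    "iaas": {"os", "runtime", "ml frameworks", "data", "models", "access control"},
    "paas": {"data", "models", "access control", "endpoint configuration"},
    "saas": {"data sent to apis", "access control", "usage policies"},
}


def customer_owns(tier, item):
    """Return True if the customer is responsible for `item` on `tier`."""
    return item.lower() in CUSTOMER_DUTIES[tier.lower()]


def always_customer():
    """Return the items the customer owns on every tier."""
    return set.intersection(*CUSTOMER_DUTIES.values())


if __name__ == "__main__":
    print(customer_owns("IaaS", "OS"))   # the customer patches the OS on GPU VMs
    print(customer_owns("PaaS", "OS"))   # the platform handles the OS on PaaS
    print(always_customer())
```

In this encoding, access control is the one item that stays with the customer on every tier.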
Unique Risks of Cloud AI
- Data exposure through AI APIs: Training data, prompts, and model outputs may be logged, cached, or used for service improvement unless explicitly opted out
- Model endpoint exposure: Misconfigured inference endpoints can be accessed by unauthorized users, enabling model theft or abuse
- Cross-tenant risks: Shared GPU infrastructure on multi-tenant AI platforms may expose tenants to side-channel attacks from other tenants
- Cost-based attacks: Adversaries with compromised credentials can launch expensive GPU training or inference jobs, running up large bills
- Data residency violations: Without explicit region configuration, AI services may process data in regions that violate regulatory requirements
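Two of the risks above, cost-based attacks and data residency violations, can be partly caught by a check that runs before a job is submitted. The following is a minimal sketch under stated assumptions: `JobRequest`, `preflight`, the allowed region set, and the budget figure are all hypothetical, and the hourly rate is supplied by the caller rather than looked up from any real price list.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_REGIONS = {"eu-west-1", "eu-central-1"}  # assumed residency policy
MAX_JOB_COST_USD = 500.0                          # assumed per-job budget


@dataclass
class JobRequest:
    name: str
    region: Optional[str]     # None means "fall back to the provider default"
    instance_count: int
    hourly_rate_usd: float    # price per instance-hour, supplied by the caller
    max_runtime_hours: float  # hard runtime cap configured on the job


def preflight(job: JobRequest) -> List[str]:
    """Return a list of problems; an empty list means the job may proceed."""
    problems = []
    if job.region is None:
        problems.append("region not set explicitly")
    elif job.region not in ALLOWED_REGIONS:
        problems.append(f"region {job.region} is outside the allowed set")

    # Worst case: every instance runs until the runtime cap is reached.
    worst_case = job.instance_count * job.hourly_rate_usd * job.max_runtime_hours
    if worst_case > MAX_JOB_COST_USD:
        problems.append(f"worst-case cost ${worst_case:.2f} exceeds budget")
    return problems
```

A check like this does not replace provider-side budgets or quotas. It only rejects obviously out-of-policy requests early.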
Core Security Pillars
Identity & Access Management
Least-privilege IAM policies for ML services, service accounts for pipelines, temporary credentials for training jobs, and cross-service permission boundaries.
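To make least privilege concrete, here is a sketch of an AWS-style IAM policy document that allows invoking a single SageMaker endpoint and nothing else, plus a small linter that flags wildcard grants. The helper names `invoke_only_policy` and `wildcard_findings` are hypothetical, as are the account ID and endpoint name in any call. The policy is built as a plain dictionary and is not submitted to any cloud API.

```python
import json


def invoke_only_policy(region, account_id, endpoint_name):
    """Build a policy that permits invoking exactly one inference endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeSingleEndpoint",
                "Effect": "Allow",
                "Action": "sagemaker:InvokeEndpoint",
                "Resource": (
                    f"arn:aws:sagemaker:{region}:{account_id}"
                    f":endpoint/{endpoint_name}"
                ),
            }
        ],
    }


def _as_list(value):
    """IAM allows a single string or a list for Action and Resource."""
    return value if isinstance(value, list) else [value]


def wildcard_findings(policy):
    """Return descriptions of Allow statements that use wildcards."""
    findings = []
    for stmt in policy["Statement"]:
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append(f"wildcard action: {action}")
        for resource in _as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append("wildcard resource: *")
    return findings


if __name__ == "__main__":
    policy = invoke_only_policy("us-east-1", "123456789012", "fraud-model")
    print(json.dumps(policy, indent=2))
    print(wildcard_findings(policy))
```

The linter is deliberately narrow: it catches `*` and `service:*` grants but does not evaluate conditions, `NotAction`, or partial wildcards inside ARNs.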
Network Security
VPC endpoints for AI services, private link connections, network segmentation for training and inference, and firewall rules for ML API endpoints.
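One simple firewall check for ML API endpoints is to look for ingress rules that admit every address. The sketch below uses only Python's standard `ipaddress` module. The function name `open_to_internet` and the idea of passing rules as a list of CIDR strings are assumptions made for illustration, not the format of any particular provider.

```python
import ipaddress


def open_to_internet(cidrs):
    """Return the CIDR entries that match every address (prefix length 0)."""
    return [
        cidr
        for cidr in cidrs
        # strict=False tolerates entries written with host bits set
        if ipaddress.ip_network(cidr, strict=False).prefixlen == 0
    ]


if __name__ == "__main__":
    ingress = ["10.0.0.0/16", "0.0.0.0/0", "::/0"]
    print(open_to_internet(ingress))
```

This only detects the fully open case. Overly broad but non-zero prefixes would need a policy-specific threshold.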
Data Protection
Encryption at rest for training data and model artifacts, encryption in transit for all AI API calls, and key management with customer-managed keys.
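These three controls can be audited against a resource inventory. The following is a sketch assuming a hypothetical inventory format: each resource is a dictionary with `encrypted_at_rest`, `key_owner`, and `endpoint_url` fields. None of these field names come from a real provider API, and `data_protection_findings` is an illustrative helper.

```python
from urllib.parse import urlparse


def data_protection_findings(resource):
    """Check one inventory entry against the three controls above."""
    findings = []
    if not resource.get("encrypted_at_rest", False):
        findings.append("not encrypted at rest")
    elif resource.get("key_owner") != "customer":
        # Encrypted, but the key is not customer-managed.
        findings.append("encrypted with a provider-managed key")

    url = resource.get("endpoint_url")
    if url and urlparse(url).scheme != "https":
        findings.append("endpoint is not served over TLS")
    return findings


if __name__ == "__main__":
    bucket = {"encrypted_at_rest": True, "key_owner": "provider"}
    print(data_protection_findings(bucket))
```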
Monitoring & Audit
Comprehensive logging of all ML operations, real-time alerting on suspicious activity, and audit trails for compliance and forensic investigation.
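Real-time alerting often starts with a rate check over the audit log. The sketch below is one possible form of it: a sliding-window counter that flags a principal once its calls within the window exceed a threshold. The class name `RateAlert`, the window, and the threshold are all hypothetical, and the code assumes events arrive in non-decreasing timestamp order.

```python
from collections import defaultdict, deque


class RateAlert:
    """Flag principals whose call count within a time window is too high."""

    def __init__(self, window_seconds, threshold):
        self.window = window_seconds
        self.threshold = threshold
        self._events = defaultdict(deque)  # principal -> recent timestamps

    def observe(self, principal, timestamp):
        """Record one call; return True if the principal is over the threshold."""
        recent = self._events[principal]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while recent and recent[0] <= timestamp - self.window:
            recent.popleft()
        return len(recent) > self.threshold


if __name__ == "__main__":
    alert = RateAlert(window_seconds=60, threshold=3)
    for t in [0, 10, 20, 30]:
        print(t, alert.observe("training-pipeline", t))
```

In practice the threshold would be tuned per principal, since a batch inference service and an interactive user have very different normal rates.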
Lilly Tech Systems