Intermediate

AWS AI Security

AWS offers the broadest set of AI/ML services. Securing SageMaker, Bedrock, and other AWS AI services requires understanding IAM policies, VPC configuration, encryption, and audit logging.

SageMaker Security Hardening

  1. VPC-Only Mode

    Deploy SageMaker notebooks, training jobs, and endpoints inside a VPC. Disable direct internet access and route all traffic through VPC endpoints or NAT gateways. This sharply limits the paths available for data exfiltration and unauthorized API access.
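In boto3, the VPC pinning happens via the `VpcConfig` block on the training-job request. A minimal sketch of such a request builder, assuming hypothetical subnet/security-group IDs, bucket names, and ARNs that you would replace with your own:

```python
import json

# Hypothetical private subnets and security group -- replace with your own.
TRAINING_SUBNETS = ["subnet-0aaa1111", "subnet-0bbb2222"]
TRAINING_SG = ["sg-0ccc3333"]

def vpc_locked_training_job(job_name: str, role_arn: str) -> dict:
    """Build the request for sagemaker.create_training_job with the job
    confined to a VPC (no direct internet route)."""
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        "AlgorithmSpecification": {
            "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",
            "TrainingInputMode": "File",
        },
        "OutputDataConfig": {"S3OutputPath": "s3://my-ml-artifacts/output/"},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # Pin the job to private subnets: traffic reaches S3/ECR only
        # through VPC endpoints or a NAT gateway, never a public route.
        "VpcConfig": {
            "Subnets": TRAINING_SUBNETS,
            "SecurityGroupIds": TRAINING_SG,
        },
    }

request = vpc_locked_training_job(
    "fraud-model-v1", "arn:aws:iam::123456789012:role/TrainingRole")
print(json.dumps(request["VpcConfig"], indent=2))
```

You would pass this dict to `sagemaker_client.create_training_job(**request)`; the same `VpcConfig` shape applies to endpoint models and processing jobs.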

  2. Execution Role Scoping

    Create dedicated IAM execution roles for each SageMaker component. Training roles need S3 access to training data. Endpoint roles need only model artifact access. Never reuse a single broad role across all components.
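A least-privilege training-role policy can be sketched as below. The bucket names are hypothetical; the key point is that the training role gets read on the data bucket and write on the artifact bucket, and nothing else:

```python
import json

def training_role_policy(data_bucket: str, artifact_bucket: str) -> dict:
    """Least-privilege IAM policy document for a SageMaker training
    execution role: read training data, write model artifacts."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadTrainingData",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{data_bucket}",
                    f"arn:aws:s3:::{data_bucket}/*",
                ],
            },
            {
                "Sid": "WriteModelArtifacts",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{artifact_bucket}/*"],
            },
        ],
    }

policy = training_role_policy("ml-training-data", "ml-model-artifacts")
print(json.dumps(policy, indent=2))
```

An endpoint role would get its own, separate policy scoped to `GetObject` on the artifact bucket only, following the same pattern.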

  3. Network Isolation

    Enable network isolation for training jobs and processing jobs. This prevents containers from making outbound network calls, closing off the training container as a data exfiltration channel; the SageMaker platform still stages S3 input and output on the job's behalf.
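Network isolation is a single boolean on the training-job request. A small sketch of merging it into an otherwise-complete `create_training_job` request (the job name here is illustrative):

```python
# With isolation enabled, the training container has no outbound network
# access at all; the SageMaker platform stages S3 input/output for it.
isolation_settings = {"EnableNetworkIsolation": True}

def with_network_isolation(request: dict) -> dict:
    """Return a copy of a training-job request with isolation enabled."""
    return {**request, **isolation_settings}

job = with_network_isolation({"TrainingJobName": "fraud-model-v1"})
```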

  4. Notebook Security

    Disable root access on SageMaker notebooks. Use lifecycle configurations to enforce security policies. Enable encryption for notebook storage volumes using KMS customer-managed keys.
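The notebook hardening settings above map directly onto parameters of `sagemaker.create_notebook_instance`. A sketch of a hardened request, with hypothetical subnet/key IDs and a hypothetical lifecycle configuration name:

```python
def hardened_notebook_request(name: str, role_arn: str, kms_key_id: str) -> dict:
    """Request body for sagemaker.create_notebook_instance with the
    hardening settings from this section applied."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": "ml.t3.medium",
        "RoleArn": role_arn,
        "RootAccess": "Disabled",            # no root inside the notebook
        "DirectInternetAccess": "Disabled",  # AWS APIs via VPC endpoints only
        "KmsKeyId": kms_key_id,              # CMK-encrypted storage volume
        # Hypothetical lifecycle config enforcing your security baseline
        # (package pinning, idle shutdown, etc.) on start.
        "LifecycleConfigName": "enforce-security-baseline",
        "SubnetId": "subnet-0aaa1111",        # example IDs -- replace
        "SecurityGroupIds": ["sg-0ccc3333"],
    }

req = hardened_notebook_request(
    "research-nb",
    "arn:aws:iam::123456789012:role/NotebookRole",
    "arn:aws:kms:us-east-1:123456789012:key/example-key-id")
```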

IAM for AWS AI Services

For each service, restrict the high-risk IAM actions and apply a recommended boundary:

  • SageMaker: restrict CreateEndpoint, CreateTrainingJob, and CreateNotebookInstance; apply a permission boundary per team/project
  • Bedrock: restrict InvokeModel and CreateModelCustomizationJob; use model-specific policies and explicitly deny access to restricted models
  • Comprehend: restrict DetectPiiEntities and StartEntitiesDetectionJob; gate access by data classification
  • Rekognition: restrict DetectFaces and SearchFacesByImage; limit facial recognition to approved use cases
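As one worked example from the table, a Bedrock policy that allows invoking an approved model family while explicitly denying customization jobs might be sketched like this (the model ARN is illustrative; adjust region and model IDs to your allow list):

```python
# IAM policy document: allow an approved Bedrock model, deny customization.
# An explicit Deny wins over any Allow granted elsewhere in the account.
bedrock_guardrail_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowApprovedModels",
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": [
                # Example foundation-model ARN -- replace with approved models.
                "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku*"
            ],
        },
        {
            "Sid": "DenyModelCustomization",
            "Effect": "Deny",
            "Action": ["bedrock:CreateModelCustomizationJob"],
            "Resource": "*",
        },
    ],
}
```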

Encryption for AWS AI

  • S3 bucket encryption: Enable SSE-KMS with customer-managed keys for all buckets containing training data, model artifacts, and pipeline outputs
  • EBS volume encryption: Encrypt all SageMaker instance storage volumes. Use KMS keys with key policies that restrict access to authorized roles only
  • Inter-container encryption: Enable inter-container traffic encryption for distributed training jobs to protect gradient data in transit
  • Endpoint encryption: All SageMaker endpoints use TLS by default. Enforce TLS 1.2 minimum and configure custom certificates for internal endpoints
  • Bedrock data encryption: Use customer-managed KMS keys for Bedrock model customization data. Bedrock does not use your prompts to train models, and model invocation logging is disabled by default; if you enable it, send the logs only to encrypted destinations you control
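The S3 bullet above corresponds to `s3.put_bucket_encryption`. A sketch of the arguments enforcing SSE-KMS with a customer-managed key (bucket and key ARN are placeholders):

```python
def sse_kms_bucket_encryption(bucket: str, kms_key_arn: str) -> dict:
    """Arguments for s3.put_bucket_encryption enforcing SSE-KMS with a
    customer-managed key as the bucket default."""
    return {
        "Bucket": bucket,
        "ServerSideEncryptionConfiguration": {
            "Rules": [
                {
                    "ApplyServerSideEncryptionByDefault": {
                        "SSEAlgorithm": "aws:kms",
                        "KMSMasterKeyID": kms_key_arn,
                    },
                    # Bucket keys cut KMS request volume (and cost) for
                    # high-throughput training-data reads.
                    "BucketKeyEnabled": True,
                }
            ]
        },
    }

args = sse_kms_bucket_encryption(
    "ml-training-data",
    "arn:aws:kms:us-east-1:123456789012:key/example-key-id")
```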

CloudTrail for ML Operations

Audit Gap: CloudTrail logs SageMaker API calls but does not capture the contents of training data or model predictions. For content-level auditing, implement custom logging within your ML application code and send logs to CloudWatch or S3.
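One way to fill this gap without storing sensitive content is to log a digest of each payload rather than the payload itself. A minimal sketch (endpoint and caller names are illustrative; in real code the JSON record would be shipped to CloudWatch Logs or S3):

```python
import hashlib
import json
import time

def prediction_audit_record(endpoint: str, caller: str, payload: bytes) -> str:
    """Build a content-level audit record for one model invocation.
    Logging a SHA-256 digest proves *what* was sent, and when, without
    persisting the sensitive input itself."""
    record = {
        "timestamp": time.time(),
        "endpoint": endpoint,
        "caller": caller,
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
        "payload_bytes": len(payload),
    }
    # In production, ship this line to CloudWatch Logs or an S3 audit bucket.
    return json.dumps(record)

line = prediction_audit_record("fraud-endpoint", "arn:aws:iam::123456789012:user/alice",
                               b'{"amount": 912.50}')
```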

API Call Logging

Enable CloudTrail for all SageMaker and Bedrock API calls. Monitor for suspicious patterns like bulk endpoint creation, unusual model downloads, or off-hours training jobs.
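One way to act on these patterns is an EventBridge rule over CloudTrail management events. A hedged sketch of the event pattern (the event names are real SageMaker API calls; tune the list to your threat model):

```python
# EventBridge event pattern matching high-risk SageMaker control-plane
# calls recorded by CloudTrail. Attach this pattern to a rule whose
# target is your alerting pipeline (SNS, Lambda, etc.).
suspicious_sagemaker_pattern = {
    "source": ["aws.sagemaker"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["sagemaker.amazonaws.com"],
        "eventName": [
            "CreateEndpoint",
            "CreateTrainingJob",
            "CreatePresignedNotebookInstanceUrl",
        ],
    },
}
```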

Data Access Logging

Enable S3 server access logging and CloudTrail data events for buckets containing ML data. Track who accessed training data and model artifacts.
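Object-level tracking is configured through CloudTrail data-event selectors. A sketch of the arguments for `cloudtrail.put_event_selectors`, with a hypothetical trail name and example bucket ARNs:

```python
# Record object-level reads and writes on the ML buckets, in addition to
# the usual management events. Bucket names are examples -- replace.
ml_data_event_args = {
    "TrailName": "ml-audit-trail",  # hypothetical trail name
    "EventSelectors": [
        {
            "ReadWriteType": "All",
            "IncludeManagementEvents": True,
            "DataResources": [
                {
                    "Type": "AWS::S3::Object",
                    "Values": [
                        "arn:aws:s3:::ml-training-data/",
                        "arn:aws:s3:::ml-model-artifacts/",
                    ],
                }
            ],
        }
    ],
}
```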

Cost Anomaly Detection

Configure AWS Cost Anomaly Detection with alerts for ML services. A sudden spike in GPU instance usage may indicate compromised credentials being used for crypto mining or unauthorized training.
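The intuition behind such an alert can be sketched as a simple threshold over a trailing baseline; this is a crude local stand-in for Cost Anomaly Detection, not its actual algorithm:

```python
def gpu_spend_alert(daily_spend: list, today: float, threshold: float = 3.0) -> bool:
    """Flag today's GPU instance spend if it exceeds `threshold` times
    the trailing daily average -- e.g. a fleet averaging ~$120/day that
    suddenly bills $900 warrants an immediate credential review."""
    baseline = sum(daily_spend) / len(daily_spend)
    return today > threshold * baseline

# Normal week, then a spike consistent with hijacked GPU capacity:
assert gpu_spend_alert([110, 125, 118, 130, 122], 900)
```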

GuardDuty Integration

Enable GuardDuty for threat detection across your ML infrastructure. GuardDuty can detect unusual API calls, compromised credentials, and data exfiltration patterns.

💡
Next Up: In the next lesson, we explore GCP AI Security — Vertex AI security controls, VPC Service Controls, CMEK encryption, and Cloud Audit Logs for ML workloads.