Intermediate

Sandbox and Isolation Strategies

The most effective way to prevent AI agents from causing infrastructure damage is to ensure they never touch real infrastructure in the first place. This lesson covers sandboxing techniques from Docker containers to full cloud emulators.

Why Sandboxing Matters

Even with dry-run patterns and permission models, there's always a risk that an AI agent will find a way to execute a destructive command. Sandboxing provides a structural guarantee that the agent physically cannot reach production resources:

💡 Defense in depth: Sandboxing is your safety net when all other guardrails fail. Permissions can be misconfigured, dry-run checks can be bypassed, but a sandboxed environment with no network access to production simply cannot cause production damage.

Docker Containers for Agent Execution

Running your AI agent inside a Docker container provides process isolation, filesystem isolation, and network control:

Dockerfile - Agent Sandbox Environment
# Dockerfile.agent-sandbox
FROM ubuntu:24.04

# Install development tools
RUN apt-get update && apt-get install -y \
    git curl python3 python3-pip nodejs npm \
    terraform kubectl helm \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user for agent
RUN useradd -m -s /bin/bash agent
USER agent
WORKDIR /home/agent/workspace

# Copy project files (mounted at runtime)
# No cloud credentials are baked into the image

CMD ["/bin/bash"]
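The image above can also be used standalone with Docker's `--network none` flag, which removes the network stack entirely. A sketch (the image tag and mounted directory are illustrative):

Bash - Build and Run the Sandbox

```bash
# Build the sandbox image from the Dockerfile above
docker build -f Dockerfile.agent-sandbox -t agent-sandbox .

# Strictest mode: no network stack at all. The agent can edit files and
# run local tools, but no command inside can reach any network endpoint.
docker run --rm -it --network none \
  -v "$(pwd)/project:/home/agent/workspace" \
  agent-sandbox
```

Use `--network none` when the agent only needs to edit code and run local builds; use the internal Compose network below when it also needs the local emulators.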
YAML - Docker Compose for Agent Sandbox
# docker-compose.agent-sandbox.yml
version: '3.8'

services:
  agent-sandbox:
    build:
      context: .
      dockerfile: Dockerfile.agent-sandbox
    volumes:
      - ./project:/home/agent/workspace:rw
      # DO NOT mount ~/.aws, ~/.kube, or other credential dirs
    networks:
      - sandbox-net
    mem_limit: 4g
    cpus: 2
    security_opt:
      - no-new-privileges:true
    read_only: true
    tmpfs:
      - /tmp:rw,size=500m
      - /home/agent/.cache:rw,size=200m

  # Local services for testing
  localstack:
    image: localstack/localstack:latest
    ports:
      - "4566:4566"
    environment:
      - SERVICES=s3,sqs,dynamodb,lambda,iam,ec2
    networks:
      - sandbox-net

  azurite:
    image: mcr.microsoft.com/azure-storage/azurite:latest
    ports:
      - "10000:10000"
      - "10001:10001"
      - "10002:10002"
    networks:
      - sandbox-net

networks:
  sandbox-net:
    driver: bridge
    internal: true  # No external internet access
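With the Compose file above, the sandbox can be started and its isolation spot-checked from inside the container (service and file names match the example):

Bash - Verifying Sandbox Isolation

```bash
# Start the stack and open a shell in the sandboxed service
docker compose -f docker-compose.agent-sandbox.yml run --rm agent-sandbox

# Inside the container: sandbox-net is internal, so there is no route to
# the internet -- this should time out rather than reach real AWS
curl --max-time 5 https://sts.amazonaws.com || echo "egress blocked, as intended"

# LocalStack is still reachable by service name on the internal network
curl -s http://localstack:4566/_localstack/health
```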

Dedicated Cloud Accounts for Agent Testing

For testing that requires real cloud APIs, create completely separate accounts/projects:

Cloud  | Isolation Strategy                                   | Billing Protection
AWS    | Separate AWS account in an Organization OU with SCPs | Budget alerts + hard limits via SCPs
Azure  | Separate subscription with spending cap              | Budget alerts + spending limits
GCP    | Separate project with billing budget                 | Budget alerts + billing cap
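On AWS, the billing guardrail can be wired up from the CLI with the Budgets API. A sketch (the account ID, budget amount, and notification email are placeholders):

Bash - Budget Alert for the Agent Account

```bash
# Attach a $50 monthly budget to the sandbox account, alerting at 80% spend
# (values below are illustrative -- substitute your own account and email)
aws budgets create-budget \
  --account-id 111111111111 \
  --budget '{"BudgetName":"agent-sandbox","BudgetLimit":{"Amount":"50","Unit":"USD"},"TimeUnit":"MONTHLY","BudgetType":"COST"}' \
  --notifications-with-subscribers '[{"Notification":{"NotificationType":"ACTUAL","ComparisonOperator":"GREATER_THAN","Threshold":80,"ThresholdType":"PERCENTAGE"},"Subscribers":[{"SubscriptionType":"EMAIL","Address":"ops@example.com"}]}]'
```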
JSON - AWS SCP: Restrict Agent Account
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyExpensiveServices",
      "Effect": "Deny",
      "Action": [
        "redshift:*",
        "sagemaker:CreateNotebookInstance"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyLargeInstances",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "StringNotEquals": {
          "ec2:InstanceType": ["t3.micro", "t3.small"]
        }
      }
    },
    {
      "Sid": "DenyRegionsOutsideUS",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["us-east-1", "us-west-2"]
        }
      }
    }
  ]
}
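Once written, the policy is created and attached with the Organizations CLI. A sketch (the JSON filename, policy ID, and OU ID are placeholders):

Bash - Attaching the SCP

```bash
# Create the policy from the JSON above (saved here as agent-scp.json),
# then attach it to the OU that contains the agent testing account
aws organizations create-policy \
  --name agent-sandbox-guardrails \
  --description "Guardrails for the AI agent testing account" \
  --type SERVICE_CONTROL_POLICY \
  --content file://agent-scp.json

aws organizations attach-policy \
  --policy-id p-examplepolicyid \
  --target-id ou-example-ouid
```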

LocalStack for AWS Testing

LocalStack emulates AWS services locally, letting AI agents interact with S3, DynamoDB, Lambda, EC2, and more without touching real AWS:

Bash - Using LocalStack with Agent
# Start LocalStack
docker run -d --name localstack -p 4566:4566 localstack/localstack

# Configure AWS CLI to point to LocalStack
export AWS_ENDPOINT_URL=http://localhost:4566
export AWS_ACCESS_KEY_ID=test
export AWS_SECRET_ACCESS_KEY=test
export AWS_DEFAULT_REGION=us-east-1

# Now the agent's AWS commands go to LocalStack, not real AWS
aws s3 mb s3://my-test-bucket
aws dynamodb create-table --table-name Users \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

# Even destructive commands are safe!
aws s3 rb s3://my-test-bucket --force  # Only affects LocalStack
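Terraform (installed in the sandbox image earlier) can target LocalStack as well, so agent-generated plans and applies hit the emulator instead of real AWS. A minimal provider sketch, assuming LocalStack's default port 4566 and showing only a few service endpoints:

HCL - Terraform Provider Pointed at LocalStack

```hcl
provider "aws" {
  region     = "us-east-1"
  access_key = "test" # dummy credentials; LocalStack does not validate them
  secret_key = "test"

  # Skip real-AWS checks that would fail against the emulator
  skip_credentials_validation = true
  skip_metadata_api_check     = true
  skip_requesting_account_id  = true

  endpoints {
    s3       = "http://localhost:4566"
    dynamodb = "http://localhost:4566"
    lambda   = "http://localhost:4566"
  }
}
```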

Azurite for Azure Local Testing

Azurite provides local emulation for Azure Blob Storage, Queue Storage, and Table Storage:

Bash - Azurite Setup
# Install and run Azurite
npm install -g azurite
azurite --location ./azurite-data --debug ./azurite-debug.log

# Or use Docker
docker run -d --name azurite -p 10000:10000 -p 10001:10001 -p 10002:10002 \
  mcr.microsoft.com/azure-storage/azurite

# Configure connection string for local Azurite
export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;"

# Agent can now use Azure Storage commands safely
az storage container create --name test-container
az storage blob upload --container-name test-container --file ./data.json --name data.json

GCP Emulators

Google Cloud provides official emulators for several services:

Bash - GCP Emulator Setup
# Pub/Sub emulator
gcloud beta emulators pubsub start --project=test-project

# Datastore emulator
gcloud beta emulators datastore start --project=test-project

# Bigtable emulator
gcloud beta emulators bigtable start

# Firestore emulator
gcloud beta emulators firestore start --project=test-project

# Set environment variables to point to emulators
$(gcloud beta emulators pubsub env-init)
$(gcloud beta emulators datastore env-init)

Feature Branches + Ephemeral Environments

Combine git feature branches with ephemeral cloud environments so agents work in isolated copies:

YAML - GitHub Actions: Ephemeral Environment per PR
name: Ephemeral Environment

on:
  pull_request:
    types: [opened, synchronize, closed]

jobs:
  deploy-ephemeral:
    if: github.event.action != 'closed'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Create ephemeral namespace
        run: |
          NAMESPACE="pr-${{ github.event.pull_request.number }}"
          kubectl create namespace $NAMESPACE --dry-run=client -o yaml | kubectl apply -f -
          helm upgrade --install app-$NAMESPACE ./chart \
            --namespace $NAMESPACE \
            --set image.tag=${{ github.sha }} \
            --set env=ephemeral

      - name: Run agent tests
        run: |
          NAMESPACE="pr-${{ github.event.pull_request.number }}"
          # Agent's changes are tested in isolated namespace
          kubectl -n $NAMESPACE run tests --image=test-runner --rm -i --restart=Never

  cleanup-ephemeral:
    runs-on: ubuntu-latest
    if: github.event.action == 'closed'
    steps:
      - name: Delete ephemeral namespace
        run: |
          NAMESPACE="pr-${{ github.event.pull_request.number }}"
          kubectl delete namespace $NAMESPACE

GitOps: Agents Propose, Humans Approve

The safest workflow for AI agents and infrastructure is GitOps: agents make changes in git, humans review and approve, and the CI/CD pipeline applies:

  1. Agent Creates a Branch

    The AI agent creates a feature branch and commits its infrastructure changes (Terraform files, Kubernetes manifests, Helm values).

  2. Agent Opens a Pull Request

    The agent opens a PR with a description of the changes, including the plan output.

  3. Automated Checks Run

    CI runs terraform plan, security scanning, policy checks, and cost estimation on the PR.

  4. Human Reviews and Approves

    A human reviews the plan output, the code changes, and the automated check results.

  5. Pipeline Applies on Merge

    Only after merge does the CI/CD pipeline run terraform apply. The agent never directly applies.
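The propose/approve/apply split above maps onto a simple pipeline: plan on pull requests, apply only on merge to the main branch. A sketch (job names are illustrative; backend configuration and credentials are omitted):

YAML - GitOps Plan/Apply Workflow

```yaml
name: Terraform GitOps

on:
  pull_request:        # agent-opened PRs: plan only, never apply
  push:
    branches: [main]   # reached only after human review and merge

jobs:
  plan:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: terraform init && terraform plan -no-color

  apply:
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    environment: production   # can be gated by required reviewers
    steps:
      - uses: actions/checkout@v4
      - run: terraform init && terraform apply -auto-approve
```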

GitOps is the gold standard: When AI agents interact with infrastructure through GitOps, they can never directly cause damage. The worst they can do is create a bad PR, which a human can simply reject. This is covered in more depth in Lesson 6: CI/CD Safety.

Key Takeaways

  • Sandboxing provides structural safety that doesn't depend on the agent's behavior
  • Run agents in Docker with --network none (or an internal-only bridge network) so they cannot reach real cloud endpoints
  • LocalStack, Azurite, and GCP emulators let agents test cloud operations locally
  • Create dedicated cloud accounts with SCPs/budgets for agent testing that needs real APIs
  • Ephemeral environments per PR give agents isolated playgrounds
  • GitOps is the ultimate safety pattern: agents propose, humans approve, pipelines apply