Intermediate
Pipeline Components
Create reusable pipeline components, leverage pre-built Google Cloud components, and package custom components as container images for production use.
Lightweight Python Components
The simplest components are Python functions with the @dsl.component decorator:
from kfp import dsl
from kfp.dsl import Dataset, Output

@dsl.component(base_image="python:3.11", packages_to_install=["requests"])
def fetch_data(url: str, output: Output[Dataset]):
    # Imports live inside the function because the body runs
    # in its own container at pipeline execution time.
    import requests

    response = requests.get(url)
    response.raise_for_status()
    with open(output.path, "w") as f:
        f.write(response.text)
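Conceptually, @dsl.component wraps a plain function with metadata about the environment it should run in, without changing how the function is called. A toy sketch of that idea (this is an illustrative stand-in, not the real KFP implementation):

```python
import functools

def component(base_image="python:3.11", packages_to_install=None):
    """Toy stand-in for @dsl.component: records the execution
    environment on the function without altering its behavior."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        wrapper.base_image = base_image
        wrapper.packages_to_install = packages_to_install or []
        return wrapper
    return decorator

@component(base_image="python:3.11", packages_to_install=["requests"])
def fetch_data(url: str) -> str:
    # Stand-in body; the real component would fetch and persist data.
    return f"fetched {url}"

print(fetch_data.base_image)   # python:3.11
print(fetch_data("https://example.com"))
```

The real decorator goes further, compiling the function into a component specification, but the principle is the same: the function stays ordinary Python while the decorator carries the runtime configuration.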
Container Components
For complex components with custom dependencies, build a dedicated Docker image:
from kfp import dsl

@dsl.container_component
def gpu_training():
    return dsl.ContainerSpec(
        image="my-registry/gpu-trainer:v1.0",
        command=["python", "train.py"],
        args=["--epochs", "100", "--batch-size", "32"],
    )
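A ContainerSpec is essentially an image plus a command line: the command and args concatenate into the process launched inside the container. A minimal sketch of that mapping (the dataclass and helper below are hypothetical, not part of KFP):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ContainerSpec:
    # Mirrors the three fields used in the example above.
    image: str
    command: List[str] = field(default_factory=list)
    args: List[str] = field(default_factory=list)

def to_docker_invocation(spec: ContainerSpec) -> List[str]:
    # Hypothetical helper: shows how command and args combine
    # into the container's entrypoint invocation.
    return ["docker", "run", spec.image, *spec.command, *spec.args]

spec = ContainerSpec(
    image="my-registry/gpu-trainer:v1.0",
    command=["python", "train.py"],
    args=["--epochs", "100", "--batch-size", "32"],
)
print(" ".join(to_docker_invocation(spec)))
```

Because everything the component needs is baked into the image, the pipeline only has to know the image reference and the command line, which is what makes container components a good fit for heavy custom dependencies such as GPU libraries.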
Pre-Built Components
Google Cloud provides pre-built components for common ML tasks:
from kfp import dsl
from google_cloud_pipeline_components.v1.dataset import TabularDatasetCreateOp
from google_cloud_pipeline_components.v1.automl.training_job import (
    AutoMLTabularTrainingJobRunOp,
)

@dsl.pipeline(name="vertex-pipeline")
def vertex_training():
    dataset = TabularDatasetCreateOp(
        display_name="my-dataset",
        bq_source="bq://project.dataset.table",
    )
    training = AutoMLTabularTrainingJobRunOp(
        display_name="my-model",
        dataset=dataset.outputs["dataset"],
    )
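Passing dataset.outputs["dataset"] into the training op is what creates the dependency edge: a task that consumes another task's output implicitly runs after it. A toy sketch of how output references imply execution order (hypothetical classes, not the KFP internals):

```python
class OutputRef:
    """A handle to one named output of a task."""
    def __init__(self, task, key):
        self.task, self.key = task, key

class Task:
    def __init__(self, name, inputs=None):
        self.name = name
        self.inputs = inputs or {}  # values may be OutputRefs
        # Every task exposes named output handles.
        self.outputs = {k: OutputRef(self, k) for k in ("dataset", "model")}

def upstream(task):
    # A task depends on every task whose output it consumes.
    return [v.task for v in task.inputs.values() if isinstance(v, OutputRef)]

dataset = Task("create-dataset")
training = Task("train-model", inputs={"dataset": dataset.outputs["dataset"]})
print([t.name for t in upstream(training)])   # ['create-dataset']
```

This is why no explicit "run after" call is needed in the pipeline above: the DAG is recovered entirely from which outputs feed which inputs.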
Component Best Practices
- Single responsibility: Each component should do one thing well (load data, train, evaluate, deploy).
- Type annotations: Always use typed inputs and outputs for validation and artifact tracking.
- Versioned images: Pin container image versions to ensure reproducibility across runs.
- Small base images: Use minimal base images to reduce component startup time and storage costs.
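The type-annotation guideline above can be checked mechanically before a pipeline is ever compiled. A small sketch (a hypothetical lint helper, assuming components are plain Python functions) that flags untyped parameters:

```python
import inspect

def untyped_params(fn):
    """Return the names of parameters missing type annotations."""
    sig = inspect.signature(fn)
    return [name for name, p in sig.parameters.items()
            if p.annotation is inspect.Parameter.empty]

# Illustrative component signatures, not real components.
def good(url: str, epochs: int) -> str: ...
def bad(url, epochs: int): ...

print(untyped_params(good))  # []
print(untyped_params(bad))   # ['url']
```

Running a check like this in CI catches missing annotations early, before the SDK rejects the component or, worse, silently loses artifact lineage.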
- Sharing components: Package reusable components as Python packages and publish them to your organization's package registry. This enables consistent ML operations across teams and projects.