DocsConcepts and ExpressionsTask RunnersTask Runner TypesGoogle Batch Task Runner

Google Batch Task Runner

Available on: >= 0.16.0

Run tasks as containers on Google Cloud VMs.

This runner will deploy the container for the task to a specified Google Cloud Batch VM.

To launch the task on Google Cloud Batch, there are three main concepts you need to be aware of:

Machine Type — a mandatory property indicating the compute machine type to which the task will be deployed. If no reservation is specified, a new compute instance will be created for each batch which can add around a minute of latency.
Reservation — an optional property allowing you to set up a reservation for a given virtual machine and avoid the time needed to create a new compute instance for each task.
Network Interfaces — an optional property; if not specified, it will use the default interface if not specified otherwise.

In order to support inputFiles, namespaceFiles, and outputFiles, the Google Batch task runner will do the following:

mount a volume from a GCS bucket
upload input files to the bucket before launching the container
download output files from the bucket after the container has finished running.

Since we don't know the working directory of the container in advance, you need to explicitly define the working directory and output directory when using the GCP Batch runner, e.g. use python {{workingDir}}/main.py rather than python main.py.

yaml

id: gcp_batch_runner
namespace: company.team

tasks:
  - id: scrape_environment_info
    type: io.kestra.plugin.scripts.python.Commands
    containerImage: ghcr.io/kestra-io/pydata:latest
    taskRunner:
      type: io.kestra.plugin.gcp.runner.Batch
      projectId: "{{ secrets.projectId }}"
      region: europe-west9
    commands:
      - python {{ workingDir }}/main.py
    namespaceFiles:
      enabled: true
    outputFiles:
      - "environment_info.json"
    inputFiles:
      main.py: |
        import platform
        import socket
        import sys
        import json
        from kestra import Kestra

        print("Hello from GCP Batch and kestra!")

        def print_environment_info():
            print(f"Host's network name: {platform.node()}")
            print(f"Python version: {platform.python_version()}")
            print(f"Platform information (instance type): {platform.platform()}")
            print(f"OS/Arch: {sys.platform}/{platform.machine()}")

            env_info = {
                "host": platform.node(),
                "platform": platform.platform(),
                "OS": sys.platform,
                "python_version": platform.python_version(),
            }
            Kestra.outputs(env_info)

            filename = '{{ workingDir }}/environment_info.json'
            with open(filename, 'w') as json_file:
                json.dump(env_info, json_file, indent=4)

        if __name__ == '__main__':
          print_environment_info()

For a full list of properties available in the Google Batch task runner, check the GCP plugin documentation or explore the same in the built-in Code Editor in the Kestra UI.

Was this page helpful?

TypesAzure Batch Task Runner

TypesGoogle Cloud Run Task Runner

​Google ​Batch ​Task ​Runner

Google Batch Task Runner