Batch

Introduction

  • Run batch jobs as Docker images

  • Dynamic provisioning of instances (EC2, Spot Fleet, ECS) in your VPC

  • Optimal quantity and type based on volume and requirements

  • No need to manage clusters; fully serverless, and you pay only for the underlying EC2 instances

  • Use case:

    • batch processing of images

    • running thousands of concurrent jobs

  • Schedule Batch jobs using CloudWatch Events (now Amazon EventBridge)

  • Orchestrate Batch jobs using AWS Step Functions
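The scheduling and orchestration bullets above can be sketched as plain request payloads. This is a minimal sketch, not a definitive setup: the ARNs, the job name `resize-images`, the target ID, and the state name `ResizeImages` are hypothetical placeholders, and the dictionaries only mirror the shapes expected by EventBridge `put_targets` and by a Step Functions Task state that uses the `batch:submitJob.sync` integration.

```python
import json

# Hypothetical placeholders; replace with real ARNs from your account.
JOB_QUEUE_ARN = "arn:aws:batch:us-east-1:123456789012:job-queue/example-queue"
JOB_DEFINITION = "example-job-definition"
EVENTS_ROLE_ARN = "arn:aws:iam::123456789012:role/example-events-role"

# Schedule expression for the EventBridge rule (passed as ScheduleExpression).
SCHEDULE = "rate(1 day)"


def eventbridge_batch_target(job_queue_arn, job_definition, job_name, role_arn):
    """Build one entry of the Targets list for an EventBridge put_targets call."""
    return {
        "Id": "scheduled-batch-job",   # hypothetical target ID
        "Arn": job_queue_arn,          # the target is the Batch job queue
        "RoleArn": role_arn,           # role EventBridge assumes to submit the job
        "BatchParameters": {
            "JobDefinition": job_definition,
            "JobName": job_name,
        },
    }


def step_functions_batch_state(job_queue_arn, job_definition, job_name):
    """Build a Task state that submits a Batch job and waits for it to finish."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::batch:submitJob.sync",
        "Parameters": {
            "JobName": job_name,
            "JobDefinition": job_definition,
            "JobQueue": job_queue_arn,
        },
        "End": True,
    }


target = eventbridge_batch_target(
    JOB_QUEUE_ARN, JOB_DEFINITION, "resize-images", EVENTS_ROLE_ARN
)

# A one-state state machine definition in Amazon States Language.
definition = {
    "StartAt": "ResizeImages",
    "States": {
        "ResizeImages": step_functions_batch_state(
            JOB_QUEUE_ARN, JOB_DEFINITION, "resize-images"
        )
    },
}

print(json.dumps(definition, indent=2))
```

With boto3 installed and credentials configured, `target` would go into the `Targets` list of the `events` client's `put_targets` call (after a `put_rule` with `ScheduleExpression=SCHEDULE`), and `json.dumps(definition)` would be the `definition` argument when creating the state machine.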

Features

  • Compute Environment

    • Managed compute environment

      • AWS Batch manages the capacity and instance types within the environment. (You don't need to configure an Auto Scaling group.)

      • Can choose On-demand / Spot Instances / Spot Fleet

      • Can set a maximum price for Spot Instances

      • Launched within your own VPC

        • If you launch in a private subnet, make sure it has access to the ECS service.

        • Use either a NAT Gateway or VPC Endpoints for ECS.

    • Unmanaged compute environment

      • You control and manage instance configuration, provisioning and scaling.

  • Multi Node Mode (for large scale, high performance computing)

    • 1 main node, many child nodes.

    • Leverage multiple EC2 / ECS instances at the same time

    • Doesn't work with Spot Instances

    • Good for tightly coupled workloads

    • Represents a single job, and specifies how many nodes to create for the job

    • Works better if the EC2 instances launch in a "cluster" placement group

  • Lambda vs Batch

    • Lambda

      • Time limit (15 minutes per invocation)

      • limited runtimes

      • limited temporary disk space

      • Serverless

    • Batch

      • No time limit

      • Any runtime as long as it's packaged as a Docker Image

      • Rely on EBS / instance store for disk space

      • Relies on EC2 (can be managed by AWS)
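To make the Compute Environment and Multi Node Mode items above concrete, here is a minimal sketch of the request payloads involved. The helper names, resource sizes, and IDs are hypothetical; the dictionary keys follow the shapes of the Batch `create_compute_environment` and `register_job_definition` APIs as I understand them, so check them against the current API reference before use.

```python
def managed_compute_environment(name, subnets, security_groups, instance_role,
                                spot=True, max_vcpus=64, bid_percentage=60):
    """Build a request for a MANAGED compute environment (hypothetical helper)."""
    resources = {
        "type": "SPOT" if spot else "EC2",
        "minvCpus": 0,                  # scale to zero when the queue is empty
        "maxvCpus": max_vcpus,
        "instanceTypes": ["optimal"],   # let Batch choose instance types
        "subnets": subnets,             # subnets in your own VPC
        "securityGroupIds": security_groups,
        "instanceRole": instance_role,
    }
    if spot:
        # Maximum Spot price, as a percentage of the On-Demand price.
        resources["bidPercentage"] = bid_percentage
        resources["allocationStrategy"] = "SPOT_CAPACITY_OPTIMIZED"
    else:
        resources["allocationStrategy"] = "BEST_FIT_PROGRESSIVE"
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": resources,
    }


def multinode_job_definition(name, image, num_nodes):
    """Build a multi-node parallel job definition: one main node plus children."""
    if num_nodes < 2:
        raise ValueError("a multi-node job needs a main node and at least one child")
    return {
        "jobDefinitionName": name,
        "type": "multinode",
        "nodeProperties": {
            "numNodes": num_nodes,
            "mainNode": 0,              # index of the main node
            "nodeRangeProperties": [
                {
                    # Apply the same container settings to every node.
                    "targetNodes": f"0:{num_nodes - 1}",
                    "container": {
                        "image": image,
                        "resourceRequirements": [
                            {"type": "VCPU", "value": "4"},
                            {"type": "MEMORY", "value": "8192"},
                        ],
                    },
                }
            ],
        },
    }


# Hypothetical IDs and names, for illustration only.
spot_env = managed_compute_environment(
    "example-spot-env", ["subnet-aaaa1111"], ["sg-bbbb2222"], "ecsInstanceRole"
)
on_demand_env = managed_compute_environment(
    "example-on-demand-env", ["subnet-aaaa1111"], ["sg-bbbb2222"],
    "ecsInstanceRole", spot=False,
)
job_def = multinode_job_definition("example-mnp-job", "example/image:latest", 4)
```

Since the notes say Multi Node Mode does not work with Spot Instances, the multi-node job would target a queue backed by `on_demand_env` rather than `spot_env`. With boto3 and credentials available, each dictionary could be passed as keyword arguments, for example `boto3.client("batch").create_compute_environment(**spot_env)`.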

Scenario

  • Batch architecture example
