Interface DataflowPipelineWorkerPoolOptions

All Superinterfaces:
FileStagingOptions, GcpOptions, GoogleApiDebugOptions, HasDisplayData, PipelineOptions
All Known Subinterfaces:
DataflowPipelineOptions, DataflowWorkerHarnessOptions, TestDataflowPipelineOptions

public interface DataflowPipelineWorkerPoolOptions extends GcpOptions, FileStagingOptions
Options that are used to configure the Dataflow pipeline worker pool.
  • Method Details

    • getNumWorkers

      int getNumWorkers()
      Number of workers to use when executing the Dataflow job. Note that selecting an autoscaling algorithm other than NONE will affect the size of the worker pool. If left unspecified, the Dataflow service will determine the number of workers.
    • setNumWorkers

      void setNumWorkers(int value)
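A minimal sketch of requesting a fixed-size pool. Any autoscaling algorithm other than NONE affects the pool size, so pinning the size pairs --numWorkers with --autoscalingAlgorithm=NONE. The helper class FixedPoolArgs and the worker count are hypothetical; only the flag names correspond to this interface's properties.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Beam): builds command-line flags that
// request a fixed-size Dataflow worker pool.
final class FixedPoolArgs {
  static List<String> fixedPool(int numWorkers) {
    if (numWorkers <= 0) {
      throw new IllegalArgumentException("numWorkers must be positive");
    }
    List<String> args = new ArrayList<>();
    args.add("--runner=DataflowRunner");
    args.add("--numWorkers=" + numWorkers);
    // NONE keeps the pool at numWorkers; any other algorithm may resize it.
    args.add("--autoscalingAlgorithm=NONE");
    return args;
  }

  public static void main(String[] argv) {
    // Prints: --runner=DataflowRunner --numWorkers=10 --autoscalingAlgorithm=NONE
    System.out.println(String.join(" ", fixedPool(10)));
  }
}
```

Assuming Beam is on the classpath, the resulting list could be handed to PipelineOptionsFactory.fromArgs(args.toArray(new String[0])).withValidation().as(DataflowPipelineOptions.class).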
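    • getAutoscalingAlgorithm

      DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType getAutoscalingAlgorithm()
      The autoscaling algorithm to use for the worker pool.
      • NONE: does not change the size of the worker pool.
      • BASIC: autoscales the worker pool size up to maxNumWorkers until the job completes.
      • THROUGHPUT_BASED: autoscales the worker pool based on throughput (up to maxNumWorkers).
    • setAutoscalingAlgorithm

      void setAutoscalingAlgorithm(DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType value)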
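The three algorithms can be modeled as a small enum to make the distinction between a fixed pool (NONE) and an autoscaled pool explicit. Autoscaling below is a local stand-in written for this sketch, not Beam's own AutoscalingAlgorithmType, and its case-insensitive parse method is a convenience of the sketch rather than documented Beam flag-parsing behavior.

```java
import java.util.Locale;

// Local stand-in for the autoscaling algorithm type, declared here so the
// sketch compiles without Beam on the classpath.
enum Autoscaling {
  NONE, BASIC, THROUGHPUT_BASED;

  /** True if the algorithm may resize the pool (up to maxNumWorkers) while the job runs. */
  boolean resizesPool() {
    return this != NONE;
  }

  /** Parses a value such as "throughput_based"; matching is case-insensitive in this sketch. */
  static Autoscaling parse(String value) {
    return valueOf(value.trim().toUpperCase(Locale.ROOT));
  }
}

final class AutoscalingDemo {
  public static void main(String[] args) {
    Autoscaling a = Autoscaling.parse("throughput_based");
    // Prints: THROUGHPUT_BASED resizes pool: true
    System.out.println(a + " resizes pool: " + a.resizesPool());
  }
}
```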

    • getMaxNumWorkers

      int getMaxNumWorkers()
      The maximum number of workers to use for the worker pool. This option limits the size of the worker pool for the lifetime of the job, including pipeline updates. If left unspecified, the Dataflow service will compute a ceiling.
    • setMaxNumWorkers

      void setMaxNumWorkers(int value)
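Because maxNumWorkers caps the pool for the job's lifetime, a launcher might want to catch an initial size that already exceeds the cap before submitting. WorkerBounds is a hypothetical client-side check, not a Beam or Dataflow API; it assumes that a value of 0 means the option was left unspecified and leaves those cases to the service.

```java
// Hypothetical client-side sanity check (not part of Beam or Dataflow).
// Assumption: 0 means "left unspecified" for either option.
final class WorkerBounds {
  static void check(int numWorkers, int maxNumWorkers) {
    if (numWorkers < 0 || maxNumWorkers < 0) {
      throw new IllegalArgumentException("worker counts must be non-negative");
    }
    boolean bothSet = numWorkers > 0 && maxNumWorkers > 0;
    if (bothSet && numWorkers > maxNumWorkers) {
      throw new IllegalArgumentException(
          "numWorkers (" + numWorkers + ") exceeds maxNumWorkers (" + maxNumWorkers + ")");
    }
  }

  public static void main(String[] args) {
    check(5, 10); // passes: initial size is within the ceiling
    check(5, 0);  // passes: ceiling left to the service
    try {
      check(20, 10);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```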
    • getDiskSizeGb

      int getDiskSizeGb()
      Remote worker disk size, in gigabytes, or 0 to use the default size.
    • setDiskSizeGb

      void setDiskSizeGb(int value)
    • getWorkerHarnessContainerImage

      @Deprecated @Hidden String getWorkerHarnessContainerImage()
      Deprecated.
    • setWorkerHarnessContainerImage

      @Deprecated @Hidden void setWorkerHarnessContainerImage(String value)
      Deprecated.
    • getSdkContainerImage

      String getSdkContainerImage()
      Container image used to configure the SDK execution environment on the worker. Used for custom containers on portable pipelines only.
    • setSdkContainerImage

      void setSdkContainerImage(String value)
    • getNetwork

      String getNetwork()
      GCE network for launching workers.

      If unset, the Dataflow service chooses the default.

    • setNetwork

      void setNetwork(String value)
    • getSubnetwork

      String getSubnetwork()
      GCE subnetwork for launching workers.

      If unset, the Dataflow service chooses the default. The expected format is either regions/REGION/subnetworks/SUBNETWORK or the fully qualified subnetwork name beginning with https://..., for example https://www.googleapis.com/compute/alpha/projects/PROJECT/regions/REGION/subnetworks/SUBNETWORK

    • setSubnetwork

      void setSubnetwork(String value)
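A rough client-side shape check for the two accepted subnetwork formats. The regular expressions are assumptions derived from the format description, and the Dataflow service remains the authority on what it accepts; SubnetworkFormat and the sample region and subnetwork names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: checks only the shape of a subnetwork value.
final class SubnetworkFormat {
  // Short form: regions/REGION/subnetworks/SUBNETWORK
  private static final Pattern SHORT =
      Pattern.compile("regions/[^/]+/subnetworks/[^/]+");
  // Fully qualified form: https://.../projects/PROJECT/regions/REGION/subnetworks/SUBNETWORK
  private static final Pattern FULL =
      Pattern.compile("https://\\S+/projects/[^/]+/regions/[^/]+/subnetworks/[^/]+");

  static boolean isWellFormed(String value) {
    return SHORT.matcher(value).matches() || FULL.matcher(value).matches();
  }

  public static void main(String[] args) {
    System.out.println(isWellFormed("regions/us-central1/subnetworks/my-subnet")); // true
    System.out.println(isWellFormed("my-subnet")); // false
  }
}
```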
    • getWorkerMachineType

      String getWorkerMachineType()
      Machine type to use when creating Dataflow worker VMs.

      See GCE machine types for a list of valid options.

      If unset, the Dataflow service will choose a reasonable default.

    • setWorkerMachineType

      void setWorkerMachineType(String value)
    • getWorkerDiskType

      String getWorkerDiskType()
      Specifies what type of persistent disk is used. The value is a full disk type resource, e.g., compute.googleapis.com/projects//zones//diskTypes/pd-ssd. For more information, see the API reference documentation for DiskTypes.
    • setWorkerDiskType

      void setWorkerDiskType(String value)
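The disk type value is a resource path, so it can be assembled from its parts. DiskTypes.diskTypeResource and the sample project and zone names are hypothetical; the path layout follows the compute.googleapis.com/projects/.../zones/.../diskTypes/... example in the description.

```java
// Hypothetical helper: assembles a full disk type resource path.
final class DiskTypes {
  static String diskTypeResource(String project, String zone, String diskType) {
    // Reject empty segments or embedded slashes, which would corrupt the path.
    for (String part : new String[] {project, zone, diskType}) {
      if (part == null || part.isEmpty() || part.contains("/")) {
        throw new IllegalArgumentException("invalid path segment: " + part);
      }
    }
    return "compute.googleapis.com/projects/" + project
        + "/zones/" + zone + "/diskTypes/" + diskType;
  }

  public static void main(String[] args) {
    // Prints: compute.googleapis.com/projects/my-project/zones/us-central1-a/diskTypes/pd-ssd
    System.out.println(diskTypeResource("my-project", "us-central1-a", "pd-ssd"));
  }
}
```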
    • getUsePublicIps

      @Nullable Boolean getUsePublicIps()
      Specifies whether worker pools should be started with public IP addresses.

      WARNING: This feature is available only through an allowlist.

    • setUsePublicIps

      void setUsePublicIps(@Nullable Boolean value)
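Because getUsePublicIps returns a nullable Boolean, the option is effectively three-valued. The hypothetical helper below emits an explicit --usePublicIps flag only for TRUE or FALSE and emits nothing for null, on the assumption that an unset value defers to the service's default.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: maps the nullable option to command-line flags.
final class PublicIpFlag {
  static List<String> flags(Boolean usePublicIps) {
    List<String> out = new ArrayList<>();
    // null: emit nothing and let the service decide (assumption).
    if (usePublicIps != null) {
      out.add("--usePublicIps=" + usePublicIps);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(flags(null));          // []
    System.out.println(flags(Boolean.FALSE)); // [--usePublicIps=false]
  }
}
```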
    • getMinCpuPlatform

      @Nullable String getMinCpuPlatform()
      Specifies a minimum CPU platform for VM instances.

      For more details, see Specifying Pipeline Execution Parameters.

    • setMinCpuPlatform

      void setMinCpuPlatform(String minCpuPlatform)