Interface DataflowPipelineWorkerPoolOptions
- All Superinterfaces:
FileStagingOptions, GcpOptions, GoogleApiDebugOptions, HasDisplayData, PipelineOptions
- All Known Subinterfaces:
DataflowPipelineOptions, DataflowWorkerHarnessOptions, TestDataflowPipelineOptions
Options that are used to configure the Dataflow pipeline worker pool.
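A minimal usage sketch (assuming the Dataflow runner dependency is on the classpath; the class name and printed fields are illustrative only):

import org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class WorkerPoolOptionsExample {
  public static void main(String[] args) {
    // Make the worker pool flags (--numWorkers, --maxNumWorkers, ...) known to the parser.
    PipelineOptionsFactory.register(DataflowPipelineWorkerPoolOptions.class);
    DataflowPipelineWorkerPoolOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineWorkerPoolOptions.class);

    // Options left unset fall back to the Dataflow service defaults described below.
    System.out.println("numWorkers = " + options.getNumWorkers());
    System.out.println("autoscalingAlgorithm = " + options.getAutoscalingAlgorithm());
  }
}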
-
Nested Class Summary
Nested Classes
static enum DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType - Type of autoscaling algorithm to use.
Nested classes/interfaces inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
GcpOptions.DefaultProjectFactory, GcpOptions.EnableStreamingEngineFactory, GcpOptions.GcpOAuthScopesFactory, GcpOptions.GcpTempLocationFactory, GcpOptions.GcpUserCredentialsFactory
Nested classes/interfaces inherited from interface org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions
GoogleApiDebugOptions.GoogleApiTracer
Nested classes/interfaces inherited from interface org.apache.beam.sdk.options.PipelineOptions
PipelineOptions.AtomicLongFactory, PipelineOptions.CheckEnabled, PipelineOptions.DirectRunner, PipelineOptions.JobNameFactory, PipelineOptions.UserAgentFactory
-
Field Summary
Fields inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
STREAMING_ENGINE_EXPERIMENT, WINDMILL_SERVICE_EXPERIMENT
-
Method Summary
Modifier and Type / Method / Description
DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType getAutoscalingAlgorithm() - The autoscaling algorithm to use for the worker pool.
int getDiskSizeGb() - Remote worker disk size, in gigabytes, or 0 to use the default size.
int getMaxNumWorkers() - The maximum number of workers to use for the worker pool.
String getMinCpuPlatform() - Specifies a minimum CPU platform for VM instances.
String getNetwork() - GCE network for launching workers.
int getNumWorkers() - Number of workers to use when executing the Dataflow job.
String getSdkContainerImage() - Container image used to configure the SDK execution environment on workers.
String getSubnetwork() - GCE subnetwork for launching workers.
@Nullable Boolean getUsePublicIps() - Specifies whether worker pools should be started with public IP addresses.
String getWorkerDiskType() - Specifies what type of persistent disk is used.
String getWorkerHarnessContainerImage() - Deprecated. Use getSdkContainerImage() instead.
String getWorkerMachineType() - Machine type to create Dataflow worker VMs as.
void setAutoscalingAlgorithm(DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType value)
void setDiskSizeGb(int value)
void setMaxNumWorkers(int value)
void setMinCpuPlatform(String minCpuPlatform)
void setNetwork(String value)
void setNumWorkers(int value)
void setSdkContainerImage(String value)
void setSubnetwork(String value)
void setUsePublicIps(@Nullable Boolean value)
void setWorkerDiskType(String value)
void setWorkerHarnessContainerImage(String value) - Deprecated. Use setSdkContainerImage(java.lang.String) instead.
void setWorkerMachineType(String value)
Methods inherited from interface org.apache.beam.sdk.options.FileStagingOptions
getFilesToStage, setFilesToStage
Methods inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
getCredentialFactoryClass, getDataflowKmsKey, getGcpCredential, getGcpOauthScopes, getGcpTempLocation, getImpersonateServiceAccount, getProject, getWorkerRegion, getWorkerZone, getZone, isEnableStreamingEngine, setCredentialFactoryClass, setDataflowKmsKey, setEnableStreamingEngine, setGcpCredential, setGcpOauthScopes, setGcpTempLocation, setImpersonateServiceAccount, setProject, setWorkerRegion, setWorkerZone, setZone
Methods inherited from interface org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions
getGoogleApiTrace, setGoogleApiTrace
Methods inherited from interface org.apache.beam.sdk.transforms.display.HasDisplayData
populateDisplayData
Methods inherited from interface org.apache.beam.sdk.options.PipelineOptions
as, getJobName, getOptionsId, getRunner, getStableUniqueNames, getTempLocation, getUserAgent, outputRuntimeOptions, revision, setJobName, setOptionsId, setRunner, setStableUniqueNames, setTempLocation, setUserAgent
-
Method Details
-
getNumWorkers
int getNumWorkers()
Number of workers to use when executing the Dataflow job. Note that selection of an autoscaling algorithm other than NONE will affect the size of the worker pool. If left unspecified, the Dataflow service will determine the number of workers.
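For example, a fixed-size pool could be configured roughly as follows (a sketch assuming an options instance obtained as in the example near the top of this page):

options.setNumWorkers(5);  // request 5 workers at startup
options.setAutoscalingAlgorithm(
    DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType.NONE);  // keep the pool at that size
-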
setNumWorkers
void setNumWorkers(int value) -
getAutoscalingAlgorithm
DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType getAutoscalingAlgorithm()
The autoscaling algorithm to use for the worker pool:
- NONE: does not change the size of the worker pool.
- BASIC: autoscale the worker pool size up to maxNumWorkers until the job completes.
- THROUGHPUT_BASED: autoscale the worker pool based on throughput (up to maxNumWorkers).
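A short sketch of enabling throughput-based autoscaling with an upper bound (again assuming an options instance as above):

options.setAutoscalingAlgorithm(
    DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType.THROUGHPUT_BASED);
options.setMaxNumWorkers(20);  // the service may grow the pool, but never beyond 20 workers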
-
setAutoscalingAlgorithm
void setAutoscalingAlgorithm(DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType value)
-
getMaxNumWorkers
int getMaxNumWorkers()
The maximum number of workers to use for the worker pool. This option limits the size of the worker pool for the lifetime of the job, including pipeline updates. If left unspecified, the Dataflow service will compute a ceiling. -
setMaxNumWorkers
void setMaxNumWorkers(int value) -
getDiskSizeGb
int getDiskSizeGb()
Remote worker disk size, in gigabytes, or 0 to use the default size. -
setDiskSizeGb
void setDiskSizeGb(int value) -
getWorkerHarnessContainerImage
String getWorkerHarnessContainerImage()
Deprecated. Use getSdkContainerImage() instead. -
setWorkerHarnessContainerImage
Deprecated. Use setSdkContainerImage(java.lang.String) instead. -
getSdkContainerImage
String getSdkContainerImage()
Container image used to configure the SDK execution environment on workers. Used for custom containers on portable pipelines only.
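A sketch of pointing a portable pipeline at a custom SDK container image (the image path is a hypothetical placeholder):

options.setSdkContainerImage("gcr.io/my-project/my-custom-beam-sdk:latest");  // placeholder image
-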
setSdkContainerImage
void setSdkContainerImage(String value)
-
getNetwork
String getNetwork()
GCE network for launching workers. Default is up to the Dataflow service.
-
setNetwork
void setNetwork(String value)
-
getSubnetwork
String getSubnetwork()
GCE subnetwork for launching workers. Default is up to the Dataflow service. Expected format is regions/REGION/subnetworks/SUBNETWORK or the fully qualified subnetwork name, beginning with https://..., e.g. https://www.googleapis.com/compute/alpha/projects/PROJECT/regions/REGION/subnetworks/SUBNETWORK
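A sketch of launching workers on a specific VPC network and subnetwork, using the regions/REGION/subnetworks/SUBNETWORK form described above (network, region, and subnetwork names are placeholders):

options.setNetwork("my-vpc-network");                                // placeholder network name
options.setSubnetwork("regions/us-central1/subnetworks/my-subnet");  // placeholder subnetwork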
-
setSubnetwork
void setSubnetwork(String value)
-
getWorkerMachineType
String getWorkerMachineType()
Machine type to create Dataflow worker VMs as. See GCE machine types for a list of valid options.
If unset, the Dataflow service will choose a reasonable default.
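A sketch of requesting a specific machine type for worker VMs (n1-standard-4 is one example of a valid GCE machine type name):

options.setWorkerMachineType("n1-standard-4");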
-
setWorkerMachineType
void setWorkerMachineType(String value)
-
getWorkerDiskType
String getWorkerDiskType()
Specifies what type of persistent disk is used. The value is a full disk type resource, e.g., compute.googleapis.com/projects//zones//diskTypes/pd-ssd. For more information, see the API reference documentation for DiskTypes.
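A sketch of selecting SSD persistent disks with a custom size (PROJECT and ZONE are placeholders for a real project id and zone):

options.setWorkerDiskType(
    "compute.googleapis.com/projects/PROJECT/zones/ZONE/diskTypes/pd-ssd");  // placeholder project/zone
options.setDiskSizeGb(100);  // 100 GB per worker; 0 would fall back to the default size
-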
setWorkerDiskType
void setWorkerDiskType(String value)
-
getUsePublicIps
@Nullable Boolean getUsePublicIps()
Specifies whether worker pools should be started with public IP addresses. WARNING: This feature is available only through allowlist.
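A sketch of starting workers without public IP addresses, subject to the allowlist restriction noted above (workers then need private connectivity, such as Private Google Access on the subnetwork, to reach Google APIs):

options.setUsePublicIps(false);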
-
setUsePublicIps
void setUsePublicIps(@Nullable Boolean value)
-
getMinCpuPlatform
String getMinCpuPlatform()
Specifies a minimum CPU platform for VM instances. For more details, see Specifying Pipeline Execution Parameters.
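A sketch of requesting a minimum CPU platform (Intel Skylake is one example of a platform name accepted by GCE):

options.setMinCpuPlatform("Intel Skylake");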
-
setMinCpuPlatform
void setMinCpuPlatform(String minCpuPlatform)
-