Interface DataflowPipelineWorkerPoolOptions
- All Superinterfaces:
FileStagingOptions,GcpOptions,GoogleApiDebugOptions,HasDisplayData,PipelineOptions
- All Known Subinterfaces:
DataflowPipelineOptions,DataflowWorkerHarnessOptions,TestDataflowPipelineOptions
Options that are used to configure the Dataflow pipeline worker pool.
Nested Class Summary
- static enum DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType: Type of autoscaling algorithm to use.
Nested classes/interfaces inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
GcpOptions.DefaultProjectFactory, GcpOptions.EnableStreamingEngineFactory, GcpOptions.GcpOAuthScopesFactory, GcpOptions.GcpTempLocationFactory, GcpOptions.GcpUserCredentialsFactory
Nested classes/interfaces inherited from interface org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions
GoogleApiDebugOptions.GoogleApiTracer
Nested classes/interfaces inherited from interface org.apache.beam.sdk.options.PipelineOptions
PipelineOptions.AtomicLongFactory, PipelineOptions.CheckEnabled, PipelineOptions.DirectRunner, PipelineOptions.JobNameFactory, PipelineOptions.UserAgentFactory
Field Summary
Fields inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
STREAMING_ENGINE_EXPERIMENT, WINDMILL_SERVICE_EXPERIMENT
Method Summary
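- DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType getAutoscalingAlgorithm(): The autoscaling algorithm to use for the worker pool.
- int getDiskSizeGb(): Remote worker disk size, in gigabytes, or 0 to use the default size.
- int getMaxNumWorkers(): The maximum number of workers to use for the worker pool.
- String getMinCpuPlatform(): Specifies a minimum CPU platform for VM instances.
- String getNetwork(): GCE network for launching workers.
- int getNumWorkers(): Number of workers to use when executing the Dataflow job.
- String getSdkContainerImage(): Container image used to configure the SDK execution environment on workers.
- String getSubnetwork(): GCE subnetwork for launching workers.
- @Nullable Boolean getUsePublicIps(): Specifies whether worker pools should be started with public IP addresses.
- String getWorkerDiskType(): Specifies what type of persistent disk is used.
- String getWorkerHarnessContainerImage(): Deprecated. Use getSdkContainerImage() instead.
- String getWorkerMachineType(): Machine type to create Dataflow worker VMs as.
- void setAutoscalingAlgorithm(DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType value)
- void setDiskSizeGb(int value)
- void setMaxNumWorkers(int value)
- void setMinCpuPlatform(String minCpuPlatform)
- void setNetwork(String value)
- void setNumWorkers(int value)
- void setSdkContainerImage(String value)
- void setSubnetwork(String value)
- void setUsePublicIps(@Nullable Boolean value)
- void setWorkerDiskType(String value)
- void setWorkerHarnessContainerImage(String value): Deprecated. Use setSdkContainerImage(java.lang.String) instead.
- void setWorkerMachineType(String value)
Methods inherited from interface org.apache.beam.sdk.options.FileStagingOptions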
getFilesToStage, setFilesToStage
Methods inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
getCredentialFactoryClass, getDataflowKmsKey, getGcpCredential, getGcpOauthScopes, getGcpTempLocation, getImpersonateServiceAccount, getProject, getWorkerRegion, getWorkerZone, getZone, isEnableStreamingEngine, setCredentialFactoryClass, setDataflowKmsKey, setEnableStreamingEngine, setGcpCredential, setGcpOauthScopes, setGcpTempLocation, setImpersonateServiceAccount, setProject, setWorkerRegion, setWorkerZone, setZone
Methods inherited from interface org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions
getGoogleApiTrace, setGoogleApiTrace
Methods inherited from interface org.apache.beam.sdk.transforms.display.HasDisplayData
populateDisplayData
Methods inherited from interface org.apache.beam.sdk.options.PipelineOptions
as, getGbek, getJobName, getOptionsId, getRunner, getStableUniqueNames, getTempLocation, getUserAgent, outputRuntimeOptions, revision, setGbek, setJobName, setOptionsId, setRunner, setStableUniqueNames, setTempLocation, setUserAgent
Method Details
getNumWorkers
int getNumWorkers()
Number of workers to use when executing the Dataflow job. Note that selecting an autoscaling algorithm other than NONE will affect the size of the worker pool. If left unspecified, the Dataflow service will determine the number of workers.
setNumWorkers
void setNumWorkers(int value)
getAutoscalingAlgorithm
DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType getAutoscalingAlgorithm()
The autoscaling algorithm to use for the worker pool.
- NONE: does not change the size of the worker pool.
- BASIC: autoscale the worker pool size up to maxNumWorkers until the job completes.
- THROUGHPUT_BASED: autoscale the worker pool based on throughput (up to maxNumWorkers).
setAutoscalingAlgorithm
void setAutoscalingAlgorithm(DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType value)
getMaxNumWorkers
int getMaxNumWorkers()
The maximum number of workers to use for the worker pool. This option limits the size of the worker pool for the lifetime of the job, including pipeline updates. If left unspecified, the Dataflow service will compute a ceiling.
setMaxNumWorkers
void setMaxNumWorkers(int value)
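Beam's PipelineOptionsFactory derives command-line flag names from the getters of an options interface, so getNumWorkers(), getMaxNumWorkers() and getAutoscalingAlgorithm() correspond to --numWorkers, --maxNumWorkers and --autoscalingAlgorithm. The following minimal sketch assembles those flags as plain strings so that it compiles without Beam on the classpath; WorkerPoolFlags and the local Autoscaling enum (which only mirrors the three documented values) are hypothetical names, not part of Beam.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical local stand-in for DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType,
// mirroring the three values documented above.
enum Autoscaling { NONE, BASIC, THROUGHPUT_BASED }

// Hypothetical helper; not part of Beam.
final class WorkerPoolFlags {
  private WorkerPoolFlags() {}

  /**
   * Builds the worker-pool sizing flags. A null argument omits the flag entirely,
   * leaving that option unset.
   */
  static List<String> sizing(Integer numWorkers, Integer maxNumWorkers, Autoscaling algorithm) {
    List<String> args = new ArrayList<>();
    if (numWorkers != null) {
      args.add("--numWorkers=" + numWorkers);
    }
    if (maxNumWorkers != null) {
      args.add("--maxNumWorkers=" + maxNumWorkers);
    }
    if (algorithm != null) {
      // Enum-typed options are given on the command line by constant name.
      args.add("--autoscalingAlgorithm=" + algorithm.name());
    }
    return args;
  }

  public static void main(String[] argv) {
    // Start with 2 workers and let throughput-based autoscaling grow the pool to at most 10.
    List<String> args = sizing(2, 10, Autoscaling.THROUGHPUT_BASED);
    // With Beam on the classpath, these strings would be handed to
    // PipelineOptionsFactory.fromArgs(args.toArray(new String[0])).withValidation()
    //     .as(DataflowPipelineWorkerPoolOptions.class)
    System.out.println(String.join(" ", args));
  }
}
```

Passing null for the worker counts omits those flags, which matches the documented behavior of letting the Dataflow service determine the initial size and compute a ceiling.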
getDiskSizeGb
int getDiskSizeGb()
Remote worker disk size, in gigabytes, or 0 to use the default size.
setDiskSizeGb
void setDiskSizeGb(int value)
getWorkerHarnessContainerImage
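@Deprecated String getWorkerHarnessContainerImage()
Deprecated. Use getSdkContainerImage() instead.
setWorkerHarnessContainerImage
@Deprecated void setWorkerHarnessContainerImage(String value)
Deprecated. Use setSdkContainerImage(java.lang.String) instead.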
getSdkContainerImage
String getSdkContainerImage()
Container image used to configure the SDK execution environment on workers. Used for custom containers on portable pipelines only.
setSdkContainerImage
void setSdkContainerImage(String value)
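Because getWorkerHarnessContainerImage() is deprecated in favor of getSdkContainerImage(), launch scripts that still pass the old flag can be updated mechanically. A minimal sketch, assuming flags are passed in Beam's --name=value form; ContainerImageFlags is a hypothetical helper, and the image path in the usage check is a made-up example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Beam.
final class ContainerImageFlags {
  private ContainerImageFlags() {}

  private static final String DEPRECATED = "--workerHarnessContainerImage=";
  private static final String REPLACEMENT = "--sdkContainerImage=";

  /**
   * Returns a copy of args in which every --workerHarnessContainerImage=... flag is
   * renamed to --sdkContainerImage=..., keeping the image value unchanged.
   * All other arguments pass through untouched and in their original order.
   */
  static List<String> migrate(List<String> args) {
    List<String> out = new ArrayList<>(args.size());
    for (String arg : args) {
      if (arg.startsWith(DEPRECATED)) {
        out.add(REPLACEMENT + arg.substring(DEPRECATED.length()));
      } else {
        out.add(arg);
      }
    }
    return out;
  }
}
```

Keep in mind that the description above limits sdkContainerImage to custom containers on portable pipelines, so renaming the flag does not by itself make a non-portable job use the image.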
getNetwork
String getNetwork()
GCE network for launching workers. The default is chosen by the Dataflow service.
setNetwork
void setNetwork(String value)
getSubnetwork
String getSubnetwork()
GCE subnetwork for launching workers. The default is chosen by the Dataflow service. The expected format is either regions/REGION/subnetworks/SUBNETWORK or the fully qualified subnetwork name beginning with https://..., e.g. https://www.googleapis.com/compute/alpha/projects/PROJECT/regions/REGION/subnetworks/SUBNETWORK
setSubnetwork
void setSubnetwork(String value)
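The two subnetwork formats accepted by getSubnetwork() can be built from their parts. A minimal sketch: SubnetworkNames and its methods are hypothetical, the URL prefix is copied from the documented example (including its alpha API version segment), and looksValid is only a loose shape check, not the validation the Dataflow service performs.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of Beam.
final class SubnetworkNames {
  private SubnetworkNames() {}

  // Prefix taken from the fully qualified example in the documentation above.
  private static final String PREFIX = "https://www.googleapis.com/compute/alpha/projects/";

  // Short form: regions/REGION/subnetworks/SUBNETWORK with non-empty, slash-free segments.
  private static final Pattern SHORT_FORM = Pattern.compile("regions/[^/]+/subnetworks/[^/]+");

  static String shortForm(String region, String subnetwork) {
    return "regions/" + region + "/subnetworks/" + subnetwork;
  }

  static String fullyQualified(String project, String region, String subnetwork) {
    return PREFIX + project + "/" + shortForm(region, subnetwork);
  }

  /** Loose check: the short form, or an https:// URL that ends in the short form. */
  static boolean looksValid(String value) {
    if (value.startsWith("https://")) {
      int i = value.indexOf("/regions/");
      return i >= 0 && SHORT_FORM.matcher(value.substring(i + 1)).matches();
    }
    return SHORT_FORM.matcher(value).matches();
  }
}
```

A bare subnetwork name such as my-subnet fails the check, since the documented formats always include the region.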
getWorkerMachineType
String getWorkerMachineType()
Machine type to create Dataflow worker VMs as. See GCE machine types for a list of valid options. If unset, the Dataflow service will choose a reasonable default.
setWorkerMachineType
void setWorkerMachineType(String value)
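Machine type and disk size each have their own "service default" convention: an unset machine type lets the Dataflow service choose, and getDiskSizeGb() treats 0 as the default size. A minimal sketch that emits flags only for values that override those defaults; WorkerVmFlags is hypothetical, n1-standard-4 is just an example GCE machine type, and rejecting negative sizes is this sketch's own choice rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Beam.
final class WorkerVmFlags {
  private WorkerVmFlags() {}

  /**
   * Builds flags describing the worker VMs. A null or empty machine type and a disk size
   * of 0 both mean "use the default", so no flag is emitted for them.
   */
  static List<String> vmFlags(String machineType, int diskSizeGb) {
    if (diskSizeGb < 0) {
      // Assumption of this sketch: negative sizes are treated as a caller error.
      throw new IllegalArgumentException("diskSizeGb must be >= 0 (0 = default size)");
    }
    List<String> args = new ArrayList<>();
    if (machineType != null && !machineType.isEmpty()) {
      args.add("--workerMachineType=" + machineType);
    }
    if (diskSizeGb > 0) {
      args.add("--diskSizeGb=" + diskSizeGb);
    }
    return args;
  }
}
```

Omitting --diskSizeGb and passing --diskSizeGb=0 should be equivalent per the description of getDiskSizeGb(); the sketch omits it to keep command lines short.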
getWorkerDiskType
String getWorkerDiskType()
Specifies what type of persistent disk is used. The value is a full disk type resource, e.g., compute.googleapis.com/projects//zones//diskTypes/pd-ssd. For more information, see the API reference documentation for DiskTypes.
setWorkerDiskType
void setWorkerDiskType(String value)
getUsePublicIps
@Nullable Boolean getUsePublicIps()
Specifies whether worker pools should be started with public IP addresses. WARNING: This feature is available only through allowlist.
setUsePublicIps
void setUsePublicIps(@Nullable Boolean value)
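Because getUsePublicIps() returns a nullable Boolean, the option has three states rather than two. A minimal sketch of forwarding it as a flag, assuming that null means "not set, defer to the service default" (suggested by the nullable type but not stated in the description above); PublicIpFlag is a hypothetical helper.

```java
import java.util.Optional;

// Hypothetical helper; not part of Beam.
final class PublicIpFlag {
  private PublicIpFlag() {}

  /**
   * Maps the tri-state option to an optional flag: null produces no flag (assumed to
   * defer to the service default), while TRUE or FALSE is passed through explicitly.
   */
  static Optional<String> flag(Boolean usePublicIps) {
    if (usePublicIps == null) {
      return Optional.empty();
    }
    return Optional.of("--usePublicIps=" + usePublicIps);
  }
}
```

Using Optional rather than a primitive boolean keeps "unset" distinct from an explicit false.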
getMinCpuPlatform
String getMinCpuPlatform()
Specifies a minimum CPU platform for VM instances. For more details, see Specifying Pipeline Execution Parameters.
setMinCpuPlatform
void setMinCpuPlatform(String minCpuPlatform)