Interface DataflowPipelineOptions
- All Superinterfaces:
ApplicationNameOptions
,BigQueryOptions
,DataflowPipelineDebugOptions
,DataflowPipelineWorkerPoolOptions
,DataflowProfilingOptions
,DataflowStreamingPipelineOptions
,DataflowWorkerLoggingOptions
,ExperimentalOptions
,FileStagingOptions
,GcpOptions
,GcsOptions
,GoogleApiDebugOptions
,HasDisplayData
,MemoryMonitorOptions
,PipelineOptions
,PubsubOptions
,StreamingOptions
- All Known Subinterfaces:
DataflowWorkerHarnessOptions
,TestDataflowPipelineOptions
DataflowRunner
.-
Nested Class Summary
Nested ClassesModifier and TypeInterfaceDescriptionstatic enum
Set of available Flexible Resource Scheduling goals.static class
Returns a default staging location underGcpOptions.getGcpTempLocation()
.Nested classes/interfaces inherited from interface org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions
DataflowPipelineDebugOptions.DataflowClientFactory, DataflowPipelineDebugOptions.StagerFactory, DataflowPipelineDebugOptions.UnboundedReaderMaxReadTimeFactory
Nested classes/interfaces inherited from interface org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions
DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType
Nested classes/interfaces inherited from interface org.apache.beam.runners.dataflow.options.DataflowProfilingOptions
DataflowProfilingOptions.DataflowProfilingAgentConfiguration
Nested classes/interfaces inherited from interface org.apache.beam.runners.dataflow.options.DataflowStreamingPipelineOptions
DataflowStreamingPipelineOptions.EnableWindmillServiceDirectPathFactory, DataflowStreamingPipelineOptions.GlobalConfigRefreshPeriodFactory, DataflowStreamingPipelineOptions.HarnessUpdateReportingPeriodFactory, DataflowStreamingPipelineOptions.LocalWindmillHostportFactory, DataflowStreamingPipelineOptions.MaxStackTraceDepthToReportFactory, DataflowStreamingPipelineOptions.PeriodicStatusPageDirectoryFactory, DataflowStreamingPipelineOptions.WindmillServiceStreamingRpcBatchLimitFactory
Nested classes/interfaces inherited from interface org.apache.beam.runners.dataflow.options.DataflowWorkerLoggingOptions
DataflowWorkerLoggingOptions.Level, DataflowWorkerLoggingOptions.WorkerLogLevelOverrides
Nested classes/interfaces inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
GcpOptions.DefaultProjectFactory, GcpOptions.EnableStreamingEngineFactory, GcpOptions.GcpOAuthScopesFactory, GcpOptions.GcpTempLocationFactory, GcpOptions.GcpUserCredentialsFactory
Nested classes/interfaces inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcsOptions
GcsOptions.ExecutorServiceFactory, GcsOptions.GcsCustomAuditEntries, GcsOptions.PathValidatorFactory
Nested classes/interfaces inherited from interface org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions
GoogleApiDebugOptions.GoogleApiTracer
Nested classes/interfaces inherited from interface org.apache.beam.sdk.options.PipelineOptions
PipelineOptions.AtomicLongFactory, PipelineOptions.CheckEnabled, PipelineOptions.DirectRunner, PipelineOptions.JobNameFactory, PipelineOptions.UserAgentFactory
-
Field Summary
Fields inherited from interface org.apache.beam.sdk.options.ExperimentalOptions
STATE_CACHE_SIZE, STATE_SAMPLING_PERIOD_MILLIS
Fields inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
STREAMING_ENGINE_EXPERIMENT, WINDMILL_SERVICE_EXPERIMENT
-
Method Summary
Modifier and TypeMethodDescriptionIf set, the snapshot from which the job should be created.Dataflow endpoint to use.Service options are set by the user and configure the service.This option controls Flexible Resource Scheduling mode.Open modules needed for reflection that access JDK internals with Java 9+Labels that will be applied to the billing records for this job.The URL of the staged portable pipeline.Project id to use when launching jobs.The Google Compute Engine region for creating Dataflow jobs.Run the job as a specific service account, instead of the default GCE robot.GCS path for staging local files, e.g.Where the runner should generate a template file.boolean
If enabled then the literal key will be logged to Cloud Logging if a hot key is detected.boolean
isUpdate()
Whether to update the currently running pipeline with the same name as this one.void
setCreateFromSnapshot
(String value) void
setDataflowEndpoint
(String value) void
setDataflowServiceOptions
(List<String> options) void
setDataflowWorkerJar
(String dataflowWorkerJar) void
void
setHotKeyLoggingEnabled
(boolean value) void
setJdkAddOpenModules
(List<String> options) void
void
setPipelineUrl
(String urlString) void
setProject
(String value) void
void
setServiceAccount
(String value) void
setStagingLocation
(String value) void
setTemplateLocation
(String value) Sets the Cloud Storage path where the Dataflow template will be stored.void
setUpdate
(boolean value) Methods inherited from interface org.apache.beam.sdk.options.ApplicationNameOptions
getAppName, setAppName
Methods inherited from interface org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions
getBigQueryEndpoint, getBigQueryProject, getBqStreamingApiLoggingFrequencySec, getEnableStorageReadApiV2, getGroupFilesFileLoad, getHTTPReadTimeout, getHTTPWriteTimeout, getInsertBundleParallelism, getJobLabelsMap, getMaxBufferingDurationMilliSec, getMaxConnectionPoolConnections, getMaxStreamingBatchSize, getMaxStreamingRowsToBatch, getMinConnectionPoolConnections, getNumStorageWriteApiStreamAppendClients, getNumStorageWriteApiStreams, getNumStreamingKeys, getStorageApiAppendThresholdBytes, getStorageApiAppendThresholdRecordCount, getStorageWriteApiMaxRequestSize, getStorageWriteApiMaxRetries, getStorageWriteApiTriggeringFrequencySec, getStorageWriteMaxInflightBytes, getStorageWriteMaxInflightRequests, getTempDatasetId, getUseStorageApiConnectionPool, getUseStorageWriteApi, getUseStorageWriteApiAtLeastOnce, setBigQueryEndpoint, setBigQueryProject, setBqStreamingApiLoggingFrequencySec, setEnableStorageReadApiV2, setGroupFilesFileLoad, setHTTPReadTimeout, setHTTPWriteTimeout, setInsertBundleParallelism, setJobLabelsMap, setMaxBufferingDurationMilliSec, setMaxConnectionPoolConnections, setMaxStreamingBatchSize, setMaxStreamingRowsToBatch, setMinConnectionPoolConnections, setNumStorageWriteApiStreamAppendClients, setNumStorageWriteApiStreams, setNumStreamingKeys, setStorageApiAppendThresholdBytes, setStorageApiAppendThresholdRecordCount, setStorageWriteApiMaxRequestSize, setStorageWriteApiMaxRetries, setStorageWriteApiTriggeringFrequencySec, setStorageWriteMaxInflightBytes, setStorageWriteMaxInflightRequests, setTempDatasetId, setUseStorageApiConnectionPool, setUseStorageWriteApi, setUseStorageWriteApiAtLeastOnce
Methods inherited from interface org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions
getApiRootUrl, getDataflowClient, getDataflowJobFile, getDesiredNumUnboundedSourceSplits, getDumpHeapOnOOM, getJfrRecordingDurationSec, getNumberOfWorkerHarnessThreads, getReaderCacheTimeoutSec, getRecordJfrOnGcThrashing, getSaveHeapDumpsToGcsPath, getSdkHarnessContainerImageOverrides, getStager, getStagerClass, getTransformNameMapping, getUnboundedReaderMaxElements, getUnboundedReaderMaxReadTimeMs, getUnboundedReaderMaxReadTimeSec, getUnboundedReaderMaxWaitForElementsMs, getWorkerCacheMb, setApiRootUrl, setDataflowClient, setDataflowJobFile, setDesiredNumUnboundedSourceSplits, setDumpHeapOnOOM, setJfrRecordingDurationSec, setNumberOfWorkerHarnessThreads, setReaderCacheTimeoutSec, setRecordJfrOnGcThrashing, setSaveHeapDumpsToGcsPath, setSdkHarnessContainerImageOverrides, setStager, setStagerClass, setTransformNameMapping, setUnboundedReaderMaxElements, setUnboundedReaderMaxReadTimeMs, setUnboundedReaderMaxReadTimeSec, setUnboundedReaderMaxWaitForElementsMs, setWorkerCacheMb
Methods inherited from interface org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions
getAutoscalingAlgorithm, getDiskSizeGb, getMaxNumWorkers, getMinCpuPlatform, getNetwork, getNumWorkers, getSdkContainerImage, getSubnetwork, getUsePublicIps, getWorkerDiskType, getWorkerHarnessContainerImage, getWorkerMachineType, setAutoscalingAlgorithm, setDiskSizeGb, setMaxNumWorkers, setMinCpuPlatform, setNetwork, setNumWorkers, setSdkContainerImage, setSubnetwork, setUsePublicIps, setWorkerDiskType, setWorkerHarnessContainerImage, setWorkerMachineType
Methods inherited from interface org.apache.beam.runners.dataflow.options.DataflowProfilingOptions
getProfilingAgentConfiguration, getSaveProfilesToGcs, setProfilingAgentConfiguration, setSaveProfilesToGcs
Methods inherited from interface org.apache.beam.runners.dataflow.options.DataflowStreamingPipelineOptions
getActiveWorkRefreshPeriodMillis, getChannelzShowOnlyWindmillServiceChannels, getGlobalConfigRefreshPeriod, getIsWindmillServiceDirectPathEnabled, getLocalWindmillHostport, getMaxBundlesFromWindmillOutstanding, getMaxBytesFromWindmillOutstanding, getMaxStackTraceDepthToReport, getOverrideWindmillBinary, getPeriodicStatusPageOutputDirectory, getPerWorkerMetricsUpdateReportingPeriodMillis, getStreamingSideInputCacheExpirationMillis, getStreamingSideInputCacheMb, getStuckCommitDurationMillis, getUseSeparateWindmillHeartbeatStreams, getUseWindmillIsolatedChannels, getWindmillGetDataStreamCount, getWindmillHarnessUpdateReportingPeriod, getWindmillMessagesBetweenIsReadyChecks, getWindmillRequestBatchedGetWorkResponse, getWindmillServiceCommitThreads, getWindmillServiceEndpoint, getWindmillServicePort, getWindmillServiceRpcChannelAliveTimeoutSec, getWindmillServiceStreamingLogEveryNStreamFailures, getWindmillServiceStreamingRpcBatchLimit, getWindmillServiceStreamingRpcHealthCheckPeriodMs, getWindmillServiceStreamMaxBackoffMillis, setActiveWorkRefreshPeriodMillis, setChannelzShowOnlyWindmillServiceChannels, setGlobalConfigRefreshPeriod, setIsWindmillServiceDirectPathEnabled, setLocalWindmillHostport, setMaxBundlesFromWindmillOutstanding, setMaxBytesFromWindmillOutstanding, setMaxStackTraceDepthToReport, setOverrideWindmillBinary, setPeriodicStatusPageOutputDirectory, setPerWorkerMetricsUpdateReportingPeriodMillis, setStreamingSideInputCacheExpirationMillis, setStreamingSideInputCacheMb, setStuckCommitDurationMillis, setUseSeparateWindmillHeartbeatStreams, setUseWindmillIsolatedChannels, setWindmillGetDataStreamCount, setWindmillHarnessUpdateReportingPeriod, setWindmillMessagesBetweenIsReadyChecks, setWindmillRequestBatchedGetWorkResponse, setWindmillServiceCommitThreads, setWindmillServiceEndpoint, setWindmillServicePort, setWindmillServiceRpcChannelAliveTimeoutSec, setWindmillServiceStreamingLogEveryNStreamFailures, setWindmillServiceStreamingRpcBatchLimit, setWindmillServiceStreamingRpcHealthCheckPeriodMs, setWindmillServiceStreamMaxBackoffMillis
Methods inherited from interface org.apache.beam.runners.dataflow.options.DataflowWorkerLoggingOptions
getDefaultWorkerLogLevel, getWorkerLogLevelOverrides, getWorkerSystemErrMessageLevel, getWorkerSystemOutMessageLevel, setDefaultWorkerLogLevel, setWorkerLogLevelOverrides, setWorkerSystemErrMessageLevel, setWorkerSystemOutMessageLevel
Methods inherited from interface org.apache.beam.sdk.options.ExperimentalOptions
getExperiments, setExperiments
Methods inherited from interface org.apache.beam.sdk.options.FileStagingOptions
getFilesToStage, setFilesToStage
Methods inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
getCredentialFactoryClass, getDataflowKmsKey, getGcpCredential, getGcpOauthScopes, getGcpTempLocation, getImpersonateServiceAccount, getWorkerRegion, getWorkerZone, getZone, isEnableStreamingEngine, setCredentialFactoryClass, setDataflowKmsKey, setEnableStreamingEngine, setGcpCredential, setGcpOauthScopes, setGcpTempLocation, setImpersonateServiceAccount, setWorkerRegion, setWorkerZone, setZone
Methods inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcsOptions
getEnableBucketReadMetricCounter, getEnableBucketWriteMetricCounter, getExecutorService, getGcsCustomAuditEntries, getGcsEndpoint, getGcsHttpRequestReadTimeout, getGcsHttpRequestWriteTimeout, getGcsPerformanceMetrics, getGcsReadCounterPrefix, getGcsRewriteDataOpBatchLimit, getGcsUploadBufferSizeBytes, getGcsUtil, getGcsWriteCounterPrefix, getGoogleCloudStorageReadOptions, getPathValidator, getPathValidatorClass, setEnableBucketReadMetricCounter, setEnableBucketWriteMetricCounter, setExecutorService, setGcsCustomAuditEntries, setGcsEndpoint, setGcsHttpRequestReadTimeout, setGcsHttpRequestWriteTimeout, setGcsPerformanceMetrics, setGcsReadCounterPrefix, setGcsRewriteDataOpBatchLimit, setGcsUploadBufferSizeBytes, setGcsUtil, setGcsWriteCounterPrefix, setGoogleCloudStorageReadOptions, setPathValidator, setPathValidatorClass
Methods inherited from interface org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions
getGoogleApiTrace, setGoogleApiTrace
Methods inherited from interface org.apache.beam.sdk.transforms.display.HasDisplayData
populateDisplayData
Methods inherited from interface org.apache.beam.sdk.options.MemoryMonitorOptions
getGCThrashingPercentagePerPeriod, getGzipCompressHeapDumps, getRemoteHeapDumpLocation, setGCThrashingPercentagePerPeriod, setGzipCompressHeapDumps, setRemoteHeapDumpLocation
Methods inherited from interface org.apache.beam.sdk.options.PipelineOptions
as, getJobName, getOptionsId, getRunner, getStableUniqueNames, getTempLocation, getUserAgent, outputRuntimeOptions, revision, setJobName, setOptionsId, setRunner, setStableUniqueNames, setTempLocation, setUserAgent
Methods inherited from interface org.apache.beam.sdk.io.gcp.pubsub.PubsubOptions
getPubsubRootUrl, setPubsubRootUrl
Methods inherited from interface org.apache.beam.sdk.options.StreamingOptions
getUpdateCompatibilityVersion, isStreaming, setStreaming, setUpdateCompatibilityVersion
-
Method Details
-
getProject
Description copied from interface:GcpOptions
Project id to use when launching jobs.- Specified by:
getProject
in interfaceGcpOptions
-
setProject
- Specified by:
setProject
in interfaceGcpOptions
-
getStagingLocation
GCS path for staging local files, e.g. gs://bucket/objectMust be a valid Cloud Storage URL, beginning with the prefix "gs://"
If
getStagingLocation()
is not set, it will default toGcpOptions.getGcpTempLocation()
.GcpOptions.getGcpTempLocation()
must be a valid GCS path. -
setStagingLocation
-
isUpdate
boolean isUpdate()Whether to update the currently running pipeline with the same name as this one. -
setUpdate
void setUpdate(boolean value) -
getCreateFromSnapshot
String getCreateFromSnapshot()If set, the snapshot from which the job should be created. -
setCreateFromSnapshot
-
getTemplateLocation
String getTemplateLocation()Where the runner should generate a template file. Must either be local or Cloud Storage. -
setTemplateLocation
Sets the Cloud Storage path where the Dataflow template will be stored. Required for creating Flex Templates or Classic Templates.Example:
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class); options.setTemplateLocation("gs://your-bucket/templates/my-template");
- Parameters:
value
- Cloud Storage path for storing the Dataflow template.
-
getDataflowServiceOptions
Service options are set by the user and configure the service. This decouples service side feature availability from the Apache Beam release cycle. -
setDataflowServiceOptions
-
getServiceAccount
String getServiceAccount()Run the job as a specific service account, instead of the default GCE robot. -
setServiceAccount
-
getRegion
The Google Compute Engine region for creating Dataflow jobs. -
setRegion
-
getDataflowEndpoint
Dataflow endpoint to use.Defaults to the current version of the Google Cloud Dataflow API, at the time the current SDK version was released.
If the string contains "://", then this is treated as a URL, otherwise
DataflowPipelineDebugOptions.getApiRootUrl()
is used as the root URL.- Specified by:
getDataflowEndpoint
in interfaceDataflowPipelineDebugOptions
-
setDataflowEndpoint
- Specified by:
setDataflowEndpoint
in interfaceDataflowPipelineDebugOptions
-
getLabels
Labels that will be applied to the billing records for this job. -
setLabels
-
getPipelineUrl
String getPipelineUrl()The URL of the staged portable pipeline. -
setPipelineUrl
-
getDataflowWorkerJar
String getDataflowWorkerJar() -
setDataflowWorkerJar
-
getFlexRSGoal
This option controls Flexible Resource Scheduling mode. -
setFlexRSGoal
-
isHotKeyLoggingEnabled
boolean isHotKeyLoggingEnabled()If enabled then the literal key will be logged to Cloud Logging if a hot key is detected. -
setHotKeyLoggingEnabled
void setHotKeyLoggingEnabled(boolean value) -
getJdkAddOpenModules
Open modules needed for reflection that access JDK internals with Java 9+With JDK 16+, JDK internals are strongly encapsulated and can result in an InaccessibleObjectException being thrown if a tool or library uses reflection that access JDK internals. If you see these errors in your worker logs, you can pass in modules to open using the format module/package=target-module(,target-module)* to allow access to the library. E.g. java.base/java.lang=jamm
You may see warnings that jamm, a library used to more accurately size objects, is unable to make a private field accessible. To resolve the warning, open the specified module/package to jamm.
-
setJdkAddOpenModules
-