public interface DataflowPipelineOptions extends PipelineOptions, GcpOptions, ApplicationNameOptions, DataflowPipelineDebugOptions, DataflowPipelineWorkerPoolOptions, BigQueryOptions, GcsOptions, StreamingOptions, DataflowWorkerLoggingOptions, DataflowStreamingPipelineOptions, DataflowProfilingOptions, PubsubOptions
DataflowRunner.| Modifier and Type | Interface and Description | 
|---|---|
static class  | 
DataflowPipelineOptions.FlexResourceSchedulingGoal
Set of available Flexible Resource Scheduling goals. 
 | 
static class  | 
DataflowPipelineOptions.StagingLocationFactory
Returns a default staging location under  
GcpOptions.getGcpTempLocation(). | 
DataflowPipelineDebugOptions.DataflowClientFactory, DataflowPipelineDebugOptions.StagerFactory, DataflowPipelineDebugOptions.UnboundedReaderMaxReadTimeFactoryDataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmTypeGcsOptions.ExecutorServiceFactory, GcsOptions.GcsCustomAuditEntries, GcsOptions.PathValidatorFactoryDataflowWorkerLoggingOptions.Level, DataflowWorkerLoggingOptions.WorkerLogLevelOverridesDataflowStreamingPipelineOptions.EnableWindmillServiceDirectPathFactory, DataflowStreamingPipelineOptions.GlobalConfigRefreshPeriodFactory, DataflowStreamingPipelineOptions.HarnessUpdateReportingPeriodFactory, DataflowStreamingPipelineOptions.LocalWindmillHostportFactory, DataflowStreamingPipelineOptions.MaxStackTraceDepthToReportFactory, DataflowStreamingPipelineOptions.PeriodicStatusPageDirectoryFactory, DataflowStreamingPipelineOptions.WindmillServiceStreamingRpcBatchLimitFactoryDataflowProfilingOptions.DataflowProfilingAgentConfigurationGcpOptions.DefaultProjectFactory, GcpOptions.EnableStreamingEngineFactory, GcpOptions.GcpOAuthScopesFactory, GcpOptions.GcpTempLocationFactory, GcpOptions.GcpUserCredentialsFactoryGoogleApiDebugOptions.GoogleApiTracerSTATE_CACHE_SIZE, STATE_SAMPLING_PERIOD_MILLISSTREAMING_ENGINE_EXPERIMENT, WINDMILL_SERVICE_EXPERIMENT| Modifier and Type | Method and Description | 
|---|---|
java.lang.String | 
getCreateFromSnapshot()
If set, the snapshot from which the job should be created. 
 | 
java.lang.String | 
getDataflowEndpoint()
Dataflow endpoint to use. 
 | 
java.util.List<java.lang.String> | 
getDataflowServiceOptions()
Service options are set by the user and configure the service. 
 | 
java.lang.String | 
getDataflowWorkerJar()  | 
DataflowPipelineOptions.FlexResourceSchedulingGoal | 
getFlexRSGoal()
This option controls Flexible Resource Scheduling mode. 
 | 
java.util.List<java.lang.String> | 
getJdkAddOpenModules()
Open modules needed for reflection that access JDK internals with Java 9+ 
 | 
java.util.Map<java.lang.String,java.lang.String> | 
getLabels()
Labels that will be applied to the billing records for this job. 
 | 
java.lang.String | 
getPipelineUrl()
The URL of the staged portable pipeline. 
 | 
java.lang.String | 
getProject()
Project id to use when launching jobs. 
 | 
java.lang.String | 
getRegion()
The Google Compute Engine region for
 creating Dataflow jobs. 
 | 
java.lang.String | 
getServiceAccount()
Run the job as a specific service account, instead of the default GCE robot. 
 | 
java.lang.String | 
getStagingLocation()
GCS path for staging local files, e.g. 
 | 
java.lang.String | 
getTemplateLocation()
Where the runner should generate a template file. 
 | 
boolean | 
isHotKeyLoggingEnabled()
If enabled then the literal key will be logged to Cloud Logging if a hot key is detected. 
 | 
boolean | 
isUpdate()
Whether to update the currently running pipeline with the same name as this one. 
 | 
void | 
setCreateFromSnapshot(java.lang.String value)  | 
void | 
setDataflowEndpoint(java.lang.String value)  | 
void | 
setDataflowServiceOptions(java.util.List<java.lang.String> options)  | 
void | 
setDataflowWorkerJar(java.lang.String dataflowWorkerJar)  | 
void | 
setFlexRSGoal(DataflowPipelineOptions.FlexResourceSchedulingGoal goal)  | 
void | 
setHotKeyLoggingEnabled(boolean value)  | 
void | 
setJdkAddOpenModules(java.util.List<java.lang.String> options)  | 
void | 
setLabels(java.util.Map<java.lang.String,java.lang.String> labels)  | 
void | 
setPipelineUrl(java.lang.String urlString)  | 
void | 
setProject(java.lang.String value)  | 
void | 
setRegion(java.lang.String region)  | 
void | 
setServiceAccount(java.lang.String value)  | 
void | 
setStagingLocation(java.lang.String value)  | 
void | 
setTemplateLocation(java.lang.String value)
Sets the Cloud Storage path where the Dataflow template will be stored. 
 | 
void | 
setUpdate(boolean value)  | 
getApiRootUrl, getDataflowClient, getDataflowJobFile, getDesiredNumUnboundedSourceSplits, getDumpHeapOnOOM, getJfrRecordingDurationSec, getNumberOfWorkerHarnessThreads, getReaderCacheTimeoutSec, getRecordJfrOnGcThrashing, getSaveHeapDumpsToGcsPath, getSdkHarnessContainerImageOverrides, getStager, getStagerClass, getTransformNameMapping, getUnboundedReaderMaxElements, getUnboundedReaderMaxReadTimeMs, getUnboundedReaderMaxReadTimeSec, getUnboundedReaderMaxWaitForElementsMs, getWorkerCacheMb, setApiRootUrl, setDataflowClient, setDataflowJobFile, setDesiredNumUnboundedSourceSplits, setDumpHeapOnOOM, setJfrRecordingDurationSec, setNumberOfWorkerHarnessThreads, setReaderCacheTimeoutSec, setRecordJfrOnGcThrashing, setSaveHeapDumpsToGcsPath, setSdkHarnessContainerImageOverrides, setStager, setStagerClass, setTransformNameMapping, setUnboundedReaderMaxElements, setUnboundedReaderMaxReadTimeMs, setUnboundedReaderMaxReadTimeSec, setUnboundedReaderMaxWaitForElementsMs, setWorkerCacheMbaddExperiment, getExperiments, getExperimentValue, hasExperiment, setExperimentsgetGCThrashingPercentagePerPeriod, setGCThrashingPercentagePerPeriodgetAutoscalingAlgorithm, getDiskSizeGb, getMaxNumWorkers, getMinCpuPlatform, getNetwork, getNumWorkers, getSdkContainerImage, getSubnetwork, getUsePublicIps, getWorkerDiskType, getWorkerHarnessContainerImage, getWorkerMachineType, setAutoscalingAlgorithm, setDiskSizeGb, setMaxNumWorkers, setMinCpuPlatform, setNetwork, setNumWorkers, setSdkContainerImage, setSubnetwork, setUsePublicIps, setWorkerDiskType, setWorkerHarnessContainerImage, setWorkerMachineTypegetFilesToStage, setFilesToStagegetBigQueryEndpoint, getBigQueryProject, getBqStreamingApiLoggingFrequencySec, getEnableStorageReadApiV2, getGroupFilesFileLoad, getHTTPReadTimeout, getHTTPWriteTimeout, getInsertBundleParallelism, getJobLabelsMap, getMaxBufferingDurationMilliSec, getMaxConnectionPoolConnections, getMaxStreamingBatchSize, getMaxStreamingRowsToBatch, getMinConnectionPoolConnections, getNumStorageWriteApiStreamAppendClients, getNumStorageWriteApiStreams, getNumStreamingKeys, getStorageApiAppendThresholdBytes, getStorageApiAppendThresholdRecordCount, getStorageWriteApiMaxRequestSize, getStorageWriteApiMaxRetries, getStorageWriteApiTriggeringFrequencySec, getStorageWriteMaxInflightBytes, getStorageWriteMaxInflightRequests, getTempDatasetId, getUseStorageApiConnectionPool, getUseStorageWriteApi, getUseStorageWriteApiAtLeastOnce, setBigQueryEndpoint, setBigQueryProject, setBqStreamingApiLoggingFrequencySec, setEnableStorageReadApiV2, setGroupFilesFileLoad, setHTTPReadTimeout, setHTTPWriteTimeout, setInsertBundleParallelism, setJobLabelsMap, setMaxBufferingDurationMilliSec, setMaxConnectionPoolConnections, setMaxStreamingBatchSize, setMaxStreamingRowsToBatch, setMinConnectionPoolConnections, setNumStorageWriteApiStreamAppendClients, setNumStorageWriteApiStreams, setNumStreamingKeys, setStorageApiAppendThresholdBytes, setStorageApiAppendThresholdRecordCount, setStorageWriteApiMaxRequestSize, setStorageWriteApiMaxRetries, setStorageWriteApiTriggeringFrequencySec, setStorageWriteMaxInflightBytes, setStorageWriteMaxInflightRequests, setTempDatasetId, setUseStorageApiConnectionPool, setUseStorageWriteApi, setUseStorageWriteApiAtLeastOncegetEnableBucketReadMetricCounter, getEnableBucketWriteMetricCounter, getExecutorService, getGcsCustomAuditEntries, getGcsEndpoint, getGcsHttpRequestReadTimeout, getGcsHttpRequestWriteTimeout, getGcsPerformanceMetrics, getGcsReadCounterPrefix, getGcsRewriteDataOpBatchLimit, getGcsUploadBufferSizeBytes, getGcsUtil, getGcsWriteCounterPrefix, getGoogleCloudStorageReadOptions, getPathValidator, getPathValidatorClass, setEnableBucketReadMetricCounter, setEnableBucketWriteMetricCounter, setExecutorService, setGcsCustomAuditEntries, setGcsEndpoint, setGcsHttpRequestReadTimeout, setGcsHttpRequestWriteTimeout, setGcsPerformanceMetrics, setGcsReadCounterPrefix, setGcsRewriteDataOpBatchLimit, setGcsUploadBufferSizeBytes, setGcsUtil, setGcsWriteCounterPrefix, setGoogleCloudStorageReadOptions, setPathValidator, setPathValidatorClassgetDefaultWorkerLogLevel, getWorkerLogLevelOverrides, getWorkerSystemErrMessageLevel, getWorkerSystemOutMessageLevel, setDefaultWorkerLogLevel, setWorkerLogLevelOverrides, setWorkerSystemErrMessageLevel, setWorkerSystemOutMessageLevelgetActiveWorkRefreshPeriodMillis, getChannelzShowOnlyWindmillServiceChannels, getGlobalConfigRefreshPeriod, getIsWindmillServiceDirectPathEnabled, getLocalWindmillHostport, getMaxBundlesFromWindmillOutstanding, getMaxBytesFromWindmillOutstanding, getMaxStackTraceDepthToReport, getOverrideWindmillBinary, getPeriodicStatusPageOutputDirectory, getPerWorkerMetricsUpdateReportingPeriodMillis, getStreamingSideInputCacheExpirationMillis, getStreamingSideInputCacheMb, getStuckCommitDurationMillis, getUseSeparateWindmillHeartbeatStreams, getUseWindmillIsolatedChannels, getWindmillGetDataStreamCount, getWindmillHarnessUpdateReportingPeriod, getWindmillMessagesBetweenIsReadyChecks, getWindmillRequestBatchedGetWorkResponse, getWindmillServiceCommitThreads, getWindmillServiceEndpoint, getWindmillServicePort, getWindmillServiceRpcChannelAliveTimeoutSec, getWindmillServiceStreamingLogEveryNStreamFailures, getWindmillServiceStreamingRpcBatchLimit, getWindmillServiceStreamingRpcHealthCheckPeriodMs, getWindmillServiceStreamMaxBackoffMillis, setActiveWorkRefreshPeriodMillis, setChannelzShowOnlyWindmillServiceChannels, setGlobalConfigRefreshPeriod, setIsWindmillServiceDirectPathEnabled, setLocalWindmillHostport, setMaxBundlesFromWindmillOutstanding, setMaxBytesFromWindmillOutstanding, setMaxStackTraceDepthToReport, setOverrideWindmillBinary, setPeriodicStatusPageOutputDirectory, setPerWorkerMetricsUpdateReportingPeriodMillis, setStreamingSideInputCacheExpirationMillis, setStreamingSideInputCacheMb, setStuckCommitDurationMillis, setUseSeparateWindmillHeartbeatStreams, setUseWindmillIsolatedChannels, setWindmillGetDataStreamCount, setWindmillHarnessUpdateReportingPeriod, setWindmillMessagesBetweenIsReadyChecks, setWindmillRequestBatchedGetWorkResponse, setWindmillServiceCommitThreads, setWindmillServiceEndpoint, setWindmillServicePort, setWindmillServiceRpcChannelAliveTimeoutSec, setWindmillServiceStreamingLogEveryNStreamFailures, setWindmillServiceStreamingRpcBatchLimit, setWindmillServiceStreamingRpcHealthCheckPeriodMs, setWindmillServiceStreamMaxBackoffMillisgetProfilingAgentConfiguration, getSaveProfilesToGcs, setProfilingAgentConfiguration, setSaveProfilesToGcsgetPubsubRootUrl, setPubsubRootUrl, targetForRootUrlgetCredentialFactoryClass, getDataflowKmsKey, getGcpCredential, getGcpOauthScopes, getGcpTempLocation, getImpersonateServiceAccount, getWorkerRegion, getWorkerZone, getZone, isEnableStreamingEngine, setCredentialFactoryClass, setDataflowKmsKey, setEnableStreamingEngine, setGcpCredential, setGcpOauthScopes, setGcpTempLocation, setImpersonateServiceAccount, setWorkerRegion, setWorkerZone, setZonegetGoogleApiTrace, setGoogleApiTrace@Validation.Required @Default.InstanceFactory(value=GcpOptions.DefaultProjectFactory.class) java.lang.String getProject()
GcpOptionsgetProject in interface GcpOptionsvoid setProject(java.lang.String value)
setProject in interface GcpOptions@Default.InstanceFactory(value=DataflowPipelineOptions.StagingLocationFactory.class) java.lang.String getStagingLocation()
Must be a valid Cloud Storage URL, beginning with the prefix "gs://"
If getStagingLocation() is not set, it will default to GcpOptions.getGcpTempLocation(). GcpOptions.getGcpTempLocation() must be a valid GCS
 path.
void setStagingLocation(java.lang.String value)
boolean isUpdate()
void setUpdate(boolean value)
java.lang.String getCreateFromSnapshot()
void setCreateFromSnapshot(java.lang.String value)
java.lang.String getTemplateLocation()
void setTemplateLocation(java.lang.String value)
Example:
 DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
 options.setTemplateLocation("gs://your-bucket/templates/my-template");
 value - Cloud Storage path for storing the Dataflow template.java.util.List<java.lang.String> getDataflowServiceOptions()
void setDataflowServiceOptions(java.util.List<java.lang.String> options)
java.lang.String getServiceAccount()
void setServiceAccount(java.lang.String value)
@Default.InstanceFactory(value=DefaultGcpRegionFactory.class) java.lang.String getRegion()
void setRegion(java.lang.String region)
@Default.String(value="") java.lang.String getDataflowEndpoint()
Defaults to the current version of the Google Cloud Dataflow API, at the time the current SDK version was released.
If the string contains "://", then this is treated as a URL, otherwise DataflowPipelineDebugOptions.getApiRootUrl() is used as the root URL.
getDataflowEndpoint in interface DataflowPipelineDebugOptionsvoid setDataflowEndpoint(java.lang.String value)
setDataflowEndpoint in interface DataflowPipelineDebugOptionsjava.util.Map<java.lang.String,java.lang.String> getLabels()
void setLabels(java.util.Map<java.lang.String,java.lang.String> labels)
java.lang.String getPipelineUrl()
void setPipelineUrl(java.lang.String urlString)
java.lang.String getDataflowWorkerJar()
void setDataflowWorkerJar(java.lang.String dataflowWorkerJar)
@Default.Enum(value="UNSPECIFIED") DataflowPipelineOptions.FlexResourceSchedulingGoal getFlexRSGoal()
void setFlexRSGoal(DataflowPipelineOptions.FlexResourceSchedulingGoal goal)
boolean isHotKeyLoggingEnabled()
void setHotKeyLoggingEnabled(boolean value)
java.util.List<java.lang.String> getJdkAddOpenModules()
With JDK 16+, JDK internals are strongly encapsulated and can result in an InaccessibleObjectException being thrown if a tool or library uses reflection that access JDK internals. If you see these errors in your worker logs, you can pass in modules to open using the format module/package=target-module(,target-module)* to allow access to the library. E.g. java.base/java.lang=jamm
You may see warnings that jamm, a library used to more accurately size objects, is unable to make a private field accessible. To resolve the warning, open the specified module/package to jamm.
void setJdkAddOpenModules(java.util.List<java.lang.String> options)