Interface DataflowPipelineOptions
- All Superinterfaces:
ApplicationNameOptions,BigQueryOptions,DataflowPipelineDebugOptions,DataflowPipelineWorkerPoolOptions,DataflowProfilingOptions,DataflowStreamingPipelineOptions,DataflowWorkerLoggingOptions,ExperimentalOptions,FileStagingOptions,GcpOptions,GcsOptions,GoogleApiDebugOptions,HasDisplayData,MemoryMonitorOptions,PipelineOptions,PubsubOptions,StreamingOptions
- All Known Subinterfaces:
DataflowWorkerHarnessOptions,TestDataflowPipelineOptions
DataflowRunner.-
Nested Class Summary
Nested ClassesModifier and TypeInterfaceDescriptionstatic enumSet of available Flexible Resource Scheduling goals.static classReturns a default staging location underGcpOptions.getGcpTempLocation().Nested classes/interfaces inherited from interface org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions
DataflowPipelineDebugOptions.DataflowClientFactory, DataflowPipelineDebugOptions.StagerFactory, DataflowPipelineDebugOptions.UnboundedReaderMaxReadTimeFactoryNested classes/interfaces inherited from interface org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions
DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmTypeNested classes/interfaces inherited from interface org.apache.beam.runners.dataflow.options.DataflowProfilingOptions
DataflowProfilingOptions.DataflowProfilingAgentConfigurationNested classes/interfaces inherited from interface org.apache.beam.runners.dataflow.options.DataflowStreamingPipelineOptions
DataflowStreamingPipelineOptions.EnableWindmillServiceDirectPathFactory, DataflowStreamingPipelineOptions.GlobalConfigRefreshPeriodFactory, DataflowStreamingPipelineOptions.HarnessUpdateReportingPeriodFactory, DataflowStreamingPipelineOptions.LocalWindmillHostportFactory, DataflowStreamingPipelineOptions.MaxStackTraceDepthToReportFactory, DataflowStreamingPipelineOptions.PeriodicStatusPageDirectoryFactory, DataflowStreamingPipelineOptions.WindmillServiceStreamingRpcBatchLimitFactoryNested classes/interfaces inherited from interface org.apache.beam.runners.dataflow.options.DataflowWorkerLoggingOptions
DataflowWorkerLoggingOptions.Level, DataflowWorkerLoggingOptions.WorkerLogLevelOverridesNested classes/interfaces inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
GcpOptions.DefaultProjectFactory, GcpOptions.EnableStreamingEngineFactory, GcpOptions.GcpOAuthScopesFactory, GcpOptions.GcpTempLocationFactory, GcpOptions.GcpUserCredentialsFactoryNested classes/interfaces inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcsOptions
GcsOptions.ExecutorServiceFactory, GcsOptions.GcsCustomAuditEntries, GcsOptions.PathValidatorFactoryNested classes/interfaces inherited from interface org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions
GoogleApiDebugOptions.GoogleApiTracerNested classes/interfaces inherited from interface org.apache.beam.sdk.options.PipelineOptions
PipelineOptions.AtomicLongFactory, PipelineOptions.CheckEnabled, PipelineOptions.DirectRunner, PipelineOptions.JobNameFactory, PipelineOptions.UserAgentFactory -
Field Summary
Fields inherited from interface org.apache.beam.sdk.options.ExperimentalOptions
STATE_CACHE_SIZE, STATE_SAMPLING_PERIOD_MILLISFields inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
STREAMING_ENGINE_EXPERIMENT, WINDMILL_SERVICE_EXPERIMENT -
Method Summary
Modifier and TypeMethodDescriptionIf set, the snapshot from which the job should be created.Dataflow endpoint to use.Service options are set by the user and configure the service.This option controls Flexible Resource Scheduling mode.Open modules needed for reflection that access JDK internals with Java 9+Labels that will be applied to the billing records for this job.The URL of the staged portable pipeline.Project id to use when launching jobs.The Google Compute Engine region for creating Dataflow jobs.Run the job as a specific service account, instead of the default GCE robot.GCS path for staging local files, e.g.Where the runner should generate a template file.booleanIf enabled then the literal key will be logged to Cloud Logging if a hot key is detected.booleanisUpdate()Whether to update the currently running pipeline with the same name as this one.voidsetCreateFromSnapshot(String value) voidsetDataflowEndpoint(String value) voidsetDataflowServiceOptions(List<String> options) voidsetDataflowWorkerJar(String dataflowWorkerJar) voidvoidsetHotKeyLoggingEnabled(boolean value) voidsetJdkAddOpenModules(List<String> options) voidvoidsetPipelineUrl(String urlString) voidsetProject(String value) voidvoidsetServiceAccount(String value) voidsetStagingLocation(String value) voidsetTemplateLocation(String value) Sets the Cloud Storage path where the Dataflow template will be stored.voidsetUpdate(boolean value) Methods inherited from interface org.apache.beam.sdk.options.ApplicationNameOptions
getAppName, setAppNameMethods inherited from interface org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions
getBigQueryEndpoint, getBigQueryProject, getBqStreamingApiLoggingFrequencySec, getEnableStorageReadApiV2, getGroupFilesFileLoad, getHTTPReadTimeout, getHTTPWriteTimeout, getInsertBundleParallelism, getJobLabelsMap, getMaxBufferingDurationMilliSec, getMaxConnectionPoolConnections, getMaxStreamingBatchSize, getMaxStreamingRowsToBatch, getMinConnectionPoolConnections, getNumStorageWriteApiStreamAppendClients, getNumStorageWriteApiStreams, getNumStreamingKeys, getStorageApiAppendThresholdBytes, getStorageApiAppendThresholdRecordCount, getStorageWriteApiMaxRequestSize, getStorageWriteApiMaxRetries, getStorageWriteApiTriggeringFrequencySec, getStorageWriteMaxInflightBytes, getStorageWriteMaxInflightRequests, getTempDatasetId, getUseStorageApiConnectionPool, getUseStorageWriteApi, getUseStorageWriteApiAtLeastOnce, setBigQueryEndpoint, setBigQueryProject, setBqStreamingApiLoggingFrequencySec, setEnableStorageReadApiV2, setGroupFilesFileLoad, setHTTPReadTimeout, setHTTPWriteTimeout, setInsertBundleParallelism, setJobLabelsMap, setMaxBufferingDurationMilliSec, setMaxConnectionPoolConnections, setMaxStreamingBatchSize, setMaxStreamingRowsToBatch, setMinConnectionPoolConnections, setNumStorageWriteApiStreamAppendClients, setNumStorageWriteApiStreams, setNumStreamingKeys, setStorageApiAppendThresholdBytes, setStorageApiAppendThresholdRecordCount, setStorageWriteApiMaxRequestSize, setStorageWriteApiMaxRetries, setStorageWriteApiTriggeringFrequencySec, setStorageWriteMaxInflightBytes, setStorageWriteMaxInflightRequests, setTempDatasetId, setUseStorageApiConnectionPool, setUseStorageWriteApi, setUseStorageWriteApiAtLeastOnceMethods inherited from interface org.apache.beam.runners.dataflow.options.DataflowPipelineDebugOptions
getApiRootUrl, getDataflowClient, getDataflowJobFile, getDesiredNumUnboundedSourceSplits, getDumpHeapOnOOM, getJfrRecordingDurationSec, getNumberOfWorkerHarnessThreads, getReaderCacheTimeoutSec, getRecordJfrOnGcThrashing, getSaveHeapDumpsToGcsPath, getSdkHarnessContainerImageOverrides, getStager, getStagerClass, getTransformNameMapping, getUnboundedReaderMaxElements, getUnboundedReaderMaxReadTimeMs, getUnboundedReaderMaxReadTimeSec, getUnboundedReaderMaxWaitForElementsMs, getWorkerCacheMb, setApiRootUrl, setDataflowClient, setDataflowJobFile, setDesiredNumUnboundedSourceSplits, setDumpHeapOnOOM, setJfrRecordingDurationSec, setNumberOfWorkerHarnessThreads, setReaderCacheTimeoutSec, setRecordJfrOnGcThrashing, setSaveHeapDumpsToGcsPath, setSdkHarnessContainerImageOverrides, setStager, setStagerClass, setTransformNameMapping, setUnboundedReaderMaxElements, setUnboundedReaderMaxReadTimeMs, setUnboundedReaderMaxReadTimeSec, setUnboundedReaderMaxWaitForElementsMs, setWorkerCacheMbMethods inherited from interface org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions
getAutoscalingAlgorithm, getDiskSizeGb, getMaxNumWorkers, getMinCpuPlatform, getNetwork, getNumWorkers, getSdkContainerImage, getSubnetwork, getUsePublicIps, getWorkerDiskType, getWorkerHarnessContainerImage, getWorkerMachineType, setAutoscalingAlgorithm, setDiskSizeGb, setMaxNumWorkers, setMinCpuPlatform, setNetwork, setNumWorkers, setSdkContainerImage, setSubnetwork, setUsePublicIps, setWorkerDiskType, setWorkerHarnessContainerImage, setWorkerMachineTypeMethods inherited from interface org.apache.beam.runners.dataflow.options.DataflowProfilingOptions
getProfilingAgentConfiguration, getSaveProfilesToGcs, setProfilingAgentConfiguration, setSaveProfilesToGcsMethods inherited from interface org.apache.beam.runners.dataflow.options.DataflowStreamingPipelineOptions
getActiveWorkRefreshPeriodMillis, getChannelzShowOnlyWindmillServiceChannels, getGlobalConfigRefreshPeriod, getIsWindmillServiceDirectPathEnabled, getLocalWindmillHostport, getMaxBundlesFromWindmillOutstanding, getMaxBytesFromWindmillOutstanding, getMaxStackTraceDepthToReport, getOverrideWindmillBinary, getPeriodicStatusPageOutputDirectory, getPerWorkerMetricsUpdateReportingPeriodMillis, getStreamingSideInputCacheExpirationMillis, getStreamingSideInputCacheMb, getStuckCommitDurationMillis, getUseSeparateWindmillHeartbeatStreams, getUseWindmillIsolatedChannels, getWindmillGetDataStreamCount, getWindmillHarnessUpdateReportingPeriod, getWindmillMessagesBetweenIsReadyChecks, getWindmillRequestBatchedGetWorkResponse, getWindmillServiceCommitThreads, getWindmillServiceEndpoint, getWindmillServicePort, getWindmillServiceRpcChannelAliveTimeoutSec, getWindmillServiceStreamingLogEveryNStreamFailures, getWindmillServiceStreamingRpcBatchLimit, getWindmillServiceStreamingRpcHealthCheckPeriodMs, getWindmillServiceStreamMaxBackoffMillis, setActiveWorkRefreshPeriodMillis, setChannelzShowOnlyWindmillServiceChannels, setGlobalConfigRefreshPeriod, setIsWindmillServiceDirectPathEnabled, setLocalWindmillHostport, setMaxBundlesFromWindmillOutstanding, setMaxBytesFromWindmillOutstanding, setMaxStackTraceDepthToReport, setOverrideWindmillBinary, setPeriodicStatusPageOutputDirectory, setPerWorkerMetricsUpdateReportingPeriodMillis, setStreamingSideInputCacheExpirationMillis, setStreamingSideInputCacheMb, setStuckCommitDurationMillis, setUseSeparateWindmillHeartbeatStreams, setUseWindmillIsolatedChannels, setWindmillGetDataStreamCount, setWindmillHarnessUpdateReportingPeriod, setWindmillMessagesBetweenIsReadyChecks, setWindmillRequestBatchedGetWorkResponse, setWindmillServiceCommitThreads, setWindmillServiceEndpoint, setWindmillServicePort, setWindmillServiceRpcChannelAliveTimeoutSec, setWindmillServiceStreamingLogEveryNStreamFailures, setWindmillServiceStreamingRpcBatchLimit, setWindmillServiceStreamingRpcHealthCheckPeriodMs, setWindmillServiceStreamMaxBackoffMillisMethods inherited from interface org.apache.beam.runners.dataflow.options.DataflowWorkerLoggingOptions
getDefaultWorkerLogLevel, getWorkerLogLevelOverrides, getWorkerSystemErrMessageLevel, getWorkerSystemOutMessageLevel, setDefaultWorkerLogLevel, setWorkerLogLevelOverrides, setWorkerSystemErrMessageLevel, setWorkerSystemOutMessageLevelMethods inherited from interface org.apache.beam.sdk.options.ExperimentalOptions
getExperiments, setExperimentsMethods inherited from interface org.apache.beam.sdk.options.FileStagingOptions
getFilesToStage, setFilesToStageMethods inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcpOptions
getCredentialFactoryClass, getDataflowKmsKey, getGcpCredential, getGcpOauthScopes, getGcpTempLocation, getImpersonateServiceAccount, getWorkerRegion, getWorkerZone, getZone, isEnableStreamingEngine, setCredentialFactoryClass, setDataflowKmsKey, setEnableStreamingEngine, setGcpCredential, setGcpOauthScopes, setGcpTempLocation, setImpersonateServiceAccount, setWorkerRegion, setWorkerZone, setZoneMethods inherited from interface org.apache.beam.sdk.extensions.gcp.options.GcsOptions
getEnableBucketReadMetricCounter, getEnableBucketWriteMetricCounter, getExecutorService, getGcsCustomAuditEntries, getGcsEndpoint, getGcsHttpRequestReadTimeout, getGcsHttpRequestWriteTimeout, getGcsPerformanceMetrics, getGcsReadCounterPrefix, getGcsRewriteDataOpBatchLimit, getGcsUploadBufferSizeBytes, getGcsUtil, getGcsWriteCounterPrefix, getGoogleCloudStorageReadOptions, getPathValidator, getPathValidatorClass, setEnableBucketReadMetricCounter, setEnableBucketWriteMetricCounter, setExecutorService, setGcsCustomAuditEntries, setGcsEndpoint, setGcsHttpRequestReadTimeout, setGcsHttpRequestWriteTimeout, setGcsPerformanceMetrics, setGcsReadCounterPrefix, setGcsRewriteDataOpBatchLimit, setGcsUploadBufferSizeBytes, setGcsUtil, setGcsWriteCounterPrefix, setGoogleCloudStorageReadOptions, setPathValidator, setPathValidatorClassMethods inherited from interface org.apache.beam.sdk.extensions.gcp.options.GoogleApiDebugOptions
getGoogleApiTrace, setGoogleApiTraceMethods inherited from interface org.apache.beam.sdk.transforms.display.HasDisplayData
populateDisplayDataMethods inherited from interface org.apache.beam.sdk.options.MemoryMonitorOptions
getGCThrashingPercentagePerPeriod, getGzipCompressHeapDumps, getRemoteHeapDumpLocation, setGCThrashingPercentagePerPeriod, setGzipCompressHeapDumps, setRemoteHeapDumpLocationMethods inherited from interface org.apache.beam.sdk.options.PipelineOptions
as, getGbek, getJobName, getOptionsId, getRunner, getStableUniqueNames, getTempLocation, getUserAgent, outputRuntimeOptions, revision, setGbek, setJobName, setOptionsId, setRunner, setStableUniqueNames, setTempLocation, setUserAgentMethods inherited from interface org.apache.beam.sdk.io.gcp.pubsub.PubsubOptions
getPubsubRootUrl, setPubsubRootUrlMethods inherited from interface org.apache.beam.sdk.options.StreamingOptions
getUpdateCompatibilityVersion, isStreaming, setStreaming, setUpdateCompatibilityVersion
-
Method Details
-
getProject
Description copied from interface:GcpOptionsProject id to use when launching jobs.- Specified by:
getProjectin interfaceGcpOptions
-
setProject
- Specified by:
setProjectin interfaceGcpOptions
-
getStagingLocation
GCS path for staging local files, e.g. gs://bucket/objectMust be a valid Cloud Storage URL, beginning with the prefix "gs://"
If
getStagingLocation()is not set, it will default toGcpOptions.getGcpTempLocation().GcpOptions.getGcpTempLocation()must be a valid GCS path. -
setStagingLocation
-
isUpdate
boolean isUpdate()Whether to update the currently running pipeline with the same name as this one. -
setUpdate
void setUpdate(boolean value) -
getCreateFromSnapshot
String getCreateFromSnapshot()If set, the snapshot from which the job should be created. -
setCreateFromSnapshot
-
getTemplateLocation
String getTemplateLocation()Where the runner should generate a template file. Must either be local or Cloud Storage. -
setTemplateLocation
Sets the Cloud Storage path where the Dataflow template will be stored. Required for creating Flex Templates or Classic Templates.Example:
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class); options.setTemplateLocation("gs://your-bucket/templates/my-template");- Parameters:
value- Cloud Storage path for storing the Dataflow template.
-
getDataflowServiceOptions
Service options are set by the user and configure the service. This decouples service side feature availability from the Apache Beam release cycle. -
setDataflowServiceOptions
-
getServiceAccount
String getServiceAccount()Run the job as a specific service account, instead of the default GCE robot. -
setServiceAccount
-
getRegion
The Google Compute Engine region for creating Dataflow jobs. -
setRegion
-
getDataflowEndpoint
Dataflow endpoint to use.Defaults to the current version of the Google Cloud Dataflow API, at the time the current SDK version was released.
If the string contains "://", then this is treated as a URL, otherwise
DataflowPipelineDebugOptions.getApiRootUrl()is used as the root URL.- Specified by:
getDataflowEndpointin interfaceDataflowPipelineDebugOptions
-
setDataflowEndpoint
- Specified by:
setDataflowEndpointin interfaceDataflowPipelineDebugOptions
-
getLabels
Labels that will be applied to the billing records for this job. -
setLabels
-
getPipelineUrl
String getPipelineUrl()The URL of the staged portable pipeline. -
setPipelineUrl
-
getDataflowWorkerJar
String getDataflowWorkerJar() -
setDataflowWorkerJar
-
getFlexRSGoal
This option controls Flexible Resource Scheduling mode. -
setFlexRSGoal
-
isHotKeyLoggingEnabled
boolean isHotKeyLoggingEnabled()If enabled then the literal key will be logged to Cloud Logging if a hot key is detected. -
setHotKeyLoggingEnabled
void setHotKeyLoggingEnabled(boolean value) -
getJdkAddOpenModules
Open modules needed for reflection that access JDK internals with Java 9+With JDK 16+, JDK internals are strongly encapsulated and can result in an InaccessibleObjectException being thrown if a tool or library uses reflection that access JDK internals. If you see these errors in your worker logs, you can pass in modules to open using the format module/package=target-module(,target-module)* to allow access to the library. E.g. java.base/java.lang=jamm
You may see warnings that jamm, a library used to more accurately size objects, is unable to make a private field accessible. To resolve the warning, open the specified module/package to jamm.
-
setJdkAddOpenModules
-