public interface DataflowPipelineOptions extends PipelineOptions, GcpOptions, ApplicationNameOptions, DataflowPipelineDebugOptions, DataflowPipelineWorkerPoolOptions, BigQueryOptions, GcsOptions, StreamingOptions, DataflowWorkerLoggingOptions, DataflowStreamingPipelineOptions, DataflowProfilingOptions, PubsubOptions
DataflowRunner
.Modifier and Type | Interface and Description |
---|---|
static class |
DataflowPipelineOptions.FlexResourceSchedulingGoal
Set of available Flexible Resource Scheduling goals.
|
static class |
DataflowPipelineOptions.StagingLocationFactory
Returns a default staging location under
GcpOptions.getGcpTempLocation() . |
DataflowPipelineDebugOptions.DataflowClientFactory, DataflowPipelineDebugOptions.StagerFactory, DataflowPipelineDebugOptions.UnboundedReaderMaxReadTimeFactory
DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType
GcsOptions.ExecutorServiceFactory, GcsOptions.PathValidatorFactory
DataflowWorkerLoggingOptions.Level, DataflowWorkerLoggingOptions.WorkerLogLevelOverrides
DataflowStreamingPipelineOptions.GlobalConfigRefreshPeriodFactory, DataflowStreamingPipelineOptions.HarnessUpdateReportingPeriodFactory, DataflowStreamingPipelineOptions.LocalWindmillHostportFactory, DataflowStreamingPipelineOptions.MaxStackTraceDepthToReportFactory, DataflowStreamingPipelineOptions.PeriodicStatusPageDirectoryFactory, DataflowStreamingPipelineOptions.WindmillServiceStreamingRpcBatchLimitFactory
DataflowProfilingOptions.DataflowProfilingAgentConfiguration
GcpOptions.DefaultProjectFactory, GcpOptions.EnableStreamingEngineFactory, GcpOptions.GcpOAuthScopesFactory, GcpOptions.GcpTempLocationFactory, GcpOptions.GcpUserCredentialsFactory
GoogleApiDebugOptions.GoogleApiTracer
STATE_CACHE_SIZE, STATE_SAMPLING_PERIOD_MILLIS
STREAMING_ENGINE_EXPERIMENT, WINDMILL_SERVICE_EXPERIMENT
Modifier and Type | Method and Description |
---|---|
java.lang.String |
getCreateFromSnapshot()
If set, the snapshot from which the job should be created.
|
java.lang.String |
getDataflowEndpoint()
Dataflow endpoint to use.
|
java.util.List<java.lang.String> |
getDataflowServiceOptions()
Service options are set by the user and configure the service.
|
java.lang.String |
getDataflowWorkerJar() |
DataflowPipelineOptions.FlexResourceSchedulingGoal |
getFlexRSGoal()
This option controls Flexible Resource Scheduling mode.
|
java.util.List<java.lang.String> |
getJdkAddOpenModules()
Open modules needed for reflection that access JDK internals with Java 9+
|
java.util.Map<java.lang.String,java.lang.String> |
getLabels()
Labels that will be applied to the billing records for this job.
|
java.lang.String |
getPipelineUrl()
The URL of the staged portable pipeline.
|
java.lang.String |
getProject()
Project id to use when launching jobs.
|
java.lang.String |
getRegion()
The Google Compute Engine region for
creating Dataflow jobs.
|
java.lang.String |
getServiceAccount()
Run the job as a specific service account, instead of the default GCE robot.
|
java.lang.String |
getStagingLocation()
GCS path for staging local files, e.g.
|
java.lang.String |
getTemplateLocation()
Where the runner should generate a template file.
|
boolean |
isHotKeyLoggingEnabled()
If enabled then the literal key will be logged to Cloud Logging if a hot key is detected.
|
boolean |
isUpdate()
Whether to update the currently running pipeline with the same name as this one.
|
void |
setCreateFromSnapshot(java.lang.String value) |
void |
setDataflowEndpoint(java.lang.String value) |
void |
setDataflowServiceOptions(java.util.List<java.lang.String> options) |
void |
setDataflowWorkerJar(java.lang.String dataflowWorkerJar) |
void |
setFlexRSGoal(DataflowPipelineOptions.FlexResourceSchedulingGoal goal) |
void |
setHotKeyLoggingEnabled(boolean value) |
void |
setJdkAddOpenModules(java.util.List<java.lang.String> options) |
void |
setLabels(java.util.Map<java.lang.String,java.lang.String> labels) |
void |
setPipelineUrl(java.lang.String urlString) |
void |
setProject(java.lang.String value) |
void |
setRegion(java.lang.String region) |
void |
setServiceAccount(java.lang.String value) |
void |
setStagingLocation(java.lang.String value) |
void |
setTemplateLocation(java.lang.String value) |
void |
setUpdate(boolean value) |
getApiRootUrl, getDataflowClient, getDataflowJobFile, getDesiredNumUnboundedSourceSplits, getDumpHeapOnOOM, getJfrRecordingDurationSec, getNumberOfWorkerHarnessThreads, getReaderCacheTimeoutSec, getRecordJfrOnGcThrashing, getSaveHeapDumpsToGcsPath, getSdkHarnessContainerImageOverrides, getStager, getStagerClass, getTransformNameMapping, getUnboundedReaderMaxElements, getUnboundedReaderMaxReadTimeMs, getUnboundedReaderMaxReadTimeSec, getUnboundedReaderMaxWaitForElementsMs, getWorkerCacheMb, setApiRootUrl, setDataflowClient, setDataflowJobFile, setDesiredNumUnboundedSourceSplits, setDumpHeapOnOOM, setJfrRecordingDurationSec, setNumberOfWorkerHarnessThreads, setReaderCacheTimeoutSec, setRecordJfrOnGcThrashing, setSaveHeapDumpsToGcsPath, setSdkHarnessContainerImageOverrides, setStager, setStagerClass, setTransformNameMapping, setUnboundedReaderMaxElements, setUnboundedReaderMaxReadTimeMs, setUnboundedReaderMaxReadTimeSec, setUnboundedReaderMaxWaitForElementsMs, setWorkerCacheMb
addExperiment, getExperiments, getExperimentValue, hasExperiment, setExperiments
getGCThrashingPercentagePerPeriod, setGCThrashingPercentagePerPeriod
getAutoscalingAlgorithm, getDiskSizeGb, getMaxNumWorkers, getMinCpuPlatform, getNetwork, getNumWorkers, getSdkContainerImage, getSubnetwork, getUsePublicIps, getWorkerDiskType, getWorkerHarnessContainerImage, getWorkerMachineType, setAutoscalingAlgorithm, setDiskSizeGb, setMaxNumWorkers, setMinCpuPlatform, setNetwork, setNumWorkers, setSdkContainerImage, setSubnetwork, setUsePublicIps, setWorkerDiskType, setWorkerHarnessContainerImage, setWorkerMachineType
getFilesToStage, setFilesToStage
getBigQueryEndpoint, getBigQueryProject, getBqStreamingApiLoggingFrequencySec, getEnableStorageReadApiV2, getHTTPReadTimeout, getHTTPWriteTimeout, getInsertBundleParallelism, getMaxBufferingDurationMilliSec, getMaxConnectionPoolConnections, getMaxStreamingBatchSize, getMaxStreamingRowsToBatch, getMinConnectionPoolConnections, getNumStorageWriteApiStreamAppendClients, getNumStorageWriteApiStreams, getNumStreamingKeys, getStorageApiAppendThresholdBytes, getStorageApiAppendThresholdRecordCount, getStorageWriteApiMaxRequestSize, getStorageWriteApiMaxRetries, getStorageWriteApiTriggeringFrequencySec, getStorageWriteMaxInflightBytes, getStorageWriteMaxInflightRequests, getTempDatasetId, getUseStorageApiConnectionPool, getUseStorageWriteApi, getUseStorageWriteApiAtLeastOnce, setBigQueryEndpoint, setBigQueryProject, setBqStreamingApiLoggingFrequencySec, setEnableStorageReadApiV2, setHTTPReadTimeout, setHTTPWriteTimeout, setInsertBundleParallelism, setMaxBufferingDurationMilliSec, setMaxConnectionPoolConnections, setMaxStreamingBatchSize, setMaxStreamingRowsToBatch, setMinConnectionPoolConnections, setNumStorageWriteApiStreamAppendClients, setNumStorageWriteApiStreams, setNumStreamingKeys, setStorageApiAppendThresholdBytes, setStorageApiAppendThresholdRecordCount, setStorageWriteApiMaxRequestSize, setStorageWriteApiMaxRetries, setStorageWriteApiTriggeringFrequencySec, setStorageWriteMaxInflightBytes, setStorageWriteMaxInflightRequests, setTempDatasetId, setUseStorageApiConnectionPool, setUseStorageWriteApi, setUseStorageWriteApiAtLeastOnce
getEnableBucketReadMetricCounter, getEnableBucketWriteMetricCounter, getExecutorService, getGcsEndpoint, getGcsHttpRequestReadTimeout, getGcsHttpRequestWriteTimeout, getGcsPerformanceMetrics, getGcsReadCounterPrefix, getGcsRewriteDataOpBatchLimit, getGcsUploadBufferSizeBytes, getGcsUtil, getGcsWriteCounterPrefix, getPathValidator, getPathValidatorClass, setEnableBucketReadMetricCounter, setEnableBucketWriteMetricCounter, setExecutorService, setGcsEndpoint, setGcsHttpRequestReadTimeout, setGcsHttpRequestWriteTimeout, setGcsPerformanceMetrics, setGcsReadCounterPrefix, setGcsRewriteDataOpBatchLimit, setGcsUploadBufferSizeBytes, setGcsUtil, setGcsWriteCounterPrefix, setPathValidator, setPathValidatorClass
getDefaultWorkerLogLevel, getWorkerLogLevelOverrides, getWorkerSystemErrMessageLevel, getWorkerSystemOutMessageLevel, setDefaultWorkerLogLevel, setWorkerLogLevelOverrides, setWorkerSystemErrMessageLevel, setWorkerSystemOutMessageLevel
getActiveWorkRefreshPeriodMillis, getChannelzShowOnlyWindmillServiceChannels, getGlobalConfigRefreshPeriod, getIsWindmillServiceDirectPathEnabled, getLocalWindmillHostport, getMaxBundlesFromWindmillOutstanding, getMaxBytesFromWindmillOutstanding, getMaxStackTraceDepthToReport, getOverrideWindmillBinary, getPeriodicStatusPageOutputDirectory, getPerWorkerMetricsUpdateReportingPeriodMillis, getStreamingSideInputCacheExpirationMillis, getStreamingSideInputCacheMb, getStuckCommitDurationMillis, getUseSeparateWindmillHeartbeatStreams, getUseWindmillIsolatedChannels, getWindmillGetDataStreamCount, getWindmillHarnessUpdateReportingPeriod, getWindmillMessagesBetweenIsReadyChecks, getWindmillServiceCommitThreads, getWindmillServiceEndpoint, getWindmillServicePort, getWindmillServiceRpcChannelAliveTimeoutSec, getWindmillServiceStreamingLogEveryNStreamFailures, getWindmillServiceStreamingRpcBatchLimit, getWindmillServiceStreamingRpcHealthCheckPeriodMs, getWindmillServiceStreamMaxBackoffMillis, setActiveWorkRefreshPeriodMillis, setChannelzShowOnlyWindmillServiceChannels, setGlobalConfigRefreshPeriod, setIsWindmillServiceDirectPathEnabled, setLocalWindmillHostport, setMaxBundlesFromWindmillOutstanding, setMaxBytesFromWindmillOutstanding, setMaxStackTraceDepthToReport, setOverrideWindmillBinary, setPeriodicStatusPageOutputDirectory, setPerWorkerMetricsUpdateReportingPeriodMillis, setStreamingSideInputCacheExpirationMillis, setStreamingSideInputCacheMb, setStuckCommitDurationMillis, setUseSeparateWindmillHeartbeatStreams, setUseWindmillIsolatedChannels, setWindmillGetDataStreamCount, setWindmillHarnessUpdateReportingPeriod, setWindmillMessagesBetweenIsReadyChecks, setWindmillServiceCommitThreads, setWindmillServiceEndpoint, setWindmillServicePort, setWindmillServiceRpcChannelAliveTimeoutSec, setWindmillServiceStreamingLogEveryNStreamFailures, setWindmillServiceStreamingRpcBatchLimit, setWindmillServiceStreamingRpcHealthCheckPeriodMs, setWindmillServiceStreamMaxBackoffMillis
getProfilingAgentConfiguration, getSaveProfilesToGcs, setProfilingAgentConfiguration, setSaveProfilesToGcs
getPubsubRootUrl, setPubsubRootUrl, targetForRootUrl
getCredentialFactoryClass, getDataflowKmsKey, getGcpCredential, getGcpOauthScopes, getGcpTempLocation, getImpersonateServiceAccount, getWorkerRegion, getWorkerZone, getZone, isEnableStreamingEngine, setCredentialFactoryClass, setDataflowKmsKey, setEnableStreamingEngine, setGcpCredential, setGcpOauthScopes, setGcpTempLocation, setImpersonateServiceAccount, setWorkerRegion, setWorkerZone, setZone
getGoogleApiTrace, setGoogleApiTrace
@Validation.Required @Default.InstanceFactory(value=GcpOptions.DefaultProjectFactory.class) java.lang.String getProject()
GcpOptions
getProject
in interface GcpOptions
void setProject(java.lang.String value)
setProject
in interface GcpOptions
@Default.InstanceFactory(value=DataflowPipelineOptions.StagingLocationFactory.class) java.lang.String getStagingLocation()
Must be a valid Cloud Storage URL, beginning with the prefix "gs://"
If getStagingLocation()
is not set, it will default to GcpOptions.getGcpTempLocation()
. GcpOptions.getGcpTempLocation()
must be a valid GCS
path.
void setStagingLocation(java.lang.String value)
boolean isUpdate()
void setUpdate(boolean value)
java.lang.String getCreateFromSnapshot()
void setCreateFromSnapshot(java.lang.String value)
java.lang.String getTemplateLocation()
void setTemplateLocation(java.lang.String value)
java.util.List<java.lang.String> getDataflowServiceOptions()
void setDataflowServiceOptions(java.util.List<java.lang.String> options)
java.lang.String getServiceAccount()
void setServiceAccount(java.lang.String value)
@Default.InstanceFactory(value=DefaultGcpRegionFactory.class) java.lang.String getRegion()
void setRegion(java.lang.String region)
@Default.String(value="") java.lang.String getDataflowEndpoint()
Defaults to the current version of the Google Cloud Dataflow API, at the time the current SDK version was released.
If the string contains "://", then this is treated as a URL, otherwise DataflowPipelineDebugOptions.getApiRootUrl()
is used as the root URL.
getDataflowEndpoint
in interface DataflowPipelineDebugOptions
void setDataflowEndpoint(java.lang.String value)
setDataflowEndpoint
in interface DataflowPipelineDebugOptions
java.util.Map<java.lang.String,java.lang.String> getLabels()
void setLabels(java.util.Map<java.lang.String,java.lang.String> labels)
java.lang.String getPipelineUrl()
void setPipelineUrl(java.lang.String urlString)
java.lang.String getDataflowWorkerJar()
void setDataflowWorkerJar(java.lang.String dataflowWorkerJar)
@Default.Enum(value="UNSPECIFIED") DataflowPipelineOptions.FlexResourceSchedulingGoal getFlexRSGoal()
void setFlexRSGoal(DataflowPipelineOptions.FlexResourceSchedulingGoal goal)
boolean isHotKeyLoggingEnabled()
void setHotKeyLoggingEnabled(boolean value)
java.util.List<java.lang.String> getJdkAddOpenModules()
With JDK 16+, JDK internals are strongly encapsulated and can result in an InaccessibleObjectException being thrown if a tool or library uses reflection that access JDK internals. If you see these errors in your worker logs, you can pass in modules to open using the format module/package=target-module(,target-module)* to allow access to the library. E.g. java.base/java.lang=jamm
You may see warnings that jamm, a library used to more accurately size objects, is unable to make a private field accessible. To resolve the warning, open the specified module/package to jamm.
void setJdkAddOpenModules(java.util.List<java.lang.String> options)