Interface DataflowPipelineDebugOptions
- All Superinterfaces:
ExperimentalOptions
,HasDisplayData
,MemoryMonitorOptions
,PipelineOptions
- All Known Subinterfaces:
DataflowPipelineOptions
,DataflowWorkerHarnessOptions
,TestDataflowPipelineOptions
@Hidden
public interface DataflowPipelineDebugOptions
extends ExperimentalOptions, MemoryMonitorOptions, PipelineOptions
Internal. Options used to control execution of the Dataflow SDK for debugging and testing
purposes.
-
Nested Class Summary
Nested ClassesModifier and TypeInterfaceDescriptionstatic class
Returns the default Dataflow client built from the passed in PipelineOptions.static class
Creates aStager
object using the class specified ingetStagerClass()
.static final class
Sets Integer value based on old, deprecated field (getUnboundedReaderMaxReadTimeSec()
).Nested classes/interfaces inherited from interface org.apache.beam.sdk.options.PipelineOptions
PipelineOptions.AtomicLongFactory, PipelineOptions.CheckEnabled, PipelineOptions.DirectRunner, PipelineOptions.JobNameFactory, PipelineOptions.UserAgentFactory
-
Field Summary
Fields inherited from interface org.apache.beam.sdk.options.ExperimentalOptions
STATE_CACHE_SIZE, STATE_SAMPLING_PERIOD_MILLIS
-
Method Summary
Modifier and TypeMethodDescriptionThe root URL for the Dataflow API.An instance of the Dataflow client.Dataflow endpoint to use.The path to write the translated Dataflow job specification out to at job submission time.int
The desired number of initial splits for UnboundedSources.boolean
If true, save a heap dump before killing a thread or process which is GC thrashing or out of memory.int
int
Number of threads to use on the Dataflow worker harness.The amount of time before UnboundedReaders are considered idle and closed during streaming execution.boolean
If true, save a JFR profile when GC thrashing is first detected.CAUTION: This option implies dumpHeapOnOOM, and has similar caveats.Overrides for SDK harness container images.The resource stager instance that should be used to stage resources.The class responsible for staging resources to be accessible by workers during job execution.Mapping of old PTransform names to new ones, specified as JSON{"oldName":"newName",...}
.The max elements read from an UnboundedReader before checkpointing.The max amount of time an UnboundedReader is consumed before checkpointing.Deprecated.The max amount of time waiting for elements when reading from UnboundedReader.The size of the worker's in-memory cache, in megabytes.void
setApiRootUrl
(String value) void
setDataflowClient
(Dataflow value) void
setDataflowEndpoint
(String value) void
setDataflowJobFile
(String value) void
setDesiredNumUnboundedSourceSplits
(int value) void
setDumpHeapOnOOM
(boolean dumpHeapBeforeExit) void
setJfrRecordingDurationSec
(int value) void
setNumberOfWorkerHarnessThreads
(int value) void
setReaderCacheTimeoutSec
(Integer value) void
setRecordJfrOnGcThrashing
(boolean value) void
setSaveHeapDumpsToGcsPath
(String gcsPath) void
void
void
setStagerClass
(Class<? extends Stager> stagerClass) void
setTransformNameMapping
(Map<String, String> value) void
void
void
void
void
setWorkerCacheMb
(Integer value) Methods inherited from interface org.apache.beam.sdk.options.ExperimentalOptions
getExperiments, setExperiments
Methods inherited from interface org.apache.beam.sdk.transforms.display.HasDisplayData
populateDisplayData
Methods inherited from interface org.apache.beam.sdk.options.MemoryMonitorOptions
getGCThrashingPercentagePerPeriod, getGzipCompressHeapDumps, getRemoteHeapDumpLocation, setGCThrashingPercentagePerPeriod, setGzipCompressHeapDumps, setRemoteHeapDumpLocation
Methods inherited from interface org.apache.beam.sdk.options.PipelineOptions
as, getJobName, getOptionsId, getRunner, getStableUniqueNames, getTempLocation, getUserAgent, outputRuntimeOptions, revision, setJobName, setOptionsId, setRunner, setStableUniqueNames, setTempLocation, setUserAgent
-
Method Details
-
getApiRootUrl
The root URL for the Dataflow API.dataflowEndpoint
can override this value if it contains an absolute URL, otherwiseapiRootUrl
will be combined withdataflowEndpoint
to generate the full URL to communicate with the Dataflow API. -
setApiRootUrl
-
getDataflowEndpoint
Dataflow endpoint to use.Defaults to the current version of the Google Cloud Dataflow API, at the time the current SDK version was released.
If the string contains "://", then this is treated as a URL, otherwise
getApiRootUrl()
is used as the root URL. -
setDataflowEndpoint
-
getDataflowJobFile
String getDataflowJobFile()The path to write the translated Dataflow job specification out to at job submission time. The Dataflow job specification will be represented in JSON format. -
setDataflowJobFile
-
getStagerClass
The class responsible for staging resources to be accessible by workers during job execution. If stager has not been set explicitly, an instance of this class will be created and used as the resource stager. -
setStagerClass
-
getStager
The resource stager instance that should be used to stage resources. If no stager has been set explicitly, the default is to use the instance factory that constructs a resource stager based upon the currently set stagerClass. -
setStager
-
getDataflowClient
An instance of the Dataflow client. Defaults to creating a Dataflow client using the current set of options. -
setDataflowClient
-
getTransformNameMapping
Mapping of old PTransform names to new ones, specified as JSON{"oldName":"newName",...}
. To mark a transform as deleted, make newName the empty string. -
setTransformNameMapping
-
getNumberOfWorkerHarnessThreads
int getNumberOfWorkerHarnessThreads()Number of threads to use on the Dataflow worker harness. If left unspecified, the Dataflow service will compute an appropriate number of threads to use. -
setNumberOfWorkerHarnessThreads
void setNumberOfWorkerHarnessThreads(int value) -
getDumpHeapOnOOM
boolean getDumpHeapOnOOM()If true, save a heap dump before killing a thread or process which is GC thrashing or out of memory. The location of the heap file will either be echoed back to the user, or the user will be given the opportunity to download the heap file.CAUTION: Heap dumps can of comparable size to the default boot disk. Consider increasing the boot disk size before setting this flag to true.
-
setDumpHeapOnOOM
void setDumpHeapOnOOM(boolean dumpHeapBeforeExit) -
getRecordJfrOnGcThrashing
boolean getRecordJfrOnGcThrashing()If true, save a JFR profile when GC thrashing is first detected. The profile will run for the amount of time set by --jfrRecordingDurationSec, or 60 seconds by default.Note, JFR profiles are only supported on java 9 and up.
-
setRecordJfrOnGcThrashing
void setRecordJfrOnGcThrashing(boolean value) -
getJfrRecordingDurationSec
-
setJfrRecordingDurationSec
void setJfrRecordingDurationSec(int value) -
getWorkerCacheMb
The size of the worker's in-memory cache, in megabytes.Currently, this cache is used for storing read values of side inputs in batch as well as the user state for streaming jobs.
-
setWorkerCacheMb
-
getReaderCacheTimeoutSec
The amount of time before UnboundedReaders are considered idle and closed during streaming execution. -
setReaderCacheTimeoutSec
-
getUnboundedReaderMaxReadTimeSec
Deprecated.usegetUnboundedReaderMaxReadTimeMs()
insteadThe max amount of time an UnboundedReader is consumed before checkpointing. -
setUnboundedReaderMaxReadTimeSec
-
getUnboundedReaderMaxReadTimeMs
The max amount of time an UnboundedReader is consumed before checkpointing. -
setUnboundedReaderMaxReadTimeMs
-
getUnboundedReaderMaxElements
The max elements read from an UnboundedReader before checkpointing. -
setUnboundedReaderMaxElements
-
getUnboundedReaderMaxWaitForElementsMs
The max amount of time waiting for elements when reading from UnboundedReader. -
setUnboundedReaderMaxWaitForElementsMs
-
getDesiredNumUnboundedSourceSplits
The desired number of initial splits for UnboundedSources. If this value is invalid input: '<'=0, the splits will be computed based on the number of user workers. -
setDesiredNumUnboundedSourceSplits
void setDesiredNumUnboundedSourceSplits(int value) -
getSaveHeapDumpsToGcsPath
String getSaveHeapDumpsToGcsPath()CAUTION: This option implies dumpHeapOnOOM, and has similar caveats. Specifically, heap dumps can of comparable size to the default boot disk. Consider increasing the boot disk size before setting this flag to true. -
setSaveHeapDumpsToGcsPath
-
getSdkHarnessContainerImageOverrides
String getSdkHarnessContainerImageOverrides()Overrides for SDK harness container images. -
setSdkHarnessContainerImageOverrides
-
getUnboundedReaderMaxReadTimeMs()
instead