Interface DataflowPipelineDebugOptions
- All Superinterfaces:
ExperimentalOptions, HasDisplayData, MemoryMonitorOptions, PipelineOptions
- All Known Subinterfaces:
DataflowPipelineOptions, DataflowWorkerHarnessOptions, TestDataflowPipelineOptions
@Hidden
public interface DataflowPipelineDebugOptions
extends ExperimentalOptions, MemoryMonitorOptions, PipelineOptions
Internal. Options used to control execution of the Dataflow SDK for debugging and testing
purposes.
-
Nested Class Summary
Nested Classes
static class: Returns the default Dataflow client built from the passed in PipelineOptions.
static class: Creates a Stager object using the class specified in getStagerClass().
static final class: Sets Integer value based on old, deprecated field (getUnboundedReaderMaxReadTimeSec()).
Nested classes/interfaces inherited from interface org.apache.beam.sdk.options.PipelineOptions
PipelineOptions.AtomicLongFactory, PipelineOptions.CheckEnabled, PipelineOptions.DirectRunner, PipelineOptions.JobNameFactory, PipelineOptions.UserAgentFactory -
Field Summary
Fields inherited from interface org.apache.beam.sdk.options.ExperimentalOptions
STATE_CACHE_SIZE, STATE_SAMPLING_PERIOD_MILLIS -
Method Summary
Modifier and Type, Method: Description
getApiRootUrl(): The root URL for the Dataflow API.
getDataflowClient(): An instance of the Dataflow client.
getDataflowEndpoint(): Dataflow endpoint to use.
String getDataflowJobFile(): The path to write the translated Dataflow job specification out to at job submission time.
int getDesiredNumUnboundedSourceSplits(): The desired number of initial splits for UnboundedSources.
boolean getDumpHeapOnOOM(): If true, save a heap dump before killing a thread or process which is GC thrashing or out of memory.
int getJfrRecordingDurationSec()
int getNumberOfWorkerHarnessThreads(): Number of threads to use on the Dataflow worker harness.
getReaderCacheTimeoutSec(): The amount of time before UnboundedReaders are considered idle and closed during streaming execution.
boolean getRecordJfrOnGcThrashing(): If true, save a JFR profile when GC thrashing is first detected.
String getSaveHeapDumpsToGcsPath(): CAUTION: This option implies dumpHeapOnOOM, and has similar caveats.
String getSdkHarnessContainerImageOverrides(): Overrides for SDK harness container images.
getStager(): The resource stager instance that should be used to stage resources.
getStagerClass(): The class responsible for staging resources to be accessible by workers during job execution.
getTransformNameMapping(): Mapping of old PTransform names to new ones, specified as JSON {"oldName":"newName",...}.
getUnboundedReaderMaxElements(): The max elements read from an UnboundedReader before checkpointing.
getUnboundedReaderMaxReadTimeMs(): The max amount of time an UnboundedReader is consumed before checkpointing.
getUnboundedReaderMaxReadTimeSec(): Deprecated.
getUnboundedReaderMaxWaitForElementsMs(): The max amount of time waiting for elements when reading from UnboundedReader.
getWorkerCacheMb(): The size of the worker's in-memory cache, in megabytes.
void setApiRootUrl(String value)
void setDataflowClient(Dataflow value)
void setDataflowEndpoint(String value)
void setDataflowJobFile(String value)
void setDesiredNumUnboundedSourceSplits(int value)
void setDumpHeapOnOOM(boolean dumpHeapBeforeExit)
void setJfrRecordingDurationSec(int value)
void setNumberOfWorkerHarnessThreads(int value)
void setReaderCacheTimeoutSec(Integer value)
void setRecordJfrOnGcThrashing(boolean value)
void setSaveHeapDumpsToGcsPath(String gcsPath)
void setSdkHarnessContainerImageOverrides
void setStager
void setStagerClass(Class<? extends Stager> stagerClass)
void setTransformNameMapping(Map<String, String> value)
void setUnboundedReaderMaxElements
void setUnboundedReaderMaxReadTimeMs
void setUnboundedReaderMaxReadTimeSec
void setUnboundedReaderMaxWaitForElementsMs
void setWorkerCacheMb(Integer value)
Methods inherited from interface org.apache.beam.sdk.options.ExperimentalOptions
getExperiments, setExperiments
Methods inherited from interface org.apache.beam.sdk.transforms.display.HasDisplayData
populateDisplayData
Methods inherited from interface org.apache.beam.sdk.options.MemoryMonitorOptions
getGCThrashingPercentagePerPeriod, getGzipCompressHeapDumps, getRemoteHeapDumpLocation, setGCThrashingPercentagePerPeriod, setGzipCompressHeapDumps, setRemoteHeapDumpLocation
Methods inherited from interface org.apache.beam.sdk.options.PipelineOptions
as, getGbek, getJobName, getOptionsId, getRunner, getStableUniqueNames, getTempLocation, getUserAgent, outputRuntimeOptions, revision, setGbek, setJobName, setOptionsId, setRunner, setStableUniqueNames, setTempLocation, setUserAgent
-
Method Details
-
getApiRootUrl
The root URL for the Dataflow API. dataflowEndpoint can override this value if it contains an absolute URL; otherwise apiRootUrl will be combined with dataflowEndpoint to generate the full URL to communicate with the Dataflow API. -
setApiRootUrl
-
getDataflowEndpoint
Dataflow endpoint to use. Defaults to the current version of the Google Cloud Dataflow API, at the time the current SDK version was released.
If the string contains "://", then this is treated as a URL; otherwise getApiRootUrl() is used as the root URL. -
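The resolution rule described for getApiRootUrl() and getDataflowEndpoint() can be sketched in plain Java. EndpointResolver and its resolve method are hypothetical names used only for illustration, the example.com URLs are placeholders, and the slash handling when joining the two parts is an assumption; this is not the SDK's own code.

```java
/** Hypothetical helper illustrating the documented rule; not part of the SDK. */
final class EndpointResolver {
  private EndpointResolver() {}

  /**
   * Returns the endpoint unchanged when it contains "://" (it is treated as a URL);
   * otherwise appends it to the root URL.
   */
  static String resolve(String apiRootUrl, String dataflowEndpoint) {
    if (dataflowEndpoint.contains("://")) {
      return dataflowEndpoint;
    }
    // Assumption: the two parts are joined with exactly one slash between them.
    String root = apiRootUrl.endsWith("/") ? apiRootUrl : apiRootUrl + "/";
    String path =
        dataflowEndpoint.startsWith("/") ? dataflowEndpoint.substring(1) : dataflowEndpoint;
    return root + path;
  }

  public static void main(String[] args) {
    // Relative endpoint: combined with the root URL.
    System.out.println(resolve("https://example.com/", "api/v1/"));
    // Absolute endpoint: overrides the root URL.
    System.out.println(resolve("https://example.com/", "http://localhost:8080/"));
  }
}
```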
setDataflowEndpoint
-
getDataflowJobFile
String getDataflowJobFile()
The path to write the translated Dataflow job specification out to at job submission time. The Dataflow job specification will be represented in JSON format. -
setDataflowJobFile
-
getStagerClass
The class responsible for staging resources to be accessible by workers during job execution. If stager has not been set explicitly, an instance of this class will be created and used as the resource stager. -
setStagerClass
-
getStager
The resource stager instance that should be used to stage resources. If no stager has been set explicitly, the default is to use the instance factory that constructs a resource stager based upon the currently set stagerClass. -
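The relationship between getStager() and getStagerClass() (an explicitly set stager wins, otherwise one is built from the configured class) can be sketched as follows. The Stager interface and DefaultStager class below are hypothetical stand-ins, and the sketch assumes a no-argument constructor; the real factory may well pass the pipeline options to the stager it creates, a detail left out here.

```java
/** Hypothetical sketch of "explicit instance, else build from class"; not SDK code. */
final class StagerDefaults {
  private StagerDefaults() {}

  /** Stand-in for the real Stager type. */
  interface Stager {
    String name();
  }

  /** Stand-in default implementation with a no-argument constructor (an assumption). */
  static final class DefaultStager implements Stager {
    public DefaultStager() {}

    @Override
    public String name() {
      return "default";
    }
  }

  /** Returns the explicit stager if one was set, otherwise instantiates stagerClass. */
  static Stager resolve(Stager explicit, Class<? extends Stager> stagerClass) {
    if (explicit != null) {
      return explicit;
    }
    try {
      return stagerClass.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + stagerClass.getName(), e);
    }
  }
}
```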
setStager
-
getDataflowClient
An instance of the Dataflow client. Defaults to creating a Dataflow client using the current set of options. -
setDataflowClient
-
getTransformNameMapping
Mapping of old PTransform names to new ones, specified as JSON {"oldName":"newName",...}. To mark a transform as deleted, make newName the empty string. -
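As an illustration of the JSON shape, the sketch below builds a mapping with one renamed and one deleted transform and renders it as a flag value. The transform names are made up, the --transformNameMapping flag name is assumed to follow the getter name, and the hand-written quoting only handles backslashes and double quotes; a JSON library would be the safer choice in real code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that renders a transform name mapping as JSON. */
final class TransformNameMappingExample {
  private TransformNameMappingExample() {}

  /** Renders the map as a flat JSON object of string keys and string values. */
  static String toJson(Map<String, String> mapping) {
    StringJoiner json = new StringJoiner(",", "{", "}");
    for (Map.Entry<String, String> entry : mapping.entrySet()) {
      json.add(quote(entry.getKey()) + ":" + quote(entry.getValue()));
    }
    return json.toString();
  }

  /** Minimal quoting: escapes backslashes and double quotes only. */
  private static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /** Example mapping with made-up transform names. */
  static Map<String, String> sampleMapping() {
    Map<String, String> mapping = new LinkedHashMap<>();
    mapping.put("ParseEvents", "ParseEventsV2"); // renamed transform
    mapping.put("LegacyFilter", ""); // empty string marks the transform as deleted
    return mapping;
  }

  public static void main(String[] args) {
    System.out.println("--transformNameMapping=" + toJson(sampleMapping()));
  }
}
```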
setTransformNameMapping
-
getNumberOfWorkerHarnessThreads
int getNumberOfWorkerHarnessThreads()
Number of threads to use on the Dataflow worker harness. If left unspecified, the Dataflow service will compute an appropriate number of threads to use. -
setNumberOfWorkerHarnessThreads
void setNumberOfWorkerHarnessThreads(int value) -
getDumpHeapOnOOM
boolean getDumpHeapOnOOM()
If true, save a heap dump before killing a thread or process which is GC thrashing or out of memory. The location of the heap file will either be echoed back to the user, or the user will be given the opportunity to download the heap file.
CAUTION: Heap dumps can be of comparable size to the default boot disk. Consider increasing the boot disk size before setting this flag to true.
-
setDumpHeapOnOOM
void setDumpHeapOnOOM(boolean dumpHeapBeforeExit) -
getRecordJfrOnGcThrashing
boolean getRecordJfrOnGcThrashing()
If true, save a JFR profile when GC thrashing is first detected. The profile will run for the amount of time set by --jfrRecordingDurationSec, or 60 seconds by default.
Note: JFR profiles are only supported on Java 9 and up.
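A small sketch of how the two JFR options might be passed as command-line flags. JfrFlags is a hypothetical helper; the --recordJfrOnGcThrashing flag name is assumed to mirror the getter in the same way --jfrRecordingDurationSec does, and the 60-second default is the one stated above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; only the option names and the default come from this page. */
final class JfrFlags {
  /** Default recording length stated in the description above. */
  static final int DEFAULT_DURATION_SEC = 60;

  private JfrFlags() {}

  /** Builds the flags; a null duration leaves the documented default in effect. */
  static List<String> build(Integer durationSec) {
    List<String> flags = new ArrayList<>();
    // Assumption: the flag name mirrors the getter, as --jfrRecordingDurationSec does.
    flags.add("--recordJfrOnGcThrashing=true");
    if (durationSec != null) {
      flags.add("--jfrRecordingDurationSec=" + durationSec);
    }
    return flags;
  }

  /** Recording length that applies for the given (possibly unset) duration. */
  static int effectiveDurationSec(Integer durationSec) {
    return durationSec != null ? durationSec : DEFAULT_DURATION_SEC;
  }
}
```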
-
setRecordJfrOnGcThrashing
void setRecordJfrOnGcThrashing(boolean value) -
getJfrRecordingDurationSec
-
setJfrRecordingDurationSec
void setJfrRecordingDurationSec(int value) -
getWorkerCacheMb
The size of the worker's in-memory cache, in megabytes.
Currently, this cache is used for storing read values of side inputs in batch as well as the user state for streaming jobs.
-
setWorkerCacheMb
-
getReaderCacheTimeoutSec
The amount of time before UnboundedReaders are considered idle and closed during streaming execution. -
setReaderCacheTimeoutSec
-
getUnboundedReaderMaxReadTimeSec
Deprecated. Use getUnboundedReaderMaxReadTimeMs() instead.
The max amount of time an UnboundedReader is consumed before checkpointing. -
setUnboundedReaderMaxReadTimeSec
-
getUnboundedReaderMaxReadTimeMs
The max amount of time an UnboundedReader is consumed before checkpointing. -
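The nested class summary notes that the millisecond value is derived from the old, deprecated seconds field. One way such a fallback could look is sketched below; ReadTimeDefaults, the precedence order, and the fallbackMs parameter are assumptions for illustration, since this page does not state the actual default.

```java
/** Hypothetical sketch of deriving the millisecond limit from the deprecated seconds option. */
final class ReadTimeDefaults {
  private ReadTimeDefaults() {}

  /**
   * Picks the explicit millisecond value if set, else converts the deprecated
   * seconds value, else falls back to a caller-supplied default.
   */
  static int maxReadTimeMs(Integer explicitMs, Integer deprecatedSec, int fallbackMs) {
    if (explicitMs != null) {
      return explicitMs;
    }
    if (deprecatedSec != null) {
      int seconds = deprecatedSec;
      // multiplyExact throws ArithmeticException instead of silently overflowing.
      return Math.multiplyExact(seconds, 1000);
    }
    return fallbackMs;
  }
}
```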
setUnboundedReaderMaxReadTimeMs
-
getUnboundedReaderMaxElements
The max elements read from an UnboundedReader before checkpointing. -
setUnboundedReaderMaxElements
-
getUnboundedReaderMaxWaitForElementsMs
The max amount of time waiting for elements when reading from UnboundedReader. -
setUnboundedReaderMaxWaitForElementsMs
-
getDesiredNumUnboundedSourceSplits
The desired number of initial splits for UnboundedSources. If this value is <= 0, the splits will be computed based on the number of user workers. -
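The "<= 0 means compute from the worker count" rule can be sketched as below. SplitPlanner and the splitsPerWorker factor are hypothetical; this page does not give the formula the service actually uses.

```java
/** Hypothetical sketch of choosing the initial split count; not the service's formula. */
final class SplitPlanner {
  private SplitPlanner() {}

  /**
   * Uses the desired value when it is positive; otherwise derives a count from
   * the number of workers. splitsPerWorker is an illustrative assumption.
   */
  static int initialSplits(int desiredNumSplits, int numWorkers, int splitsPerWorker) {
    if (desiredNumSplits > 0) {
      return desiredNumSplits;
    }
    // Never return fewer than one split.
    return Math.max(1, numWorkers * splitsPerWorker);
  }
}
```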
setDesiredNumUnboundedSourceSplits
void setDesiredNumUnboundedSourceSplits(int value) -
getSaveHeapDumpsToGcsPath
String getSaveHeapDumpsToGcsPath()
CAUTION: This option implies dumpHeapOnOOM, and has similar caveats. Specifically, heap dumps can be of comparable size to the default boot disk. Consider increasing the boot disk size before setting this flag to true. -
setSaveHeapDumpsToGcsPath
-
getSdkHarnessContainerImageOverrides
String getSdkHarnessContainerImageOverrides()
Overrides for SDK harness container images. -
setSdkHarnessContainerImageOverrides
-