@Internal public class WatermarkManager<ExecutableT,CollectionT> extends java.lang.Object
PCollectionsand input and output watermarks of
AppliedPTransformsto provide event-time and completion tracking for in-memory execution.
WatermarkManageris designed to update and return a consistent view of watermarks in the presence of concurrent updates.
Whenever a root
executable produces elements, the
WatermarkManager is provided with the produced elements and the output watermark of the
watermark manager is
responsible for computing the watermarks of all
transforms that consume
one or more
Whenever a non-root
AppliedPTransform finishes processing one or more in-flight
elements (referred to as the input
bundle), the following occurs
AppliedPTransformare added to the collection of pending elements for each
AppliedPTransformthat consumes them.
AppliedPTransformbecomes the maximum value of
AppliedPTransformbecomes the maximum of
PCollectioncan be advanced to the output watermark of the
AppliedPTransformscan be advanced.
The watermark of a
PCollection is equal to the output watermark of the
AppliedPTransform that produces it.
The watermarks for a
PTransform are updated as follows when output is committed:
Watermark_In' = MAX(Watermark_In, MIN(U(TS_Pending), U(Watermark_InputPCollection))) Watermark_Out' = MAX(Watermark_Out, MIN(Watermark_In', U(StateHold))) Watermark_PCollection = Watermark_Out_ProducingPTransform
|Modifier and Type||Class and Description|
A pair of
A collection of newly set, deleted, and completed timers.
A reference to the input and output watermarks of an
|Modifier and Type||Method and Description|
Creates a new
Returns a map of each
Gets the input and output watermarks for an
Refresh the watermarks contained within this
Updates the watermarks of a executable with one or more inputs.
public static <ExecutableT,CollectionT> WatermarkManager<ExecutableT,? super CollectionT> create(Clock clock, ExecutableGraph<ExecutableT,? super CollectionT> graph, java.util.function.Function<ExecutableT,java.lang.String> getName)
WatermarkManager. All watermarks within the newly created
BoundedWindow.TIMESTAMP_MIN_VALUE, the minimum watermark, with no watermark holds or pending elements.
clock- the clock to use to determine processing time
graph- the graph representing this pipeline
getName- a function for producing a short identifier for the executable in watermark tracing log messages.
public WatermarkManager.TransformWatermarks getWatermarks(ExecutableT executable)
AppliedPTransform. If the
PTransformhas not processed any elements, return a watermark of
public void initialize(java.util.Map<ExecutableT,? extends java.lang.Iterable<Bundle<?,CollectionT>>> initialBundles)
public void updateWatermarks(@Nullable Bundle<?,? extends CollectionT> completed, WatermarkManager.TimerUpdate timerUpdate, ExecutableT executable, @Nullable Bundle<?,? extends CollectionT> unprocessedInputs, java.lang.Iterable<? extends Bundle<?,? extends CollectionT>> outputs, Instant earliestHold)
Each executable has two monotonically increasing watermarks: the input watermark, which can, at any time, be updated to equal:
MAX(CurrentInputWatermark, MIN(PendingElements, InputPCollectionWatermarks))and the output watermark, which can, at any time, be updated to equal:
MAX(CurrentOutputWatermark, MIN(InputWatermark, WatermarkHolds)).
Updates to watermarks may not be immediately visible.
completed- the input that has completed
timerUpdate- the timers that were added, removed, and completed as part of producing this update
executable- the executable applied to
completedto produce the outputs
unprocessedInputs- inputs that could not be processed
outputs- outputs that were produced by the application of the
executableto the input
earliestHold- the earliest watermark hold in the executable's state.
public void refreshAll()
WatermarkManager, causing all watermarks to be advanced as far as possible.