public class WatermarkManager<ExecutableT,CollectionT>
extends java.lang.Object
PCollections
and input and output watermarks of AppliedPTransforms
to provide event-time and completion tracking for in-memory
execution. WatermarkManager
is designed to update and return a consistent view of
watermarks in the presence of concurrent updates.
An WatermarkManager
is provided with the collection of root AppliedPTransforms
and a map of PCollections
to all the AppliedPTransforms
that consume them at construction time.
Whenever a root executable
produces elements, the WatermarkManager
is provided with the produced elements and the output watermark of the
producing executable
. The watermark manager
is
responsible for computing the watermarks of all transforms
that consume
one or more PCollections
.
Whenever a non-root AppliedPTransform
finishes processing one or more in-flight
elements (referred to as the input bundle
), the following occurs
atomically:
AppliedPTransform
.
AppliedPTransform
are added to the collection
of pending elements for each AppliedPTransform
that consumes them.
AppliedPTransform
becomes the maximum value of
PCollection
watermarks
AppliedPTransform
becomes the maximum of
PCollection
can be advanced to the output watermark of
the AppliedPTransform
AppliedPTransforms
can be
advanced.
The watermark of a PCollection
is equal to the output watermark of the AppliedPTransform
that produces it.
The watermarks for a PTransform
are updated as follows when output is committed:
Watermark_In' = MAX(Watermark_In, MIN(U(TS_Pending), U(Watermark_InputPCollection))) Watermark_Out' = MAX(Watermark_Out, MIN(Watermark_In', U(StateHold))) Watermark_PCollection = Watermark_Out_ProducingPTransform
Modifier and Type | Class and Description |
---|---|
static class |
WatermarkManager.FiredTimers<ExecutableT>
A pair of
TimerInternals.TimerData and key which can be delivered to the appropriate AppliedPTransform . |
static class |
WatermarkManager.TimerUpdate
A collection of newly set, deleted, and completed timers.
|
class |
WatermarkManager.TransformWatermarks
A reference to the input and output watermarks of an
AppliedPTransform . |
Modifier and Type | Method and Description |
---|---|
static <ExecutableT,CollectionT> |
create(Clock clock,
ExecutableGraph<ExecutableT,? super CollectionT> graph,
java.util.function.Function<ExecutableT,java.lang.String> getName)
Creates a new
WatermarkManager . |
java.util.Collection<WatermarkManager.FiredTimers<ExecutableT>> |
extractFiredTimers(java.util.Collection<ExecutableT> ignoredExecutables)
Returns a map of each
PTransform that has pending timers to those timers. |
WatermarkManager.TransformWatermarks |
getWatermarks(ExecutableT executable)
Gets the input and output watermarks for an
AppliedPTransform . |
void |
initialize(java.util.Map<ExecutableT,? extends java.lang.Iterable<Bundle<?,CollectionT>>> initialBundles) |
void |
refreshAll()
Refresh the watermarks contained within this
WatermarkManager , causing all watermarks
to be advanced as far as possible. |
void |
updateWatermarks(@Nullable Bundle<?,? extends CollectionT> completed,
WatermarkManager.TimerUpdate timerUpdate,
ExecutableT executable,
@Nullable Bundle<?,? extends CollectionT> unprocessedInputs,
java.lang.Iterable<? extends Bundle<?,? extends CollectionT>> outputs,
Instant earliestHold)
Updates the watermarks of a executable with one or more inputs.
|
public static <ExecutableT,CollectionT> WatermarkManager<ExecutableT,? super CollectionT> create(Clock clock, ExecutableGraph<ExecutableT,? super CollectionT> graph, java.util.function.Function<ExecutableT,java.lang.String> getName)
WatermarkManager
. All watermarks within the newly created WatermarkManager
start at BoundedWindow.TIMESTAMP_MIN_VALUE
, the minimum watermark,
with no watermark holds or pending elements.clock
- the clock to use to determine processing timegraph
- the graph representing this pipelinegetName
- a function for producing a short identifier for the executable in watermark
tracing log messages.public WatermarkManager.TransformWatermarks getWatermarks(ExecutableT executable)
AppliedPTransform
. If the PTransform
has not processed any elements, return a watermark of BoundedWindow.TIMESTAMP_MIN_VALUE
.public void initialize(java.util.Map<ExecutableT,? extends java.lang.Iterable<Bundle<?,CollectionT>>> initialBundles)
public void updateWatermarks(@Nullable Bundle<?,? extends CollectionT> completed, WatermarkManager.TimerUpdate timerUpdate, ExecutableT executable, @Nullable Bundle<?,? extends CollectionT> unprocessedInputs, java.lang.Iterable<? extends Bundle<?,? extends CollectionT>> outputs, Instant earliestHold)
Each executable has two monotonically increasing watermarks: the input watermark, which can, at any time, be updated to equal:
MAX(CurrentInputWatermark, MIN(PendingElements, InputPCollectionWatermarks))and the output watermark, which can, at any time, be updated to equal:
MAX(CurrentOutputWatermark, MIN(InputWatermark, WatermarkHolds)).
Updates to watermarks may not be immediately visible.
completed
- the input that has completedtimerUpdate
- the timers that were added, removed, and completed as part of producing this
updateexecutable
- the executable applied to completed
to produce the outputsunprocessedInputs
- inputs that could not be processedoutputs
- outputs that were produced by the application of the executable
to the
inputearliestHold
- the earliest watermark hold in the executable's state.public void refreshAll()
WatermarkManager
, causing all watermarks
to be advanced as far as possible.public java.util.Collection<WatermarkManager.FiredTimers<ExecutableT>> extractFiredTimers(java.util.Collection<ExecutableT> ignoredExecutables)
PTransform
that has pending timers to those timers. All of the
pending timers will be removed from this WatermarkManager
.