Class Window<T>
- All Implemented Interfaces:
Serializable,HasDisplayData
Window logically divides up or groups the elements of a PCollection into finite
windows according to a WindowFn. The output of Window contains the same elements
as input, but they have been logically assigned to windows. The next GroupByKeys, including one within composite
transforms, will group by the combination of keys and windows.
See GroupByKey for more information about how grouping
with windows works.
Windowing
Windowing a PCollection divides the elements into windows based on the associated
event time for each element. This is especially useful for PCollections with
unbounded size, since it allows operating on a sub-group of the elements placed into a related
window. For PCollections with a bounded size (aka. conventional batch mode),
by default, all data is implicitly in a single window, unless Window is applied.
For example, a simple form of windowing divides up the data into fixed-width time intervals,
using FixedWindows. The following example demonstrates how to use Window in a
pipeline that counts the number of occurrences of strings each minute:
PCollection<String> items = ...;
PCollection<String> windowed_items = items.apply(
Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))));
PCollection<KV<String, Long>> windowed_counts = windowed_items.apply(
Count.<String>perElement());
Let (data, timestamp) denote a data element along with its timestamp. Then, if the input to this pipeline consists of {("foo", 15s), ("bar", 30s), ("foo", 45s), ("foo", 1m30s)}, the output will be {(KV("foo", 2), 1m), (KV("bar", 1), 1m), (KV("foo", 1), 2m)}
Several predefined WindowFns are provided:
FixedWindowspartitions the timestamps into fixed-width intervals.SlidingWindowsplaces data into overlapping fixed-width intervals.Sessionsgroups data into sessions where each item in a window is separated from the next by no more than a specified gap.
Additionally, custom WindowFns can be created, by creating new subclasses of WindowFn.
Triggers
triggering(Trigger) allows specifying a trigger to control when (in processing
time) results for the given window can be produced. If unspecified, the default behavior is to
trigger first when the watermark passes the end of the window, and then trigger again every time
there is late arriving data.
Elements are added to the current window pane as they arrive. When the root trigger fires, output is produced based on the elements in the current pane.
Depending on the trigger, this can be used both to output partial results early during the processing of the whole window, and to deal with late arriving in batches.
Continuing the earlier example, if we wanted to emit the values that were available when the watermark passed the end of the window, and then output any late arriving elements once-per (actual hour) hour until we have finished processing the next 24-hours of data. (The use of watermark time to stop processing tends to be more robust if the data source is slow for a few days, etc.)
PCollection<String> items = ...;
PCollection<String> windowed_items = items.apply(
Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
.triggering(
AfterWatermark.pastEndOfWindow()
.withLateFirings(AfterProcessingTime
.pastFirstElementInPane().plusDelayOf(Duration.standardHours(1))))
.withAllowedLateness(Duration.standardDays(1)));
PCollection<KV<String, Long>> windowed_counts = windowed_items.apply(
Count.<String>perElement());
On the other hand, if we wanted to get early results every minute of processing time (for which there were new elements in the given window) we could do the following:
PCollection<String> windowed_items = items.apply(
Window.<String>into(FixedWindows.of(Duration.standardMinutes(1)))
.triggering(
AfterWatermark.pastEndOfWindow()
.withEarlyFirings(AfterProcessingTime
.pastFirstElementInPane().plusDelayOf(Duration.standardMinutes(1))))
.withAllowedLateness(Duration.ZERO));
After a GroupByKey the trigger is set to a trigger that
will preserve the intent of the upstream trigger. See Trigger.getContinuationTrigger() for
more information.
See Trigger for details on the available triggers.
- See Also:
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic classA PrimitivePTransformthat assigns windows to elements based on aWindowFn.static enumSpecifies the conditions under which a final pane will be created when a window is permanently closed.static enumSpecifies the conditions under which an on-time pane will be created when a window is closed. -
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionReturns a newWindowPTransformthat uses the registered WindowFn and Triggering behavior, and that accumulates elements in a pane after they are triggered.static <T> Window<T> Returns a new builder for aWindowtransform for setting windowing parameters other than the windowing function.Returns a newWindowPTransformthat uses the registered WindowFn and Triggering behavior, and that discards elements in a pane after they are triggered.expand(PCollection<T> input) Override this method to specify how thisPTransformshould be expanded on the givenInputT.protected StringReturns the name to use by default for thisPTransform(not including the names of any enclosingPTransforms).WindowingStrategy<?, ?> getOutputStrategyInternal(WindowingStrategy<?, ?> inputStrategy) Get the output strategy of thisWindow PTransform.static <T> Window<T> voidpopulateDisplayData(DisplayData.Builder builder) Register display data for the given transform or component.static <T> org.apache.beam.sdk.transforms.windowing.Window.Remerge<T> remerge()Creates aWindowPTransformthat does not change assigned windows, but will cause windows to be merged again as part of the nextGroupByKey.triggering(Trigger trigger) Sets a non-default trigger for thisWindowPTransform.withAllowedLateness(Duration allowedLateness) Override the amount of lateness allowed for data elements in the outputPCollectionand downstreamPCollectionsuntil explicitly set again.withAllowedLateness(Duration allowedLateness, Window.ClosingBehavior behavior) Override the amount of lateness allowed for data elements in the pipeline.withOnTimeBehavior(Window.OnTimeBehavior behavior) Override the defaultWindow.OnTimeBehavior, to control whether to output an empty on-time pane.withTimestampCombiner(TimestampCombiner timestampCombiner) Override the defaultTimestampCombiner, to control the output timestamp of values output from aGroupByKeyoperation.Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate, validate
-
Constructor Details
-
Window
public Window()
-
-
Method Details
-
into
Creates aWindowPTransformthat uses the givenWindowFnto window the data.The resulting
PTransform's types have been bound, with both the input and output being aPCollection<T>, inferred from the types of the argumentWindowFn. It is ready to be applied, or further properties can be set on it first. -
configure
Returns a new builder for aWindowtransform for setting windowing parameters other than the windowing function. -
getWindowFn
-
triggering
Sets a non-default trigger for thisWindowPTransform. Elements that are assigned to a specific window will be output when the trigger fires.Triggerhas more details on the available triggers.Must also specify allowed lateness using
withAllowedLateness(org.joda.time.Duration)and accumulation mode using eitherdiscardingFiredPanes()oraccumulatingFiredPanes(). -
discardingFiredPanes
Returns a newWindowPTransformthat uses the registered WindowFn and Triggering behavior, and that discards elements in a pane after they are triggered.Does not modify this transform. The resulting
PTransformis sufficiently specified to be applied, but more properties can still be specified. -
accumulatingFiredPanes
Returns a newWindowPTransformthat uses the registered WindowFn and Triggering behavior, and that accumulates elements in a pane after they are triggered.Does not modify this transform. The resulting
PTransformis sufficiently specified to be applied, but more properties can still be specified. -
withAllowedLateness
Override the amount of lateness allowed for data elements in the outputPCollectionand downstreamPCollectionsuntil explicitly set again. Like the other properties on thisWindowoperation, this will be applied at the nextGroupByKey. Any elements that are later than this as decided by the system-maintained watermark will be dropped.This value also determines how long state will be kept around for old windows. Once no elements will be added to a window (because this duration has passed) any state associated with the window will be cleaned up.
Depending on the trigger this may not produce a pane with
PaneInfo.isLast. SeeWindow.ClosingBehavior.FIRE_IF_NON_EMPTYfor more details. -
withTimestampCombiner
Override the defaultTimestampCombiner, to control the output timestamp of values output from aGroupByKeyoperation. -
withAllowedLateness
Override the amount of lateness allowed for data elements in the pipeline. Like the other properties on thisWindowoperation, this will be applied at the nextGroupByKey. Any elements that are later than this as decided by the system-maintained watermark will be dropped.This value also determines how long state will be kept around for old windows. Once no elements will be added to a window (because this duration has passed) any state associated with the window will be cleaned up.
-
withOnTimeBehavior
Override the defaultWindow.OnTimeBehavior, to control whether to output an empty on-time pane. -
getOutputStrategyInternal
Get the output strategy of thisWindow PTransform. For internal use only. -
expand
Description copied from class:PTransformOverride this method to specify how thisPTransformshould be expanded on the givenInputT.NOTE: This method should not be called directly. Instead apply the
PTransformshould be applied to theInputTusing theapplymethod.Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expandin classPTransform<PCollection<T>,PCollection<T>>
-
populateDisplayData
Description copied from class:PTransformRegister display data for the given transform or component.populateDisplayData(DisplayData.Builder)is invoked by Pipeline runners to collect display data viaDisplayData.from(HasDisplayData). Implementations may callsuper.populateDisplayData(builder)in order to register display data in the current namespace, but should otherwise usesubcomponent.populateDisplayData(builder)to use the namespace of the subcomponent.By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayDatain interfaceHasDisplayData- Overrides:
populateDisplayDatain classPTransform<PCollection<T>,PCollection<T>> - Parameters:
builder- The builder to populate with display data.- See Also:
-
getKindString
Description copied from class:PTransformReturns the name to use by default for thisPTransform(not including the names of any enclosingPTransforms).By default, returns the base name of this
PTransform's class.The caller is responsible for ensuring that names of applied
PTransforms are unique, e.g., by adding a uniquifying suffix when needed.- Overrides:
getKindStringin classPTransform<PCollection<T>,PCollection<T>>
-
remerge
public static <T> org.apache.beam.sdk.transforms.windowing.Window.Remerge<T> remerge()Creates aWindowPTransformthat does not change assigned windows, but will cause windows to be merged again as part of the nextGroupByKey.
-