public class SparkGroupAlsoByWindowViaWindowSet
extends java.lang.Object
implements java.io.Serializable
GroupByKeyViaGroupByKeyOnly.GroupAlsoByWindow
logic for grouping by windows and controlling
trigger firings and pane accumulation.
This implementation is a composite of Spark transformations revolving around state management
using Spark's PairDStreamFunctions.updateStateByKey(scala.Function1,
org.apache.spark.Partitioner, boolean, scala.reflect.ClassTag)
to update state with new data and
timers.
Using updateStateByKey allows to scan through the entire state visiting not just the updated state (new values for key) but also check if timers are ready to fire. Since updateStateByKey bounds the types of state and output to be the same, a (state, output) tuple is used, filtering the state (and output if no firing) in the following steps.
Modifier and Type | Class and Description |
---|---|
static class |
SparkGroupAlsoByWindowViaWindowSet.StateAndTimers
State and Timers wrapper.
|
Constructor and Description |
---|
SparkGroupAlsoByWindowViaWindowSet() |
Modifier and Type | Method and Description |
---|---|
static <K,InputT,W extends BoundedWindow> |
groupByKeyAndWindow(org.apache.spark.streaming.api.java.JavaDStream<org.apache.beam.sdk.util.WindowedValue<KV<K,InputT>>> inputDStream,
Coder<K> keyCoder,
Coder<org.apache.beam.sdk.util.WindowedValue<InputT>> wvCoder,
WindowingStrategy<?,W> windowingStrategy,
org.apache.beam.runners.core.construction.SerializablePipelineOptions options,
java.util.List<java.lang.Integer> sourceIds,
java.lang.String transformFullName) |
public static <K,InputT,W extends BoundedWindow> org.apache.spark.streaming.api.java.JavaDStream<org.apache.beam.sdk.util.WindowedValue<KV<K,java.lang.Iterable<InputT>>>> groupByKeyAndWindow(org.apache.spark.streaming.api.java.JavaDStream<org.apache.beam.sdk.util.WindowedValue<KV<K,InputT>>> inputDStream, Coder<K> keyCoder, Coder<org.apache.beam.sdk.util.WindowedValue<InputT>> wvCoder, WindowingStrategy<?,W> windowingStrategy, org.apache.beam.runners.core.construction.SerializablePipelineOptions options, java.util.List<java.lang.Integer> sourceIds, java.lang.String transformFullName)