public class StateSpecFunctions
extends java.lang.Object
StateSpec
mappingFunctions.Constructor and Description |
---|
StateSpecFunctions() |
Modifier and Type | Method and Description |
---|---|
static <T,CheckpointMarkT extends UnboundedSource.CheckpointMark> |
mapSourceFunction(org.apache.beam.runners.core.construction.SerializablePipelineOptions options,
java.lang.String stepName)
A
StateSpec function to support reading from an UnboundedSource . |
public static <T,CheckpointMarkT extends UnboundedSource.CheckpointMark> scala.Function3<Source<T>,scala.Option<CheckpointMarkT>,org.apache.spark.streaming.State<scala.Tuple2<byte[],Instant>>,scala.Tuple2<java.lang.Iterable<byte[]>,SparkUnboundedSource.Metadata>> mapSourceFunction(org.apache.beam.runners.core.construction.SerializablePipelineOptions options, java.lang.String stepName)
StateSpec
function to support reading from an UnboundedSource
.
This StateSpec function expects the following:
UnboundedSource.CheckpointMark
to start from.
This stateful operation could be described as a flatMap over a single-element stream, which
outputs all the elements read from the UnboundedSource
for this micro-batch. Since
micro-batches are bounded, the provided UnboundedSource is wrapped by a MicrobatchSource
that applies bounds in the form of duration and max records (per
micro-batch).
In order to avoid using Spark Guava's classes which pollute the classpath, we use the StateSpec.function(scala.Function3)
signature which employs scala's native Option
, instead of the StateSpec.function(org.apache.spark.api.java.function.Function3)
signature, which employs
Guava's Optional
.
See also SPARK-4819.
T
- The type of the input stream elements.CheckpointMarkT
- The type of the UnboundedSource.CheckpointMark
.options
- A serializable SerializablePipelineOptions
.StateSpec
function.