Class StateSpecFunctions
java.lang.Object
org.apache.beam.runners.spark.stateful.StateSpecFunctions
A class containing
StateSpec
mappingFunctions.-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionstatic <T,
CheckpointMarkT extends UnboundedSource.CheckpointMark>
scala.Function3<Source<T>, scala.Option<CheckpointMarkT>, org.apache.spark.streaming.State<scala.Tuple2<byte[], Instant>>, scala.Tuple2<Iterable<byte[]>, SparkUnboundedSource.Metadata>> mapSourceFunction
(org.apache.beam.runners.core.construction.SerializablePipelineOptions options, String stepName) AStateSpec
function to support reading from anUnboundedSource
.
-
Constructor Details
-
StateSpecFunctions
public StateSpecFunctions()
-
-
Method Details
-
mapSourceFunction
public static <T,CheckpointMarkT extends UnboundedSource.CheckpointMark> scala.Function3<Source<T>,scala.Option<CheckpointMarkT>, mapSourceFunctionorg.apache.spark.streaming.State<scala.Tuple2<byte[], Instant>>, scala.Tuple2<Iterable<byte[]>, SparkUnboundedSource.Metadata>> (org.apache.beam.runners.core.construction.SerializablePipelineOptions options, String stepName) AStateSpec
function to support reading from anUnboundedSource
.This StateSpec function expects the following:
- Key: The (partitioned) Source to read from.
- Value: An optional
UnboundedSource.CheckpointMark
to start from. - State: A byte representation of the (previously) persisted CheckpointMark.
This stateful operation could be described as a flatMap over a single-element stream, which outputs all the elements read from the
UnboundedSource
for this micro-batch. Since micro-batches are bounded, the provided UnboundedSource is wrapped by aMicrobatchSource
that applies bounds in the form of duration and max records (per micro-batch).In order to avoid using Spark Guava's classes which pollute the classpath, we use the
StateSpec.function(scala.Function3)
signature which employs scala's nativeOption
, instead of theStateSpec.function(org.apache.spark.api.java.function.Function3)
signature, which employs Guava'sOptional
.See also SPARK-4819.
- Type Parameters:
T
- The type of the input stream elements.CheckpointMarkT
- The type of theUnboundedSource.CheckpointMark
.- Parameters:
options
- A serializableSerializablePipelineOptions
.- Returns:
- The appropriate
StateSpec
function.
-