Class StateSpecFunctions
java.lang.Object
org.apache.beam.runners.spark.stateful.StateSpecFunctions
A class containing
StateSpec mappingFunctions.-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionstatic <T,CheckpointMarkT extends UnboundedSource.CheckpointMark>
scala.Function3<Source<T>, scala.Option<CheckpointMarkT>, org.apache.spark.streaming.State<scala.Tuple2<byte[], Instant>>, scala.Tuple2<Iterable<byte[]>, SparkUnboundedSource.Metadata>> mapSourceFunction(org.apache.beam.runners.core.construction.SerializablePipelineOptions options, String stepName) AStateSpecfunction to support reading from anUnboundedSource.
-
Constructor Details
-
StateSpecFunctions
public StateSpecFunctions()
-
-
Method Details
-
mapSourceFunction
public static <T,CheckpointMarkT extends UnboundedSource.CheckpointMark> scala.Function3<Source<T>,scala.Option<CheckpointMarkT>, mapSourceFunctionorg.apache.spark.streaming.State<scala.Tuple2<byte[], Instant>>, scala.Tuple2<Iterable<byte[]>, SparkUnboundedSource.Metadata>> (org.apache.beam.runners.core.construction.SerializablePipelineOptions options, String stepName) AStateSpecfunction to support reading from anUnboundedSource.This StateSpec function expects the following:
- Key: The (partitioned) Source to read from.
- Value: An optional
UnboundedSource.CheckpointMarkto start from. - State: A byte representation of the (previously) persisted CheckpointMark.
This stateful operation could be described as a flatMap over a single-element stream, which outputs all the elements read from the
UnboundedSourcefor this micro-batch. Since micro-batches are bounded, the provided UnboundedSource is wrapped by aMicrobatchSourcethat applies bounds in the form of duration and max records (per micro-batch).In order to avoid using Spark Guava's classes which pollute the classpath, we use the
StateSpec.function(scala.Function3)signature which employs scala's nativeOption, instead of theStateSpec.function(org.apache.spark.api.java.function.Function3)signature, which employs Guava'sOptional.See also SPARK-4819.
- Type Parameters:
T- The type of the input stream elements.CheckpointMarkT- The type of theUnboundedSource.CheckpointMark.- Parameters:
options- A serializableSerializablePipelineOptions.- Returns:
- The appropriate
StateSpecfunction.
-