Class StateSpecFunctions

java.lang.Object
org.apache.beam.runners.spark.stateful.StateSpecFunctions

public class StateSpecFunctions extends Object
A class containing StateSpec mappingFunctions.
  • Constructor Details

    • StateSpecFunctions

      public StateSpecFunctions()
  • Method Details

    • mapSourceFunction

      public static <T, CheckpointMarkT extends UnboundedSource.CheckpointMark> scala.Function3<Source<T>,scala.Option<CheckpointMarkT>,org.apache.spark.streaming.State<scala.Tuple2<byte[],Instant>>,scala.Tuple2<Iterable<byte[]>,SparkUnboundedSource.Metadata>> mapSourceFunction(org.apache.beam.runners.core.construction.SerializablePipelineOptions options, String stepName)
      A StateSpec function to support reading from an UnboundedSource.

      This StateSpec function expects the following:

      • Key: The (partitioned) Source to read from.
      • Value: An optional UnboundedSource.CheckpointMark to start from.
      • State: A byte representation of the (previously) persisted CheckpointMark.
      And returns an iterator over all read values (for the micro-batch).

      This stateful operation could be described as a flatMap over a single-element stream, which outputs all the elements read from the UnboundedSource for this micro-batch. Since micro-batches are bounded, the provided UnboundedSource is wrapped by a MicrobatchSource that applies bounds in the form of duration and max records (per micro-batch).

      In order to avoid using Spark Guava's classes which pollute the classpath, we use the StateSpec.function(scala.Function3) signature which employs scala's native Option, instead of the StateSpec.function(org.apache.spark.api.java.function.Function3) signature, which employs Guava's Optional.

      See also SPARK-4819.

      Type Parameters:
      T - The type of the input stream elements.
      CheckpointMarkT - The type of the UnboundedSource.CheckpointMark.
      Parameters:
      options - A serializable SerializablePipelineOptions.
      Returns:
      The appropriate StateSpec function.