Package org.apache.beam.runners.spark.io
Class SparkUnboundedSource
java.lang.Object
org.apache.beam.runners.spark.io.SparkUnboundedSource
A "composite" InputDStream implementation for
UnboundedSource
s.
This read is a composite of the following steps:
- Create a single-element (per-partition) stream, that contains the (partitioned)
Source
and an optionalUnboundedSource.CheckpointMark
to start from. - Read from within a stateful operation
JavaPairDStream.mapWithState(StateSpec)
using theStateSpecFunctions.mapSourceFunction(org.apache.beam.runners.core.construction.SerializablePipelineOptions, java.lang.String)
mapping function, which manages the state of the CheckpointMark per partition. - Since the stateful operation is a map operation, the read iterator needs to be flattened, while reporting the properties of the read (such as number of records) to the tracker.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic class
A metadata holder for an input stream partition. -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionstatic <T,
CheckpointMarkT extends UnboundedSource.CheckpointMark>
UnboundedDataset<T> read
(org.apache.spark.streaming.api.java.JavaStreamingContext jssc, org.apache.beam.runners.core.construction.SerializablePipelineOptions rc, UnboundedSource<T, CheckpointMarkT> source, String stepName)
-
Constructor Details
-
SparkUnboundedSource
public SparkUnboundedSource()
-
-
Method Details
-
read
public static <T,CheckpointMarkT extends UnboundedSource.CheckpointMark> UnboundedDataset<T> read(org.apache.spark.streaming.api.java.JavaStreamingContext jssc, org.apache.beam.runners.core.construction.SerializablePipelineOptions rc, UnboundedSource<T, CheckpointMarkT> source, String stepName)
-