Interface SparkInputDataProcessor<FnInputT,FnOutputT,OutputT>
public interface SparkInputDataProcessor<FnInputT,FnOutputT,OutputT>
Processes Spark's input data iterators using Beam's
DoFnRunner
.-
Method Summary
Modifier and TypeMethodDescriptionstatic <FnInputT,
FnOutputT>
SparkInputDataProcessor<FnInputT, FnOutputT, scala.Tuple2<TupleTag<?>, WindowedValue<?>>> CreatesSparkInputDataProcessor
which does process input elements in separate thread and observes produced outputs via bounded queue in other thread.createOutputIterator
(Iterator<WindowedValue<FnInputT>> input, SparkProcessContext<K, FnInputT, FnOutputT> ctx) Creates a transformation which processes input partition data and returns output results asIterator
.static <FnInputT,
FnOutputT>
SparkInputDataProcessor<FnInputT, FnOutputT, scala.Tuple2<TupleTag<?>, WindowedValue<?>>> CreatesSparkInputDataProcessor
which does processing in calling thread.org.apache.beam.sdk.util.WindowedValueMultiReceiver
-
Method Details
-
getOutputManager
org.apache.beam.sdk.util.WindowedValueMultiReceiver getOutputManager()- Returns:
WindowedValueMultiReceiver
to be used byDoFnRunner
for emitting processing results
-
createOutputIterator
<K> Iterator<OutputT> createOutputIterator(Iterator<WindowedValue<FnInputT>> input, SparkProcessContext<K, FnInputT, FnOutputT> ctx) Creates a transformation which processes input partition data and returns output results asIterator
.- Parameters:
input
- input partition iteratorctx
- current processing context
-
createUnbounded
static <FnInputT,FnOutputT> SparkInputDataProcessor<FnInputT,FnOutputT, createUnbounded()scala.Tuple2<TupleTag<?>, WindowedValue<?>>> CreatesSparkInputDataProcessor
which does processing in calling thread. It is doing so by processing input element completely and then iterating over the output retrieved from that processing. The result of processing one element must fit into memory. -
createBounded
static <FnInputT,FnOutputT> SparkInputDataProcessor<FnInputT,FnOutputT, createBounded()scala.Tuple2<TupleTag<?>, WindowedValue<?>>> CreatesSparkInputDataProcessor
which does process input elements in separate thread and observes produced outputs via bounded queue in other thread. This does not require results of processing one element to fit into the memory.
-