Interface SparkInputDataProcessor<FnInputT,FnOutputT,OutputT>
public interface SparkInputDataProcessor<FnInputT,FnOutputT,OutputT>
Processes Spark's input data iterators using Beam's DoFnRunner.
Method Summary
Modifier and Type: static <FnInputT,FnOutputT> SparkInputDataProcessor<FnInputT, FnOutputT, scala.Tuple2<TupleTag<?>, WindowedValue<?>>>
Method: createBounded()
Description: Creates SparkInputDataProcessor which processes input elements in a separate thread and observes produced outputs via a bounded queue in another thread.

Modifier and Type: <K> Iterator<OutputT>
Method: createOutputIterator(Iterator<WindowedValue<FnInputT>> input, SparkProcessContext<K, FnInputT, FnOutputT> ctx)
Description: Creates a transformation which processes input partition data and returns output results as Iterator.

Modifier and Type: static <FnInputT,FnOutputT> SparkInputDataProcessor<FnInputT, FnOutputT, scala.Tuple2<TupleTag<?>, WindowedValue<?>>>
Method: createUnbounded()
Description: Creates SparkInputDataProcessor which does processing in the calling thread.

Modifier and Type: org.apache.beam.sdk.util.WindowedValueMultiReceiver
Method: getOutputManager()
Method Details
getOutputManager
org.apache.beam.sdk.util.WindowedValueMultiReceiver getOutputManager()
Returns:
WindowedValueMultiReceiver to be used by DoFnRunner for emitting processing results
createOutputIterator
<K> Iterator<OutputT> createOutputIterator(Iterator<WindowedValue<FnInputT>> input, SparkProcessContext<K, FnInputT, FnOutputT> ctx)
Creates a transformation which processes input partition data and returns output results as Iterator.
Parameters:
input - input partition iterator
ctx - current processing context
createUnbounded
static <FnInputT,FnOutputT> SparkInputDataProcessor<FnInputT,FnOutputT,scala.Tuple2<TupleTag<?>, WindowedValue<?>>> createUnbounded()
Creates a SparkInputDataProcessor which does processing in the calling thread. It does so by processing each input element completely and then iterating over the output produced by that processing. The result of processing one element must fit into memory.
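The calling-thread strategy described for createUnbounded() can be illustrated with a minimal plain-Java sketch. This is not Beam's implementation: the class CallingThreadOutputIterator and the BiConsumer used as a stand-in for a DoFnRunner are hypothetical names introduced only for illustration. It shows why all outputs of one element have to fit into memory: they are buffered in full before the iterator hands them out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

/** Hypothetical sketch of calling-thread processing; not Beam's actual code. */
final class CallingThreadOutputIterator<I, O> implements Iterator<O> {
    private final Iterator<I> input;
    // Stand-in for a DoFnRunner: receives one element plus an emitter for its outputs.
    private final BiConsumer<I, Consumer<O>> fn;
    // Holds every output of the most recently processed element (must fit into memory).
    private final ArrayDeque<O> buffer = new ArrayDeque<>();

    CallingThreadOutputIterator(Iterator<I> input, BiConsumer<I, Consumer<O>> fn) {
        this.input = input;
        this.fn = fn;
    }

    @Override
    public boolean hasNext() {
        // Process input elements in the calling thread until one of them yields output.
        while (buffer.isEmpty() && input.hasNext()) {
            fn.accept(input.next(), buffer::add);
        }
        return !buffer.isEmpty();
    }

    @Override
    public O next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.poll();
    }

    /** Emits each input n exactly n times, so 0 produces no output at all. */
    static List<Integer> demo() {
        Iterator<Integer> input = Arrays.asList(1, 0, 3).iterator();
        BiConsumer<Integer, Consumer<Integer>> repeat = (n, emit) -> {
            for (int i = 0; i < n; i++) {
                emit.accept(n);
            }
        };
        List<Integer> out = new ArrayList<>();
        new CallingThreadOutputIterator<Integer, Integer>(input, repeat).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 3, 3, 3]
    }
}
```

No extra thread or synchronization is involved in this sketch; the cost is that the buffer grows with the number of outputs a single element produces.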
createBounded
static <FnInputT,FnOutputT> SparkInputDataProcessor<FnInputT,FnOutputT,scala.Tuple2<TupleTag<?>, WindowedValue<?>>> createBounded()
Creates a SparkInputDataProcessor which processes input elements in a separate thread and observes the produced outputs via a bounded queue in another thread. This does not require the results of processing one element to fit into memory.
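The bounded-queue strategy described for createBounded() can be sketched in plain Java as follows. This is an illustration under assumptions, not Beam's implementation: BoundedQueueOutputIterator, the BiConsumer stand-in for a DoFnRunner, and the capacity parameter are hypothetical. A producer thread processes the input and blocks whenever the queue is full, so a single element may emit more outputs than the queue can hold. Error propagation from the producer is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

/** Hypothetical sketch of producer-thread processing; not Beam's actual code. */
final class BoundedQueueOutputIterator<I, O> implements Iterator<O> {
    // Sentinel that marks the end of the output stream.
    private static final Object END = new Object();
    private final BlockingQueue<Object> queue;
    // Item fetched by hasNext() and not yet returned; null means nothing fetched.
    private Object pending;

    BoundedQueueOutputIterator(Iterator<I> input, BiConsumer<I, Consumer<O>> fn, int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                while (input.hasNext()) {
                    // Each emitted output blocks here while the queue is full.
                    fn.accept(input.next(), this::enqueue);
                }
            } finally {
                enqueue(END);
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    private void enqueue(Object item) {
        try {
            queue.put(item);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    @Override
    public boolean hasNext() {
        if (pending == null) {
            try {
                pending = queue.take(); // waits for the producer if the queue is empty
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return pending != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public O next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        O result = (O) pending;
        pending = null;
        return result;
    }

    /** Emits each input n exactly n times through a queue that holds only 2 items. */
    static List<Integer> demo() {
        Iterator<Integer> input = Arrays.asList(1, 0, 3).iterator();
        BiConsumer<Integer, Consumer<Integer>> repeat = (n, emit) -> {
            for (int i = 0; i < n; i++) {
                emit.accept(n);
            }
        };
        List<Integer> out = new ArrayList<>();
        new BoundedQueueOutputIterator<Integer, Integer>(input, repeat, 2).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 3, 3, 3]
    }
}
```

In the demo, the element 3 emits three outputs through a queue of capacity 2, so the producer has to wait for the consumer; memory use is bounded by the queue capacity rather than by the output size of one element.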