Interface SparkInputDataProcessor<FnInputT,FnOutputT,OutputT>


public interface SparkInputDataProcessor<FnInputT,FnOutputT,OutputT>
Processes Spark's input data iterators using Beam's DoFnRunner.
  • Method Details

    • getOutputManager

      org.apache.beam.sdk.util.WindowedValueMultiReceiver getOutputManager()
      Returns:
      WindowedValueMultiReceiver to be used by DoFnRunner for emitting processing results
    • createOutputIterator

      <K> Iterator<OutputT> createOutputIterator(Iterator<WindowedValue<FnInputT>> input, SparkProcessContext<K,FnInputT,FnOutputT> ctx)
      Creates a transformation which processes input partition data and returns output results as Iterator.
      Parameters:
      input - input partition iterator
      ctx - current processing context
    • createUnbounded

      static <FnInputT, FnOutputT> SparkInputDataProcessor<FnInputT,FnOutputT,scala.Tuple2<TupleTag<?>,WindowedValue<?>>> createUnbounded()
      Creates SparkInputDataProcessor which does processing in calling thread. It is doing so by processing input element completely and then iterating over the output retrieved from that processing. The result of processing one element must fit into memory.
    • createBounded

      static <FnInputT, FnOutputT> SparkInputDataProcessor<FnInputT,FnOutputT,scala.Tuple2<TupleTag<?>,WindowedValue<?>>> createBounded()
      Creates SparkInputDataProcessor which does process input elements in separate thread and observes produced outputs via bounded queue in other thread. This does not require results of processing one element to fit into the memory.