SparkInputDataProcessor (Apache Beam 2.67.0)

public interface SparkInputDataProcessor<FnInputT,FnOutputT,OutputT>

Processes Spark's input data iterators using Beam's DoFnRunner.

Method Summary

Modifier and Type

Method

Description

static <FnInputT, FnOutputT> SparkInputDataProcessor<FnInputT,FnOutputT,scala.Tuple2<TupleTag<?>,WindowedValue<?>>>

createBounded()

Creates SparkInputDataProcessor which does process input elements in separate thread and observes produced outputs via bounded queue in other thread.

<K> Iterator<OutputT>

createOutputIterator(Iterator<WindowedValue<FnInputT>> input, SparkProcessContext<K,FnInputT,FnOutputT> ctx)

Creates a transformation which processes input partition data and returns output results as Iterator.

static <FnInputT, FnOutputT> SparkInputDataProcessor<FnInputT,FnOutputT,scala.Tuple2<TupleTag<?>,WindowedValue<?>>>

createUnbounded()

Creates SparkInputDataProcessor which does processing in calling thread.

org.apache.beam.sdk.util.WindowedValueMultiReceiver

getOutputManager()

Method Details
- getOutputManager
  
  org.apache.beam.sdk.util.WindowedValueMultiReceiver getOutputManager()
  
  Returns:
  
  WindowedValueMultiReceiver to be used by DoFnRunner for emitting processing results
- createOutputIterator
  
  <K> Iterator<OutputT> createOutputIterator(Iterator<WindowedValue<FnInputT>> input, SparkProcessContext<K,FnInputT,FnOutputT> ctx)
  
  Creates a transformation which processes input partition data and returns output results as Iterator.
  
  Parameters:
  
  input - input partition iterator
  
  ctx - current processing context
- createUnbounded
  
  static <FnInputT, FnOutputT> SparkInputDataProcessor<FnInputT,FnOutputT,scala.Tuple2<TupleTag<?>,WindowedValue<?>>> createUnbounded()
  
  Creates SparkInputDataProcessor which does processing in calling thread. It is doing so by processing input element completely and then iterating over the output retrieved from that processing. The result of processing one element must fit into memory.
- createBounded
  
  static <FnInputT, FnOutputT> SparkInputDataProcessor<FnInputT,FnOutputT,scala.Tuple2<TupleTag<?>,WindowedValue<?>>> createBounded()
  
  Creates SparkInputDataProcessor which does process input elements in separate thread and observes produced outputs via bounded queue in other thread. This does not require results of processing one element to fit into the memory.