Class TranslationUtils

java.lang.Object
org.apache.beam.runners.spark.translation.TranslationUtils

public final class TranslationUtils extends Object
A set of utilities to help translate Beam transformations into Spark transformations.
  • Method Details

    • skipAssignWindows

      public static <T, W extends BoundedWindow> boolean skipAssignWindows(Window.Assign<T> transform, EvaluationContext context)
      Checks if the window transformation should be applied or skipped.

      Window assignment is skipped if both the source and the destination use the global window, or if the user has not specified a WindowFn (meaning they are only adjusting triggering or allowed lateness).

      Type Parameters:
      T - PCollection type.
      W - BoundedWindow type.
      Parameters:
      transform - The Window.Assign transformation.
      context - The EvaluationContext.
      Returns:
      true if assigning windows should be skipped; false if the transformation should be applied.
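
      As a hedged illustration, a translator might guard window assignment with this check; the class and method names below (WindowAssignTranslationSketch, translateWindowAssign) are hypothetical, and the transform and context are assumed to come from the surrounding translation code.

        import org.apache.beam.runners.spark.translation.EvaluationContext;
        import org.apache.beam.runners.spark.translation.TranslationUtils;
        import org.apache.beam.sdk.transforms.windowing.Window;

        public class WindowAssignTranslationSketch {
          // Hypothetical translator step: only apply the WindowFn when it cannot be skipped.
          static <T> void translateWindowAssign(Window.Assign<T> transform, EvaluationContext context) {
            if (TranslationUtils.skipAssignWindows(transform, context)) {
              // Global-to-global (or no WindowFn): reuse the input dataset unchanged.
              return;
            }
            // ... otherwise translate the transform by applying its WindowFn to each element.
          }
        }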
    • dStreamValues

      public static <T1, T2> org.apache.spark.streaming.api.java.JavaDStream<T2> dStreamValues(org.apache.spark.streaming.api.java.JavaPairDStream<T1,T2> pairDStream)
      Transforms a pair stream into a stream of its values.
    • toPairFunction

      public static <K, V> org.apache.spark.api.java.function.PairFunction<KV<K,V>,K,V> toPairFunction()
      Returns a pair function that converts a KV<K,V> into a Spark key/value pair (see the sketch after toPairFlatMapFunction below).
    • toPairFlatMapFunction

      public static <K, V> org.apache.spark.api.java.function.PairFlatMapFunction<Iterator<KV<K,V>>,K,V> toPairFlatMapFunction()
      Returns a pair flatmap function that converts an iterator of KV<K,V> elements into Spark key/value pairs.
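
      A small, self-contained sketch covering both toPairFunction and toPairFlatMapFunction; the data, application name, and class name (PairConversionSketch) are illustrative only.

        import java.util.Arrays;
        import org.apache.beam.runners.spark.translation.TranslationUtils;
        import org.apache.beam.sdk.values.KV;
        import org.apache.spark.api.java.JavaPairRDD;
        import org.apache.spark.api.java.JavaRDD;
        import org.apache.spark.api.java.JavaSparkContext;

        public class PairConversionSketch {
          public static void main(String[] args) {
            JavaSparkContext jsc = new JavaSparkContext("local[2]", "pair-conversion");
            JavaRDD<KV<String, Integer>> kvs =
                jsc.parallelize(Arrays.asList(KV.of("a", 1), KV.of("b", 2)));

            // Per-element conversion: KV<K, V> -> Tuple2<K, V>.
            JavaPairRDD<String, Integer> pairs =
                kvs.mapToPair(TranslationUtils.<String, Integer>toPairFunction());

            // Per-partition conversion: Iterator<KV<K, V>> -> Tuple2<K, V> pairs.
            JavaPairRDD<String, Integer> pairsPerPartition =
                kvs.mapPartitionsToPair(TranslationUtils.<String, Integer>toPairFlatMapFunction());

            jsc.stop();
          }
        }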
    • toPairByKeyInWindowedValue

      public static <K, V> org.apache.spark.api.java.function.PairFunction<WindowedValue<KV<K,V>>,ByteArray,WindowedValue<KV<K,V>>> toPairByKeyInWindowedValue(Coder<K> keyCoder)
      Returns a pair function that keys each WindowedValue<KV<K,V>> by its key, encoded with the given coder as a ByteArray.
    • getSideInputs

      public static Map<TupleTag<?>,KV<WindowingStrategy<?,?>,SideInputBroadcast<?>>> getSideInputs(Iterable<PCollectionView<?>> views, org.apache.spark.api.java.JavaSparkContext context, SparkPCollectionView pviews)
      Creates side inputs as Spark broadcast variables.
      Parameters:
      views - The PCollectionViews.
      context - The JavaSparkContext.
      pviews - The SparkPCollectionView.
      Returns:
      a map of tagged SideInputBroadcasts and their WindowingStrategy.
    • getBatchDuration

      public static Long getBatchDuration(org.apache.beam.runners.core.construction.SerializablePipelineOptions options)
      Retrieves the batch duration in milliseconds from Spark pipeline options.
      Parameters:
      options - The serializable pipeline options containing Spark-specific settings
      Returns:
      The checkpoint duration in milliseconds as specified in SparkPipelineOptions
    • hasTimers

      public static boolean hasTimers(DoFn<?,?> doFn)
      Checks if the given DoFn uses any timers.
      Parameters:
      doFn - the DoFn to check for timer usage
      Returns:
      true if the DoFn uses timers, false otherwise
    • hasEventTimers

      public static boolean hasEventTimers(DoFn<?,?> doFn)
      Checks if the given DoFn uses event time timers.
      Parameters:
      doFn - the DoFn to check for event time timer usage
      Returns:
      true if the DoFn uses event time timers, false otherwise. Note: Returns false if the DoFn has no timers at all.
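
      A hedged sketch showing hasTimers and hasEventTimers against an illustrative DoFn that declares a single event-time timer (the DoFn and class names are made up for the example).

        import org.apache.beam.runners.spark.translation.TranslationUtils;
        import org.apache.beam.sdk.state.TimeDomain;
        import org.apache.beam.sdk.state.TimerSpec;
        import org.apache.beam.sdk.state.TimerSpecs;
        import org.apache.beam.sdk.transforms.DoFn;
        import org.apache.beam.sdk.values.KV;

        public class TimerChecksSketch {
          // Illustrative DoFn with one event-time timer.
          static class TimeredFn extends DoFn<KV<String, Integer>, String> {
            @TimerId("flush")
            private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

            @ProcessElement
            public void process(ProcessContext c) {}

            @OnTimer("flush")
            public void onFlush() {}
          }

          public static void main(String[] args) {
            DoFn<?, ?> fn = new TimeredFn();
            System.out.println(TranslationUtils.hasTimers(fn));      // expected: true
            System.out.println(TranslationUtils.hasEventTimers(fn)); // expected: true
          }
        }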
    • checkpointIfNeeded

      public static void checkpointIfNeeded(org.apache.spark.streaming.dstream.DStream<?> dStream, org.apache.beam.runners.core.construction.SerializablePipelineOptions options)
      Checkpoints the given DStream if checkpointing is enabled in the pipeline options.
      Parameters:
      dStream - The DStream to be checkpointed
      options - The SerializablePipelineOptions containing configuration settings including batch duration
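
      A minimal usage sketch combining getBatchDuration and checkpointIfNeeded; the DStream and options are assumed to be supplied by the surrounding streaming translation, and maybeCheckpoint is a hypothetical helper name.

        import org.apache.beam.runners.core.construction.SerializablePipelineOptions;
        import org.apache.beam.runners.spark.translation.TranslationUtils;
        import org.apache.spark.streaming.dstream.DStream;

        public class CheckpointSketch {
          static void maybeCheckpoint(DStream<?> dStream, SerializablePipelineOptions options) {
            // Duration (ms) read from SparkPipelineOptions.
            Long durationMillis = TranslationUtils.getBatchDuration(options);
            System.out.println("Configured duration: " + durationMillis + " ms");

            // Checkpoints the stream only if checkpointing is enabled in the options.
            TranslationUtils.checkpointIfNeeded(dStream, options);
          }
        }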
    • rejectStateAndTimers

      public static void rejectStateAndTimers(DoFn<?,?> doFn)
      Rejects a DoFn that uses state or timers.
      Parameters:
      doFn - the DoFn to possibly reject.
    • emptyVoidFunction

      public static <T> org.apache.spark.api.java.function.VoidFunction<T> emptyVoidFunction()
      Returns a VoidFunction that does nothing.
    • pairFunctionToPairFlatMapFunction

      public static <T, K, V> org.apache.spark.api.java.function.PairFlatMapFunction<Iterator<T>,K,V> pairFunctionToPairFlatMapFunction(org.apache.spark.api.java.function.PairFunction<T,K,V> pairFunction)
      A utility method that adapts a PairFunction to a PairFlatMapFunction with an Iterator input. This is particularly useful because it allows functions written for mapToPair to be reused in flatMapToPair.
      Type Parameters:
      T - the input type.
      K - the output key type.
      V - the output value type.
      Parameters:
      pairFunction - the PairFunction to adapt.
      Returns:
      a PairFlatMapFunction that accepts an Iterator as input and applies the PairFunction to every element.
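
      A hedged sketch (data and names are illustrative) of reusing a PairFunction written for mapToPair in a per-partition mapPartitionsToPair call via this adapter.

        import java.util.Arrays;
        import org.apache.beam.runners.spark.translation.TranslationUtils;
        import org.apache.spark.api.java.JavaPairRDD;
        import org.apache.spark.api.java.JavaRDD;
        import org.apache.spark.api.java.JavaSparkContext;
        import org.apache.spark.api.java.function.PairFunction;
        import scala.Tuple2;

        public class PairFunctionAdapterSketch {
          public static void main(String[] args) {
            JavaSparkContext jsc = new JavaSparkContext("local[2]", "pair-adapter");
            JavaRDD<String> words = jsc.parallelize(Arrays.asList("beam", "spark"));

            // A PairFunction written with mapToPair in mind: word -> (word, length).
            PairFunction<String, String, Integer> toLengthPair =
                word -> new Tuple2<>(word, word.length());

            // The same function reused per partition through the adapter.
            JavaPairRDD<String, Integer> lengths =
                words.mapPartitionsToPair(
                    TranslationUtils.pairFunctionToPairFlatMapFunction(toLengthPair));

            jsc.stop();
          }
        }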
    • functionToFlatMapFunction

      public static <InputT, OutputT> org.apache.spark.api.java.function.FlatMapFunction<Iterator<InputT>,OutputT> functionToFlatMapFunction(org.apache.spark.api.java.function.Function<InputT,OutputT> func)
      A utility method that adapts a Function to a FlatMapFunction with an Iterator input. This is particularly useful because it allows functions written for map to be reused in flatMap.
      Type Parameters:
      InputT - the input type.
      OutputT - the output type.
      Parameters:
      func - the Function to adapt.
      Returns:
      a FlatMapFunction that accepts an Iterator as input and applies the Function to every element.
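
      A matching sketch (again with illustrative data and names) of reusing a Function written for map in a per-partition mapPartitions call via this adapter.

        import java.util.Arrays;
        import org.apache.beam.runners.spark.translation.TranslationUtils;
        import org.apache.spark.api.java.JavaRDD;
        import org.apache.spark.api.java.JavaSparkContext;
        import org.apache.spark.api.java.function.Function;

        public class FunctionAdapterSketch {
          public static void main(String[] args) {
            JavaSparkContext jsc = new JavaSparkContext("local[2]", "function-adapter");
            JavaRDD<String> words = jsc.parallelize(Arrays.asList("beam", "spark"));

            // A Function written with map in mind: word -> upper-cased word.
            Function<String, String> toUpper = word -> word.toUpperCase();

            // The same function reused per partition through the adapter.
            JavaRDD<String> upper =
                words.mapPartitions(TranslationUtils.functionToFlatMapFunction(toUpper));

            jsc.stop();
          }
        }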
    • getTupleTagCoders

      public static Map<TupleTag<?>,Coder<WindowedValue<?>>> getTupleTagCoders(Map<TupleTag<?>,PCollection<?>> outputs)
      Utility to get the mapping between each TupleTag and its coder.
      Parameters:
      outputs - A map of tuple tags to PCollections.
      Returns:
      a mapping between each TupleTag and its coder.
    • getTupleTagEncodeFunction

      public static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<TupleTag<?>,WindowedValue<?>>,TupleTag<?>,ValueAndCoderLazySerializable<WindowedValue<?>>> getTupleTagEncodeFunction(Map<TupleTag<?>,Coder<WindowedValue<?>>> coderMap)
      Returns a pair function that converts a value to bytes via its coder.
      Parameters:
      coderMap - the mapping between each TupleTag and its coder.
      Returns:
      a pair function that converts a value to bytes via its coder.
    • getTupleTagDecodeFunction

      public static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<TupleTag<?>,ValueAndCoderLazySerializable<WindowedValue<?>>>,TupleTag<?>,WindowedValue<?>> getTupleTagDecodeFunction(Map<TupleTag<?>,Coder<WindowedValue<?>>> coderMap)
      Returns a pair function that converts bytes back to a value via its coder.
      Parameters:
      coderMap - the mapping between each TupleTag and its coder.
      Returns:
      a pair function that converts bytes back to a value via its coder.
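
      A hedged round-trip sketch tying getTupleTagEncodeFunction and getTupleTagDecodeFunction together; the tagged pair RDD and the coder map (normally produced by getTupleTagCoders) are assumed to be supplied by the caller, the class and method names are hypothetical, and the import paths for WindowedValue and ValueAndCoderLazySerializable are assumptions based on typical Beam releases.

        import java.util.Map;
        import org.apache.beam.runners.spark.translation.TranslationUtils;
        import org.apache.beam.runners.spark.translation.ValueAndCoderLazySerializable;
        import org.apache.beam.sdk.coders.Coder;
        import org.apache.beam.sdk.util.WindowedValue;
        import org.apache.beam.sdk.values.TupleTag;
        import org.apache.spark.api.java.JavaPairRDD;

        public class TupleTagCodingSketch {
          static JavaPairRDD<TupleTag<?>, WindowedValue<?>> roundTrip(
              JavaPairRDD<TupleTag<?>, WindowedValue<?>> tagged,
              Map<TupleTag<?>, Coder<WindowedValue<?>>> coderMap) {

            // Values become lazily serialized bytes keyed by their TupleTag ...
            JavaPairRDD<TupleTag<?>, ValueAndCoderLazySerializable<WindowedValue<?>>> encoded =
                tagged.mapToPair(TranslationUtils.getTupleTagEncodeFunction(coderMap));

            // ... and are decoded back to WindowedValues with the same coder map.
            return encoded.mapToPair(TranslationUtils.getTupleTagDecodeFunction(coderMap));
          }
        }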