Class TranslationUtils
java.lang.Object
org.apache.beam.runners.spark.translation.TranslationUtils
A set of utilities to help translate Beam transformations into Spark transformations.
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, and Description
static class: A SparkCombineFn function applied to grouped KVs.
static final class: A utility class to filter TupleTags.
-
Method Summary
Modifier and Type, Method, and Description

static void checkpointIfNeeded(org.apache.spark.streaming.dstream.DStream<?> dStream, org.apache.beam.runners.core.construction.SerializablePipelineOptions options)
Checkpoints the given DStream if checkpointing is enabled in the pipeline options.

static <T1,T2> org.apache.spark.streaming.api.java.JavaDStream<T2> dStreamValues(org.apache.spark.streaming.api.java.JavaPairDStream<T1, T2> pairDStream)
Transform a pair stream into a value stream.

static <T> org.apache.spark.api.java.function.VoidFunction<T> emptyVoidFunction()

static <InputT,OutputT> org.apache.spark.api.java.function.FlatMapFunction<Iterator<InputT>, OutputT> functionToFlatMapFunction(org.apache.spark.api.java.function.Function<InputT, OutputT> func)
A utility method that adapts Function to a FlatMapFunction with an Iterator input.

static Long getBatchDuration(org.apache.beam.runners.core.construction.SerializablePipelineOptions options)
Retrieves the batch duration in milliseconds from Spark pipeline options.

static Map<TupleTag<?>, KV<WindowingStrategy<?, ?>, SideInputBroadcast<?>>> getSideInputs(Iterable<PCollectionView<?>> views, org.apache.spark.api.java.JavaSparkContext context, SparkPCollectionView pviews)
Create SideInputs as Broadcast variables.

static Map<TupleTag<?>, Coder<WindowedValue<?>>> getTupleTagCoders(Map<TupleTag<?>, PCollection<?>> outputs)
Utility to get mapping between TupleTag and a coder.

static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<TupleTag<?>, ValueAndCoderLazySerializable<WindowedValue<?>>>, TupleTag<?>, WindowedValue<?>> getTupleTagDecodeFunction(Map<TupleTag<?>, Coder<WindowedValue<?>>> coderMap)
Returns a pair function to convert bytes to value via coder.

static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<TupleTag<?>, WindowedValue<?>>, TupleTag<?>, ValueAndCoderLazySerializable<WindowedValue<?>>> getTupleTagEncodeFunction(Map<TupleTag<?>, Coder<WindowedValue<?>>> coderMap)
Returns a pair function to convert value to bytes via coder.

static boolean hasEventTimers(DoFn<?, ?> doFn)
Checks if the given DoFn uses event time timers.

static boolean hasTimers(DoFn<?, ?> doFn)
Checks if the given DoFn uses any timers.

static <T,K,V> org.apache.spark.api.java.function.PairFlatMapFunction<Iterator<T>, K, V> pairFunctionToPairFlatMapFunction(org.apache.spark.api.java.function.PairFunction<T, K, V> pairFunction)
A utility method that adapts PairFunction to a PairFlatMapFunction with an Iterator input.

static void rejectStateAndTimers(DoFn<?, ?> doFn)
Rejects a DoFn that uses state or timers.

static <T,W extends BoundedWindow> boolean skipAssignWindows(Window.Assign<T> transform, EvaluationContext context)
Checks if the window transformation should be applied or skipped.

static <K,V> org.apache.spark.api.java.function.PairFunction<WindowedValue<KV<K, V>>, ByteArray, WindowedValue<KV<K, V>>> toPairByKeyInWindowedValue(Coder<K> keyCoder)
Extract key from a WindowedValue KV into a pair.

static <K,V> org.apache.spark.api.java.function.PairFlatMapFunction<Iterator<KV<K, V>>, K, V> toPairFlatMapFunction()
KV to pair flatmap function.

static <K,V> org.apache.spark.api.java.function.PairFunction<KV<K, V>, K, V> toPairFunction()
KV to pair function.
-
Method Details
-
skipAssignWindows
public static <T,W extends BoundedWindow> boolean skipAssignWindows(Window.Assign<T> transform, EvaluationContext context)
Checks if the window transformation should be applied or skipped.
Window assignment is skipped if both the source and the destination use the global window, or if the user has not specified a WindowFn (meaning they are only changing triggering or allowed lateness).
Type Parameters:
T - PCollection type.
W - BoundedWindow type.
Parameters:
transform - The Window.Assign transformation.
context - The EvaluationContext.
Returns:
true if the window assignment can be skipped, false if it should be applied.
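The skip rule described for skipAssignWindows can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the runner's code: WindowFnKind and shouldSkip are hypothetical stand-ins for Beam's WindowFn types, and a null argument models "the user has not specified the WindowFn".

```java
/** Hypothetical stand-in for the kinds of WindowFn a transform may carry. */
enum WindowFnKind { GLOBAL, FIXED, SLIDING, SESSIONS }

class SkipAssignWindowsSketch {

  /**
   * Returns true when window assignment can be skipped: either no WindowFn
   * was requested (null here), or both input and requested windowing are global.
   */
  static boolean shouldSkip(WindowFnKind inputFn, WindowFnKind requestedFn) {
    if (requestedFn == null) {
      return true; // only triggering or allowed lateness was changed
    }
    return inputFn == WindowFnKind.GLOBAL && requestedFn == WindowFnKind.GLOBAL;
  }

  public static void main(String[] args) {
    System.out.println(shouldSkip(WindowFnKind.GLOBAL, null));
    System.out.println(shouldSkip(WindowFnKind.GLOBAL, WindowFnKind.GLOBAL));
    System.out.println(shouldSkip(WindowFnKind.GLOBAL, WindowFnKind.FIXED));
  }
}
```

Any change from or to a non-global window returns false in this sketch, so the assignment still runs in those cases.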
-
dStreamValues
public static <T1,T2> org.apache.spark.streaming.api.java.JavaDStream<T2> dStreamValues(org.apache.spark.streaming.api.java.JavaPairDStream<T1, T2> pairDStream)
Transform a pair stream into a value stream.
-
toPairFunction
public static <K,V> org.apache.spark.api.java.function.PairFunction<KV<K,V>,K,V> toPairFunction()
KV to pair function.
-
toPairFlatMapFunction
public static <K,V> org.apache.spark.api.java.function.PairFlatMapFunction<Iterator<KV<K,V>>,K,V> toPairFlatMapFunction()
KV to pair flatmap function.
-
toPairByKeyInWindowedValue
public static <K,V> org.apache.spark.api.java.function.PairFunction<WindowedValue<KV<K,V>>,ByteArray,WindowedValue<KV<K,V>>> toPairByKeyInWindowedValue(Coder<K> keyCoder)
Extract key from a WindowedValue KV into a pair.
-
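The signature of toPairByKeyInWindowedValue pairs each value with a ByteArray key derived through keyCoder. The sketch below shows that shape using only the JDK. All names are hypothetical: Map.Entry stands in for KV, a Function<K, byte[]> stands in for the key Coder, and ByteBuffer stands in for ByteArray because it compares by content.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.function.Function;

class KeyByEncodedBytesSketch {

  /** Pairs a key/value entry with its key encoded to bytes. */
  static <K, V> Map.Entry<ByteBuffer, Map.Entry<K, V>> toPairByKey(
      Map.Entry<K, V> kv, Function<K, byte[]> keyEncoder) {
    ByteBuffer encodedKey = ByteBuffer.wrap(keyEncoder.apply(kv.getKey()));
    return new SimpleEntry<>(encodedKey, kv); // the full key/value stays as the pair's value
  }

  public static void main(String[] args) {
    Function<String, byte[]> utf8 = s -> s.getBytes(StandardCharsets.UTF_8);
    Map.Entry<ByteBuffer, Map.Entry<String, Integer>> first =
        toPairByKey(new SimpleEntry<String, Integer>("user-1", 10), utf8);
    Map.Entry<ByteBuffer, Map.Entry<String, Integer>> second =
        toPairByKey(new SimpleEntry<String, Integer>("user-1", 32), utf8);
    // Equal keys encode to equal bytes, so both pairs carry equal keys.
    System.out.println(first.getKey().equals(second.getKey()));
  }
}
```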
getSideInputs
public static Map<TupleTag<?>,KV<WindowingStrategy<?,?>,SideInputBroadcast<?>>> getSideInputs(Iterable<PCollectionView<?>> views, org.apache.spark.api.java.JavaSparkContext context, SparkPCollectionView pviews)
Create SideInputs as Broadcast variables.
Parameters:
views - The PCollectionViews.
context - The JavaSparkContext.
pviews - The SparkPCollectionView.
Returns:
a map of tagged SideInputBroadcasts and their WindowingStrategy.
-
getBatchDuration
public static Long getBatchDuration(org.apache.beam.runners.core.construction.SerializablePipelineOptions options)
Retrieves the batch duration in milliseconds from Spark pipeline options.
Parameters:
options - The serializable pipeline options containing Spark-specific settings
Returns:
The checkpoint duration in milliseconds as specified in SparkPipelineOptions
-
hasTimers
public static boolean hasTimers(DoFn<?,?> doFn)
Checks if the given DoFn uses any timers.
Parameters:
doFn - the DoFn to check for timer usage
Returns:
true if the DoFn uses timers, false otherwise
-
hasEventTimers
public static boolean hasEventTimers(DoFn<?,?> doFn)
Checks if the given DoFn uses event time timers.
Parameters:
doFn - the DoFn to check for event time timer usage
Returns:
true if the DoFn uses event time timers, false otherwise. Note: Returns false if the DoFn has no timers at all.
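The relationship between the two timer checks, including the note that a DoFn with no timers yields false, can be illustrated with a small JDK-only sketch. TimerDomain and the collection of declared timers are hypothetical stand-ins; how the runner actually inspects a DoFn is not shown here.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

/** Hypothetical stand-in for the time domain a declared timer runs in. */
enum TimerDomain { EVENT_TIME, PROCESSING_TIME }

class TimerCheckSketch {

  /** True if at least one timer is declared, in any domain. */
  static boolean hasTimers(Collection<TimerDomain> declaredTimers) {
    return !declaredTimers.isEmpty();
  }

  /** True only if some declared timer is an event-time timer; false when there are none. */
  static boolean hasEventTimers(Collection<TimerDomain> declaredTimers) {
    return hasTimers(declaredTimers) && declaredTimers.contains(TimerDomain.EVENT_TIME);
  }

  public static void main(String[] args) {
    System.out.println(hasEventTimers(Collections.<TimerDomain>emptyList()));
    System.out.println(hasEventTimers(Arrays.asList(TimerDomain.PROCESSING_TIME)));
    System.out.println(hasEventTimers(Arrays.asList(TimerDomain.EVENT_TIME)));
  }
}
```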
-
checkpointIfNeeded
public static void checkpointIfNeeded(org.apache.spark.streaming.dstream.DStream<?> dStream, org.apache.beam.runners.core.construction.SerializablePipelineOptions options)
Checkpoints the given DStream if checkpointing is enabled in the pipeline options.
Parameters:
dStream - The DStream to be checkpointed
options - The SerializablePipelineOptions containing configuration settings including batch duration
-
rejectStateAndTimers
public static void rejectStateAndTimers(DoFn<?,?> doFn)
Rejects a DoFn that uses state or timers.
Parameters:
doFn - the DoFn to possibly reject.
-
emptyVoidFunction
public static <T> org.apache.spark.api.java.function.VoidFunction<T> emptyVoidFunction()
-
pairFunctionToPairFlatMapFunction
public static <T,K,V> org.apache.spark.api.java.function.PairFlatMapFunction<Iterator<T>,K,V> pairFunctionToPairFlatMapFunction(org.apache.spark.api.java.function.PairFunction<T, K, V> pairFunction)
A utility method that adapts PairFunction to a PairFlatMapFunction with an Iterator input. This is particularly useful because it allows functions written for mapToPair to be reused in flatMapToPair.
Type Parameters:
T - the input type.
K - the output key type.
V - the output value type.
Parameters:
pairFunction - the PairFunction to adapt.
Returns:
a PairFlatMapFunction that accepts an Iterator as an input and applies the PairFunction on every element.
-
functionToFlatMapFunction
public static <InputT,OutputT> org.apache.spark.api.java.function.FlatMapFunction<Iterator<InputT>,OutputT> functionToFlatMapFunction(org.apache.spark.api.java.function.Function<InputT, OutputT> func)
A utility method that adapts Function to a FlatMapFunction with an Iterator input. This is particularly useful because it allows functions written for map to be reused in flatMap.
Type Parameters:
InputT - the input type.
OutputT - the output type.
Parameters:
func - the Function to adapt.
Returns:
a FlatMapFunction that accepts an Iterator as an input and applies the Function on every element.
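The adapter pattern described for functionToFlatMapFunction (and, for pairs, pairFunctionToPairFlatMapFunction) can be sketched with JDK types only. This is an illustration under stated assumptions rather than the runner's implementation: FlatMapFn is a hypothetical stand-in for Spark's FlatMapFunction, java.util.function.Function stands in for Spark's Function, and the lazy element-by-element mapping is one plausible way to write it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

class FlatMapAdapterSketch {

  /** Hypothetical stand-in for a flat-map function: one input, an iterator of outputs. */
  interface FlatMapFn<I, O> {
    Iterator<O> call(I input);
  }

  /** Adapts a per-element function so it can be applied to a whole iterator lazily. */
  static <I, O> FlatMapFn<Iterator<I>, O> adapt(Function<I, O> fn) {
    return input -> new Iterator<O>() {
      @Override
      public boolean hasNext() {
        return input.hasNext();
      }

      @Override
      public O next() {
        return fn.apply(input.next()); // map one element at a time
      }
    };
  }

  public static void main(String[] args) {
    FlatMapFn<Iterator<String>, Integer> lengths = adapt(String::length);
    Iterator<Integer> out = lengths.call(Arrays.asList("beam", "spark").iterator());
    while (out.hasNext()) {
      System.out.println(out.next());
    }
  }
}
```

Wrapping the input iterator instead of copying it into a list keeps memory use independent of the number of elements, which matters when the iterator covers a whole partition.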
-
getTupleTagCoders
public static Map<TupleTag<?>,Coder<WindowedValue<?>>> getTupleTagCoders(Map<TupleTag<?>, PCollection<?>> outputs)
Utility to get mapping between TupleTag and a coder.
Parameters:
outputs - A map of tuple tags and PCollections
Returns:
mapping between TupleTag and a coder
-
getTupleTagEncodeFunction
public static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<TupleTag<?>,WindowedValue<?>>,TupleTag<?>,ValueAndCoderLazySerializable<WindowedValue<?>>> getTupleTagEncodeFunction(Map<TupleTag<?>, Coder<WindowedValue<?>>> coderMap)
Returns a pair function to convert value to bytes via coder.
Parameters:
coderMap - mapping between TupleTag and a coder
Returns:
a pair function to convert value to bytes via coder
-
getTupleTagDecodeFunction
public static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<TupleTag<?>,ValueAndCoderLazySerializable<WindowedValue<?>>>,TupleTag<?>,WindowedValue<?>> getTupleTagDecodeFunction(Map<TupleTag<?>, Coder<WindowedValue<?>>> coderMap)
Returns a pair function to convert bytes to value via coder.
Parameters:
coderMap - mapping between TupleTag and a coder
Returns:
a pair function to convert bytes to value via coder
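The encode and decode functions both look up a coder by tag in coderMap and then convert between a value and its bytes. A minimal round-trip sketch of that idea follows, using only the JDK. SimpleCoder, the string tags, and the helper names are hypothetical stand-ins for Coder, TupleTag, and the returned pair functions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class TagCoderSketch {

  /** Hypothetical minimal coder: turns a value into bytes and back. */
  interface SimpleCoder<T> {
    byte[] encode(T value);

    T decode(byte[] bytes);
  }

  static final SimpleCoder<String> UTF8 = new SimpleCoder<String>() {
    public byte[] encode(String v) { return v.getBytes(StandardCharsets.UTF_8); }
    public String decode(byte[] b) { return new String(b, StandardCharsets.UTF_8); }
  };

  static final SimpleCoder<Integer> INT = new SimpleCoder<Integer>() {
    public byte[] encode(Integer v) { return ByteBuffer.allocate(4).putInt(v).array(); }
    public Integer decode(byte[] b) { return ByteBuffer.wrap(b).getInt(); }
  };

  /** Looks up the coder registered for the tag and encodes the value with it. */
  @SuppressWarnings("unchecked")
  static byte[] encode(Map<String, SimpleCoder<?>> coders, String tag, Object value) {
    SimpleCoder<Object> coder = (SimpleCoder<Object>) coders.get(tag);
    return coder.encode(value);
  }

  /** Looks up the coder registered for the tag and decodes the bytes with it. */
  static Object decode(Map<String, SimpleCoder<?>> coders, String tag, byte[] bytes) {
    return coders.get(tag).decode(bytes);
  }

  public static void main(String[] args) {
    Map<String, SimpleCoder<?>> coders = new HashMap<>();
    coders.put("words", UTF8);
    coders.put("counts", INT);
    byte[] bytes = encode(coders, "counts", 42);
    System.out.println(decode(coders, "counts", bytes));
  }
}
```

The sketch assumes every tag has a registered coder and that the value matches that coder's type; a missing tag would fail with a NullPointerException here.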
-