Class TranslationUtils
java.lang.Object
org.apache.beam.runners.spark.translation.TranslationUtils
A set of utilities to help translate Beam transformations into Spark transformations.
-
Nested Class Summary
Nested Classes (Modifier and Type / Description):
static class - A SparkCombineFn function applied to grouped KVs.
static final class - A utility class to filter TupleTags.
-
Method Summary
Modifier and Type / Method / Description

static void
checkpointIfNeeded(org.apache.spark.streaming.dstream.DStream<?> dStream, org.apache.beam.runners.core.construction.SerializablePipelineOptions options)
Checkpoints the given DStream if checkpointing is enabled in the pipeline options.

static <T1,T2> org.apache.spark.streaming.api.java.JavaDStream<T2>
dStreamValues(org.apache.spark.streaming.api.java.JavaPairDStream<T1,T2> pairDStream)
Transform a pair stream into a value stream.

static <T> org.apache.spark.api.java.function.VoidFunction<T>
emptyVoidFunction()

static <InputT,OutputT> org.apache.spark.api.java.function.FlatMapFunction<Iterator<InputT>,OutputT>
functionToFlatMapFunction(org.apache.spark.api.java.function.Function<InputT,OutputT> func)
A utility method that adapts a Function to a FlatMapFunction with an Iterator input.

static Long
getBatchDuration(org.apache.beam.runners.core.construction.SerializablePipelineOptions options)
Retrieves the batch duration in milliseconds from Spark pipeline options.

static Map<TupleTag<?>,KV<WindowingStrategy<?,?>,SideInputBroadcast<?>>>
getSideInputs(Iterable<PCollectionView<?>> views, org.apache.spark.api.java.JavaSparkContext context, SparkPCollectionView pviews)
Create SideInputs as Broadcast variables.

static Map<TupleTag<?>,Coder<WindowedValue<?>>>
getTupleTagCoders(Map<TupleTag<?>,PCollection<?>> outputs)
Utility to get the mapping between a TupleTag and a coder.

static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<TupleTag<?>,ValueAndCoderLazySerializable<WindowedValue<?>>>,TupleTag<?>,WindowedValue<?>>
getTupleTagDecodeFunction(Map<TupleTag<?>,Coder<WindowedValue<?>>> coderMap)
Returns a pair function to convert bytes to value via coder.

static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<TupleTag<?>,WindowedValue<?>>,TupleTag<?>,ValueAndCoderLazySerializable<WindowedValue<?>>>
getTupleTagEncodeFunction(Map<TupleTag<?>,Coder<WindowedValue<?>>> coderMap)
Returns a pair function to convert value to bytes via coder.

static boolean
hasEventTimers(DoFn<?,?> doFn)
Checks if the given DoFn uses event time timers.

static boolean
hasTimers(DoFn<?,?> doFn)
Checks if the given DoFn uses any timers.

static <T,K,V> org.apache.spark.api.java.function.PairFlatMapFunction<Iterator<T>,K,V>
pairFunctionToPairFlatMapFunction(org.apache.spark.api.java.function.PairFunction<T,K,V> pairFunction)
A utility method that adapts a PairFunction to a PairFlatMapFunction with an Iterator input.

static void
rejectStateAndTimers(DoFn<?,?> doFn)
Reject a DoFn that uses state or timers.

static <T,W extends BoundedWindow> boolean
skipAssignWindows(Window.Assign<T> transform, EvaluationContext context)
Checks if the window transformation should be applied or skipped.

static <K,V> org.apache.spark.api.java.function.PairFunction<WindowedValue<KV<K,V>>,ByteArray,WindowedValue<KV<K,V>>>
toPairByKeyInWindowedValue(Coder<K> keyCoder)
Extract key from a WindowedValue KV into a pair.

static <K,V> org.apache.spark.api.java.function.PairFlatMapFunction<Iterator<KV<K,V>>,K,V>
toPairFlatMapFunction()
KV to pair flatmap function.

static <K,V> org.apache.spark.api.java.function.PairFunction<KV<K,V>,K,V>
toPairFunction()
KV to pair function.
-
Method Details
-
skipAssignWindows
public static <T,W extends BoundedWindow> boolean skipAssignWindows(Window.Assign<T> transform, EvaluationContext context)
Checks if the window transformation should be applied or skipped. Assigning windows is avoided if both the source and destination use the global window, or if the user has not specified a WindowFn (meaning they are only adjusting triggering or allowed lateness).
- Type Parameters:
  T - PCollection type.
  W - BoundedWindow type.
- Parameters:
  transform - the Window.Assign transformation.
  context - the EvaluationContext.
- Returns:
  whether to apply the transformation.
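For illustration only, a sketch of how a translator might consult this check before applying a Window.Assign. The helper method is hypothetical, and it assumes (per the method name) that a true result means the per-element window assignment can be skipped:

    import org.apache.beam.runners.spark.translation.EvaluationContext;
    import org.apache.beam.runners.spark.translation.TranslationUtils;
    import org.apache.beam.sdk.transforms.windowing.Window;

    class WindowAssignSketch {
      // Hypothetical helper: decide whether the WindowFn actually needs to run.
      static <T> boolean windowingIsUnchanged(Window.Assign<T> transform, EvaluationContext context) {
        // Assumption: true means both sides use the global window, or no WindowFn
        // was specified, so re-windowing every element can be skipped.
        return TranslationUtils.skipAssignWindows(transform, context);
      }
    }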
-
dStreamValues
public static <T1,T2> org.apache.spark.streaming.api.java.JavaDStream<T2> dStreamValues(org.apache.spark.streaming.api.java.JavaPairDStream<T1,T2> pairDStream)
Transform a pair stream into a value stream.
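A minimal sketch, assuming a keyed stream is already at hand; only the dStreamValues call is taken from this class:

    import org.apache.beam.runners.spark.translation.TranslationUtils;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaPairDStream;

    class DStreamValuesSketch {
      // Drop the keys of a keyed DStream, keeping only the values.
      static <K, V> JavaDStream<V> valuesOnly(JavaPairDStream<K, V> keyed) {
        return TranslationUtils.dStreamValues(keyed);
      }
    }
-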
toPairFunction
public static <K,V> org.apache.spark.api.java.function.PairFunction<KV<K,V>,K,V> toPairFunction()
KV to pair function.
-
toPairFlatMapFunction
public static <K,V> org.apache.spark.api.java.function.PairFlatMapFunction<Iterator<KV<K,V>>,K,V> toPairFlatMapFunction()
KV to pair flatmap function.
-
toPairByKeyInWindowedValue
public static <K,V> org.apache.spark.api.java.function.PairFunction<WindowedValue<KV<K,V>>,ByteArray,WindowedValue<KV<K,V>>> toPairByKeyInWindowedValue(Coder<K> keyCoder)
Extract key from a WindowedValue KV into a pair.
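A sketch of how the KV-to-pair helpers above might be used when translating keyed datasets. The RDD variables are hypothetical, the import paths are assumed for a recent Beam release, and only the TranslationUtils calls come from this class (toPairFlatMapFunction is the analogous per-partition variant, usable with mapPartitionsToPair):

    import org.apache.beam.runners.spark.translation.TranslationUtils;
    import org.apache.beam.runners.spark.util.ByteArray;
    import org.apache.beam.sdk.coders.Coder;
    import org.apache.beam.sdk.util.WindowedValue;
    import org.apache.beam.sdk.values.KV;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;

    class KvToPairSketch {
      // Turn a JavaRDD of Beam KVs into a Spark JavaPairRDD, element by element.
      static <K, V> JavaPairRDD<K, V> toPairs(JavaRDD<KV<K, V>> kvs) {
        return kvs.mapToPair(TranslationUtils.<K, V>toPairFunction());
      }

      // Key windowed KVs by their encoded key bytes, e.g. ahead of a grouping step.
      static <K, V> JavaPairRDD<ByteArray, WindowedValue<KV<K, V>>> keyByEncodedKey(
          JavaRDD<WindowedValue<KV<K, V>>> windowedKvs, Coder<K> keyCoder) {
        return windowedKvs.mapToPair(TranslationUtils.<K, V>toPairByKeyInWindowedValue(keyCoder));
      }
    }
-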
getSideInputs
public static Map<TupleTag<?>,KV<WindowingStrategy<?,?>,SideInputBroadcast<?>>> getSideInputs(Iterable<PCollectionView<?>> views, org.apache.spark.api.java.JavaSparkContext context, SparkPCollectionView pviews)
Create SideInputs as Broadcast variables.
- Parameters:
  views - the PCollectionViews.
  context - the JavaSparkContext.
  pviews - the SparkPCollectionView.
- Returns:
  a map of tagged SideInputBroadcasts and their WindowingStrategy.
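A sketch of materializing a transform's side inputs before translating the transform itself; the wrapper method is hypothetical and the import paths are assumed:

    import java.util.Map;
    import org.apache.beam.runners.spark.translation.SparkPCollectionView;
    import org.apache.beam.runners.spark.translation.TranslationUtils;
    import org.apache.beam.runners.spark.util.SideInputBroadcast;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollectionView;
    import org.apache.beam.sdk.values.TupleTag;
    import org.apache.beam.sdk.values.WindowingStrategy;
    import org.apache.spark.api.java.JavaSparkContext;

    class SideInputsSketch {
      // Broadcast each side input once so DoFn functions running on executors
      // can read it without reshuffling the main input.
      static Map<TupleTag<?>, KV<WindowingStrategy<?, ?>, SideInputBroadcast<?>>> broadcastSideInputs(
          Iterable<PCollectionView<?>> views, JavaSparkContext jsc, SparkPCollectionView pviews) {
        return TranslationUtils.getSideInputs(views, jsc, pviews);
      }
    }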
-
getBatchDuration
public static Long getBatchDuration(org.apache.beam.runners.core.construction.SerializablePipelineOptions options)
Retrieves the batch duration in milliseconds from Spark pipeline options.
- Parameters:
  options - the serializable pipeline options containing Spark-specific settings.
- Returns:
  the batch duration in milliseconds, as specified in SparkPipelineOptions.
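For example, the returned millisecond value can feed Spark's Durations helper; this is an illustrative sketch, not code from the runner:

    import org.apache.beam.runners.core.construction.SerializablePipelineOptions;
    import org.apache.beam.runners.spark.translation.TranslationUtils;
    import org.apache.spark.streaming.Duration;
    import org.apache.spark.streaming.Durations;

    class BatchDurationSketch {
      // Convert the configured batch duration into Spark's Duration type.
      static Duration batchInterval(SerializablePipelineOptions options) {
        return Durations.milliseconds(TranslationUtils.getBatchDuration(options));
      }
    }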
-
hasTimers
public static boolean hasTimers(DoFn<?,?> doFn)
Checks if the given DoFn uses any timers.
- Parameters:
  doFn - the DoFn to check for timer usage.
- Returns:
  true if the DoFn uses timers, false otherwise.
-
hasEventTimers
public static boolean hasEventTimers(DoFn<?,?> doFn)
Checks if the given DoFn uses event time timers.
- Parameters:
  doFn - the DoFn to check for event time timer usage.
- Returns:
  true if the DoFn uses event time timers, false otherwise. Note: returns false if the DoFn has no timers at all.
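A sketch of how these two checks might gate a translation decision; the describing method and its return strings are illustrative only:

    import org.apache.beam.runners.spark.translation.TranslationUtils;
    import org.apache.beam.sdk.transforms.DoFn;

    class TimerChecksSketch {
      // Hypothetical: summarize a DoFn's timer needs before choosing a strategy.
      static String describeTimerNeeds(DoFn<?, ?> doFn) {
        if (!TranslationUtils.hasTimers(doFn)) {
          return "no timers";
        }
        return TranslationUtils.hasEventTimers(doFn)
            ? "needs watermark-driven (event time) timer firing"
            : "uses timers, but none in event time";
      }
    }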
-
checkpointIfNeeded
public static void checkpointIfNeeded(org.apache.spark.streaming.dstream.DStream<?> dStream, org.apache.beam.runners.core.construction.SerializablePipelineOptions options)
Checkpoints the given DStream if checkpointing is enabled in the pipeline options.
- Parameters:
  dStream - the DStream to be checkpointed.
  options - the SerializablePipelineOptions containing configuration settings, including the batch duration.
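A sketch of wiring this into a streaming translation, assuming a JavaDStream has already been built; JavaDStream#dstream() is the standard accessor for the underlying DStream:

    import org.apache.beam.runners.core.construction.SerializablePipelineOptions;
    import org.apache.beam.runners.spark.translation.TranslationUtils;
    import org.apache.spark.streaming.api.java.JavaDStream;

    class CheckpointSketch {
      // Mark the stream for checkpointing only when the pipeline options ask for it.
      static <T> void maybeCheckpoint(JavaDStream<T> stream, SerializablePipelineOptions options) {
        TranslationUtils.checkpointIfNeeded(stream.dstream(), options);
      }
    }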
-
rejectStateAndTimers
public static void rejectStateAndTimers(DoFn<?,?> doFn)
Reject a DoFn that uses state or timers.
- Parameters:
  doFn - the DoFn to possibly reject.
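A sketch of the fail-fast pattern this enables in a translator; the surrounding method is hypothetical, and the exact exception thrown on rejection is not documented here:

    import org.apache.beam.runners.spark.translation.TranslationUtils;
    import org.apache.beam.sdk.transforms.DoFn;

    class RejectSketch {
      // Validate a DoFn before translating the ParDo that wraps it: if the DoFn
      // declares state or timers this path cannot support, the call rejects it.
      static void validateBeforeTranslation(DoFn<?, ?> doFn) {
        TranslationUtils.rejectStateAndTimers(doFn);
        // Translation of the ParDo would continue here.
      }
    }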
-
emptyVoidFunction
public static <T> org.apache.spark.api.java.function.VoidFunction<T> emptyVoidFunction() -
pairFunctionToPairFlatMapFunction
public static <T,K,V> org.apache.spark.api.java.function.PairFlatMapFunction<Iterator<T>,K,V> pairFunctionToPairFlatMapFunction(org.apache.spark.api.java.function.PairFunction<T,K,V> pairFunction)
A utility method that adapts a PairFunction to a PairFlatMapFunction with an Iterator input. This is particularly useful because it allows functions written for mapToPair to be reused in flatMapToPair.
- Type Parameters:
  T - the input type.
  K - the output key type.
  V - the output value type.
- Parameters:
  pairFunction - the PairFunction to adapt.
- Returns:
  a PairFlatMapFunction that accepts an Iterator as input and applies the PairFunction to every element.
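A sketch of the adaptation in use: a per-element PairFunction, as one would write for mapToPair, is reused per partition via mapPartitionsToPair. The word-length example is hypothetical:

    import org.apache.beam.runners.spark.translation.TranslationUtils;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.PairFunction;
    import scala.Tuple2;

    class PairAdaptSketch {
      // A per-element function, as one would write for JavaRDD#mapToPair.
      static final PairFunction<String, String, Integer> TO_LENGTH_PAIR =
          word -> new Tuple2<>(word, word.length());

      // Reuse the same function once per partition via mapPartitionsToPair.
      static JavaPairRDD<String, Integer> lengthsByWord(JavaRDD<String> words) {
        return words.mapPartitionsToPair(
            TranslationUtils.pairFunctionToPairFlatMapFunction(TO_LENGTH_PAIR));
      }
    }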
-
functionToFlatMapFunction
public static <InputT,OutputT> org.apache.spark.api.java.function.FlatMapFunction<Iterator<InputT>,OutputT> functionToFlatMapFunction(org.apache.spark.api.java.function.Function<InputT,OutputT> func)
A utility method that adapts a Function to a FlatMapFunction with an Iterator input. This is particularly useful because it allows functions written for map to be reused in flatMap.
- Type Parameters:
  InputT - the input type.
  OutputT - the output type.
- Parameters:
  func - the Function to adapt.
- Returns:
  a FlatMapFunction that accepts an Iterator as input and applies the Function to every element.
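Similarly, a sketch of reusing a per-element Function with mapPartitions; the example itself is hypothetical:

    import org.apache.beam.runners.spark.translation.TranslationUtils;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;

    class MapAdaptSketch {
      // A per-element function, as one would write for JavaRDD#map.
      static final Function<String, Integer> TO_LENGTH = String::length;

      // Reuse it once per partition via mapPartitions.
      static JavaRDD<Integer> lengths(JavaRDD<String> words) {
        return words.mapPartitions(TranslationUtils.functionToFlatMapFunction(TO_LENGTH));
      }
    }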
-
getTupleTagCoders
public static Map<TupleTag<?>,Coder<WindowedValue<?>>> getTupleTagCoders(Map<TupleTag<?>,PCollection<?>> outputs)
Utility to get the mapping between a TupleTag and a coder.
- Parameters:
  outputs - a map of tuple tags and PCollections.
- Returns:
  the mapping between a TupleTag and a coder.
-
getTupleTagEncodeFunction
public static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<TupleTag<?>,WindowedValue<?>>,TupleTag<?>,ValueAndCoderLazySerializable<WindowedValue<?>>> getTupleTagEncodeFunction(Map<TupleTag<?>,Coder<WindowedValue<?>>> coderMap)
Returns a pair function to convert value to bytes via coder.
- Parameters:
  coderMap - mapping between TupleTag and a coder.
- Returns:
  a pair function to convert value to bytes via coder.
-
getTupleTagDecodeFunction
public static org.apache.spark.api.java.function.PairFunction<scala.Tuple2<TupleTag<?>,ValueAndCoderLazySerializable<WindowedValue<?>>>,TupleTag<?>,WindowedValue<?>> getTupleTagDecodeFunction(Map<TupleTag<?>,Coder<WindowedValue<?>>> coderMap)
Returns a pair function to convert bytes to value via coder.
- Parameters:
  coderMap - mapping between TupleTag and a coder.
- Returns:
  a pair function to convert bytes to value via coder.
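Taken together with getTupleTagCoders and getTupleTagEncodeFunction, a sketch of the encode/decode round trip over a tagged pair RDD; the RDD variable and wrapper method are hypothetical and the import paths are assumed for a recent Beam release:

    import java.util.Map;
    import org.apache.beam.runners.spark.translation.TranslationUtils;
    import org.apache.beam.runners.spark.translation.ValueAndCoderLazySerializable;
    import org.apache.beam.sdk.coders.Coder;
    import org.apache.beam.sdk.util.WindowedValue;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.TupleTag;
    import org.apache.spark.api.java.JavaPairRDD;

    class TagCodingSketch {
      // Encode tagged values to lazily serialized form (e.g. before caching or a
      // shuffle), then decode them back, driven by the per-tag coder map.
      static JavaPairRDD<TupleTag<?>, WindowedValue<?>> roundTrip(
          JavaPairRDD<TupleTag<?>, WindowedValue<?>> tagged,
          Map<TupleTag<?>, PCollection<?>> outputs) {
        Map<TupleTag<?>, Coder<WindowedValue<?>>> coders =
            TranslationUtils.getTupleTagCoders(outputs);
        JavaPairRDD<TupleTag<?>, ValueAndCoderLazySerializable<WindowedValue<?>>> encoded =
            tagged.mapToPair(TranslationUtils.getTupleTagEncodeFunction(coders));
        return encoded.mapToPair(TranslationUtils.getTupleTagDecodeFunction(coders));
      }
    }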
-