Class PipelineTranslator
java.lang.Object
org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator
- Direct Known Subclasses:
PipelineTranslatorBatch
The pipeline translator translates a Beam
Pipeline
into a Spark correspondence, that can
then be evaluated.
The translation involves traversing the hierarchy of a pipeline multiple times:
- Detect if
streaming
mode is required. - Identify datasets that are repeatedly used as input and should be cached.
- And finally, translate each primitive or composite
PTransform
that isknown
andsupported
into its Spark correspondence. If a composite is not supported, it will be expanded further into its parts and translated then.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic interface
Shared, mutable state during the translation of a pipeline and omitted afterwards.static interface
Unresolved translation, allowing to optimize the generated Spark DAG. -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionstatic void
detectStreamingMode
(Pipeline pipeline, StreamingOptions options) Analyse the pipeline to determine if we have to switch to streaming mode for the pipeline translation and updateStreamingOptions
accordingly.protected abstract <InT extends PInput,
OutT extends POutput, TransformT extends PTransform<InT, OutT>>
TransformTranslator<InT, OutT, TransformT> getTransformTranslator
(TransformT transform) Returns aTransformTranslator
for the givenPTransform
if known.static void
replaceTransforms
(Pipeline pipeline, StreamingOptions options) translate
(Pipeline pipeline, org.apache.spark.sql.SparkSession session, SparkCommonPipelineOptions options) Translates a Beam pipeline into its Spark correspondence using the Spark SQL / Dataset API.
-
Constructor Details
-
PipelineTranslator
public PipelineTranslator()
-
-
Method Details
-
replaceTransforms
-
detectStreamingMode
Analyse the pipeline to determine if we have to switch to streaming mode for the pipeline translation and updateStreamingOptions
accordingly. -
getTransformTranslator
@Nullable protected abstract <InT extends PInput,OutT extends POutput, TransformTranslator<InT,TransformT extends PTransform<InT, OutT>> OutT, getTransformTranslatorTransformT> (TransformT transform) Returns aTransformTranslator
for the givenPTransform
if known. -
translate
public EvaluationContext translate(Pipeline pipeline, org.apache.spark.sql.SparkSession session, SparkCommonPipelineOptions options) Translates a Beam pipeline into its Spark correspondence using the Spark SQL / Dataset API.Note, in some cases this involves the early evaluation of some parts of the pipeline. For example, in order to use a side-input
PCollectionView
in a translation the corresponding SparkDataset
might have to be collected and broadcasted to be able to continue with the translation.- Returns:
- The result of the translation is an
EvaluationContext
that can trigger the evaluation of the Spark pipeline.
-