Class PipelineTranslator
java.lang.Object
org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator
Direct Known Subclasses:
PipelineTranslatorBatch
The pipeline translator translates a Beam Pipeline into its Spark correspondence, which can then be evaluated.
The translation involves traversing the hierarchy of a pipeline multiple times:
- Detect if streaming mode is required.
- Identify datasets that are repeatedly used as input and should be cached.
- Finally, translate each primitive or composite PTransform that is known and supported into its Spark correspondence. If a composite is not supported, it is expanded further into its parts, which are then translated.
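The three traversals described above can be sketched on a toy model. This is only an illustration under stated assumptions: `TranslationSketch`, `Node`, and every method below are hypothetical stand-ins written in plain Java, not Beam or Spark classes, and the caching heuristic (count how often leaf nodes consume a dataset name) is a simplification chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of a multi-pass translation. All names here are hypothetical, not Beam API. */
final class TranslationSketch {

  /** A transform node: a primitive has no parts, a composite has nested parts. */
  static final class Node {
    final String name;
    final boolean unbounded;   // true if the node reads an unbounded source
    final List<String> inputs; // names of the datasets this node consumes
    final List<Node> parts;    // nested transforms, empty for primitives

    Node(String name, boolean unbounded, List<String> inputs, List<Node> parts) {
      this.name = name;
      this.unbounded = unbounded;
      this.inputs = inputs;
      this.parts = parts;
    }
  }

  /** Pass 1: streaming mode is required as soon as any node is unbounded. */
  static boolean requiresStreaming(Node node) {
    if (node.unbounded) {
      return true;
    }
    for (Node part : node.parts) {
      if (requiresStreaming(part)) {
        return true;
      }
    }
    return false;
  }

  /** Pass 2: datasets consumed by more than one leaf node are candidates for caching. */
  static Set<String> cacheCandidates(Node root) {
    Map<String, Integer> uses = new TreeMap<>();
    countUses(root, uses);
    Set<String> repeated = new TreeSet<>();
    for (Map.Entry<String, Integer> use : uses.entrySet()) {
      if (use.getValue() > 1) {
        repeated.add(use.getKey());
      }
    }
    return repeated;
  }

  private static void countUses(Node node, Map<String, Integer> uses) {
    if (node.parts.isEmpty()) {
      for (String input : node.inputs) {
        uses.merge(input, 1, Integer::sum);
      }
    } else {
      for (Node part : node.parts) {
        countUses(part, uses);
      }
    }
  }

  /** Pass 3: translate supported nodes directly, expand unsupported composites into parts. */
  static List<String> translate(Node root, Set<String> supported) {
    List<String> plan = new ArrayList<>();
    translateInto(root, supported, plan);
    return plan;
  }

  private static void translateInto(Node node, Set<String> supported, List<String> plan) {
    if (supported.contains(node.name)) {
      plan.add("spark:" + node.name);
    } else if (node.parts.isEmpty()) {
      throw new UnsupportedOperationException("No translator for primitive " + node.name);
    } else {
      for (Node part : node.parts) {
        translateInto(part, supported, plan);
      }
    }
  }

  /** A small bounded pipeline in which the dataset "parsed" is consumed twice. */
  static Node example() {
    Node read = new Node("Read", false, List.of(), List.of());
    Node parse = new Node("ParDo", false, List.of("lines"), List.of());
    Node group = new Node("GroupByKey", false, List.of("parsed"), List.of());
    Node sum = new Node("ParDo", false, List.of("grouped"), List.of());
    Node count = new Node("Count", false, List.of("parsed"), List.of(group, sum));
    Node format = new Node("ParDo", false, List.of("parsed"), List.of());
    return new Node("Root", false, List.of(), List.of(read, parse, count, format));
  }

  public static void main(String[] args) {
    Node root = example();
    System.out.println("streaming: " + requiresStreaming(root));
    System.out.println("cache: " + cacheCandidates(root));
    System.out.println("plan: " + translate(root, Set.of("Read", "ParDo", "GroupByKey")));
  }
}
```

In this sketch the composite `Count` has no translator of its own, so the third pass descends into its parts. Adding `"Count"` to the supported set would instead translate it as a single unit.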
Nested Class Summary
Nested Classes
- static interface: Shared, mutable state during the translation of a pipeline, discarded afterwards.
- static interface: Unresolved translation, which allows optimizing the generated Spark DAG.
Constructor Summary
Constructors
- PipelineTranslator()
Method Summary
- static void detectStreamingMode(Pipeline pipeline, StreamingOptions options): Analyse the pipeline to determine if we have to switch to streaming mode for the pipeline translation and update StreamingOptions accordingly.
- protected abstract <InT extends PInput, OutT extends POutput, TransformT extends PTransform<InT, OutT>> TransformTranslator<InT, OutT, TransformT> getTransformTranslator(TransformT transform): Returns a TransformTranslator for the given PTransform if known.
- static void replaceTransforms(Pipeline pipeline, StreamingOptions options)
- EvaluationContext translate(Pipeline pipeline, org.apache.spark.sql.SparkSession session, SparkCommonPipelineOptions options): Translates a Beam pipeline into its Spark correspondence using the Spark SQL / Dataset API.
Constructor Details
PipelineTranslator
public PipelineTranslator()
Method Details
replaceTransforms
detectStreamingMode
Analyse the pipeline to determine if we have to switch to streaming mode for the pipeline translation and update StreamingOptions accordingly.
getTransformTranslator
@Nullable protected abstract <InT extends PInput, OutT extends POutput, TransformT extends PTransform<InT, OutT>> TransformTranslator<InT, OutT, TransformT> getTransformTranslator(TransformT transform)
Returns a TransformTranslator for the given PTransform if known.
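To illustrate the "if known" contract of such a lookup, here is a minimal sketch of a typed registry that returns null for unregistered transforms. It is an assumption-laden toy: `TranslatorRegistrySketch`, `Transform`, `Translator`, `Upper`, and `Unknown` are hypothetical names that only mirror the shape of the generic signature above, and a class-keyed map is just one possible way to back the lookup.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy registry mirroring a nullable, generically typed translator lookup. Not Beam API. */
final class TranslatorRegistrySketch {

  /** Hypothetical stand-in for a transform from InT to OutT. */
  interface Transform<InT, OutT> {}

  /** Hypothetical stand-in for a translator bound to one transform type. */
  interface Translator<InT, OutT, T extends Transform<InT, OutT>> {
    String translate(T transform);
  }

  /** A transform type for which a translator will be registered. */
  static final class Upper implements Transform<String, String> {}

  /** A transform type that is deliberately left without a translator. */
  static final class Unknown implements Transform<String, String> {}

  private final Map<Class<?>, Translator<?, ?, ?>> translators = new HashMap<>();

  /** Registers a translator under the concrete class of the transform it handles. */
  <InT, OutT, T extends Transform<InT, OutT>> void register(
      Class<T> type, Translator<InT, OutT, T> translator) {
    translators.put(type, translator);
  }

  /** Returns the translator registered for the transform's class, or null if none is known. */
  @SuppressWarnings("unchecked")
  <InT, OutT, T extends Transform<InT, OutT>> Translator<InT, OutT, T> lookup(T transform) {
    // The cast is safe as long as register() is the only way entries are added.
    return (Translator<InT, OutT, T>) translators.get(transform.getClass());
  }

  public static void main(String[] args) {
    TranslatorRegistrySketch registry = new TranslatorRegistrySketch();
    registry.register(Upper.class, transform -> "spark:upper");

    Translator<String, String, Upper> known = registry.lookup(new Upper());
    System.out.println(known.translate(new Upper()));

    // Callers must handle the null case, for example by expanding a composite into its parts.
    System.out.println(registry.lookup(new Unknown()) == null);
  }
}
```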
translate
public EvaluationContext translate(Pipeline pipeline, org.apache.spark.sql.SparkSession session, SparkCommonPipelineOptions options)
Translates a Beam pipeline into its Spark correspondence using the Spark SQL / Dataset API.
Note, in some cases this involves the early evaluation of some parts of the pipeline. For example, in order to use a side-input PCollectionView in a translation, the corresponding Spark Dataset might have to be collected and broadcast before the translation can continue.
Returns:
The result of the translation is an EvaluationContext that can trigger the evaluation of the Spark pipeline.