java.lang.Object
org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator
Direct Known Subclasses:
PipelineTranslatorBatch

@Internal public abstract class PipelineTranslator extends Object
The pipeline translator translates a Beam Pipeline into a Spark correspondence, that can then be evaluated.

The translation involves traversing the hierarchy of a pipeline multiple times:

  1. Detect if streaming mode is required.
  2. Identify datasets that are repeatedly used as input and should be cached.
  3. And finally, translate each primitive or composite PTransform that is known and supported into its Spark correspondence. If a composite is not supported, it will be expanded further into its parts and translated then.
  • Constructor Details

    • PipelineTranslator

      public PipelineTranslator()
  • Method Details

    • replaceTransforms

      public static void replaceTransforms(Pipeline pipeline, StreamingOptions options)
    • detectStreamingMode

      public static void detectStreamingMode(Pipeline pipeline, StreamingOptions options)
      Analyse the pipeline to determine if we have to switch to streaming mode for the pipeline translation and update StreamingOptions accordingly.
    • getTransformTranslator

      @Nullable protected abstract <InT extends PInput, OutT extends POutput, TransformT extends PTransform<InT, OutT>> TransformTranslator<InT,OutT,TransformT> getTransformTranslator(TransformT transform)
      Returns a TransformTranslator for the given PTransform if known.
    • translate

      public EvaluationContext translate(Pipeline pipeline, org.apache.spark.sql.SparkSession session, SparkCommonPipelineOptions options)
      Translates a Beam pipeline into its Spark correspondence using the Spark SQL / Dataset API.

      Note, in some cases this involves the early evaluation of some parts of the pipeline. For example, in order to use a side-input PCollectionView in a translation the corresponding Spark Dataset might have to be collected and broadcasted to be able to continue with the translation.

      Returns:
      The result of the translation is an EvaluationContext that can trigger the evaluation of the Spark pipeline.