PipelineTranslator (Apache Beam 2.67.0)

java.lang.Object

org.apache.beam.runners.spark.structuredstreaming.translation.PipelineTranslator

Direct Known Subclasses: PipelineTranslatorBatch

@Internal public abstract class PipelineTranslator extends Object

The pipeline translator translates a Beam Pipeline into its Spark correspondence, which can then be evaluated.

The translation involves traversing the hierarchy of a pipeline multiple times:

- Detect whether streaming mode is required.
- Identify datasets that are used as input repeatedly and should be cached.
- Finally, translate each primitive or composite PTransform that is known and supported into its Spark correspondence. A composite that is not supported is expanded further into its parts, which are then translated.
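The three traversals above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `Node`, `requiresStreaming`, `cacheCandidates` and `translate` are hypothetical names invented for illustration and are not part of the Beam or Spark APIs, and the handling of an unsupported primitive (throwing an exception) is an assumption of the sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of a multi-pass pipeline traversal. None of these types are Beam classes. */
public class MultiPassTraversalSketch {

  /** Hypothetical transform node: composites have children, primitives have none. */
  public static final class Node {
    final String name;
    final boolean unbounded;   // whether this node produces unbounded output
    final List<String> inputs; // names of the datasets this node consumes
    final List<Node> children = new ArrayList<>();

    public Node(String name, boolean unbounded, String... inputs) {
      this.name = name;
      this.unbounded = unbounded;
      this.inputs = Arrays.asList(inputs);
    }

    /** Adds the given nodes as parts of this composite and returns this node. */
    public Node with(Node... parts) {
      children.addAll(Arrays.asList(parts));
      return this;
    }
  }

  /** Pass 1: streaming mode is required if any node in the hierarchy is unbounded. */
  public static boolean requiresStreaming(Node node) {
    if (node.unbounded) {
      return true;
    }
    for (Node child : node.children) {
      if (requiresStreaming(child)) {
        return true;
      }
    }
    return false;
  }

  /** Pass 2: datasets consumed by more than one primitive are caching candidates. */
  public static Set<String> cacheCandidates(Node root) {
    Map<String, Integer> uses = new HashMap<>();
    countUses(root, uses);
    Set<String> result = new TreeSet<>();
    for (Map.Entry<String, Integer> entry : uses.entrySet()) {
      if (entry.getValue() > 1) {
        result.add(entry.getKey());
      }
    }
    return result;
  }

  // Only primitives are counted, so a composite and its parts are not counted twice.
  private static void countUses(Node node, Map<String, Integer> uses) {
    if (node.children.isEmpty()) {
      for (String input : node.inputs) {
        uses.merge(input, 1, Integer::sum);
      }
    } else {
      for (Node child : node.children) {
        countUses(child, uses);
      }
    }
  }

  /** Pass 3: translate supported nodes as a whole, expand unsupported composites. */
  public static List<String> translate(Node root, Set<String> supported) {
    List<String> translated = new ArrayList<>();
    translateInto(root, supported, translated);
    return translated;
  }

  private static void translateInto(Node node, Set<String> supported, List<String> out) {
    if (supported.contains(node.name)) {
      out.add(node.name); // a translator is known: translate the node as a whole
      return;
    }
    if (node.children.isEmpty()) {
      // Assumption of this sketch: an unsupported primitive cannot be translated.
      throw new UnsupportedOperationException("No translator for " + node.name);
    }
    for (Node child : node.children) {
      translateInto(child, supported, out); // expand the composite into its parts
    }
  }
}
```

In this model, adding a composite's name to the `supported` set makes `translate` emit the composite as a single step instead of descending into its parts, which mirrors the "known and supported" rule described above.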

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static interface

PipelineTranslator.TranslationState

Shared, mutable state that is used during the translation of a pipeline and discarded afterwards.

static interface

PipelineTranslator.UnresolvedTranslation<InT,T>

An unresolved translation, which allows the generated Spark DAG to be optimized.
Constructor Summary

Constructors

Constructor

Description

PipelineTranslator()
Method Summary

Modifier and Type

Method

Description

static void

detectStreamingMode(Pipeline pipeline, StreamingOptions options)

Analyse the pipeline to determine if we have to switch to streaming mode for the pipeline translation and update StreamingOptions accordingly.

protected abstract <InT extends PInput, OutT extends POutput, TransformT extends PTransform<InT, OutT>> TransformTranslator<InT,OutT,TransformT>

getTransformTranslator(TransformT transform)

Returns a TransformTranslator for the given PTransform if known.

static void

replaceTransforms(Pipeline pipeline, StreamingOptions options)

EvaluationContext

translate(Pipeline pipeline, org.apache.spark.sql.SparkSession session, SparkCommonPipelineOptions options)

Translates a Beam pipeline into its Spark correspondence using the Spark SQL / Dataset API.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- PipelineTranslator
  
  public PipelineTranslator()
Method Details
- replaceTransforms
  
  public static void replaceTransforms(Pipeline pipeline, StreamingOptions options)
- detectStreamingMode
  
  public static void detectStreamingMode(Pipeline pipeline, StreamingOptions options)
  
  Analyse the pipeline to determine if we have to switch to streaming mode for the pipeline translation and update StreamingOptions accordingly.
- getTransformTranslator
  
  @Nullable protected abstract <InT extends PInput, OutT extends POutput, TransformT extends PTransform<InT, OutT>> TransformTranslator<InT,OutT,TransformT> getTransformTranslator(TransformT transform)
  
  Returns a TransformTranslator for the given PTransform if known.
- translate
  
  public EvaluationContext translate(Pipeline pipeline, org.apache.spark.sql.SparkSession session, SparkCommonPipelineOptions options)
  
  Translates a Beam pipeline into its Spark correspondence using the Spark SQL / Dataset API.
  Note that in some cases this involves the early evaluation of some parts of the pipeline. For example, to use a side-input PCollectionView in a translation, the corresponding Spark Dataset might have to be collected and broadcast before the translation can continue.
  
  Returns:
  
  The result of the translation is an EvaluationContext that can trigger the evaluation of the Spark pipeline.
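
The note on early evaluation in translate can be illustrated with a plain-Java model. This is a sketch only: `lazyDataset` and `translateAddTotal` are hypothetical helpers, a `Supplier` stands in for a lazily evaluated Spark Dataset, and a captured local value stands in for a broadcast variable. No Beam or Spark API is used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/** Toy model of a side input that must be evaluated while translating. */
public class EarlySideInputSketch {

  /** Counts how often a lazy "dataset" created below has been evaluated. */
  public static final AtomicInteger EVALUATIONS = new AtomicInteger();

  /** Hypothetical lazy dataset: nothing is computed until get() is called. */
  public static Supplier<List<Integer>> lazyDataset(Integer... values) {
    return () -> {
      EVALUATIONS.incrementAndGet();
      return Arrays.asList(values);
    };
  }

  /**
   * "Translates" a transform that adds the side-input total to each element.
   * The side input is collected here, at translation time, so that the returned
   * function can close over a plain value (standing in for a broadcast variable).
   */
  public static Function<Integer, Integer> translateAddTotal(Supplier<List<Integer>> sideInput) {
    int total = 0;
    for (int value : sideInput.get()) { // early evaluation of the side input
      total += value;
    }
    final int broadcastTotal = total;
    return element -> element + broadcastTotal;
  }
}
```

In this model the side input is evaluated once, when `translateAddTotal` is called, and not again when the returned function is applied to elements of the main input.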