@Internal public abstract class PipelineTranslator extends java.lang.Object
Pipeline into a Spark correspondence, that can
then be evaluated.
The translation involves traversing the hierarchy of a pipeline multiple times:
streaming mode is required.
PTransform that is known and supported into
its Spark correspondence. If a composite is not supported, it will be expanded further into
its parts and translated then.
| Modifier and Type | Class and Description |
|---|---|
static interface |
PipelineTranslator.TranslationState
Shared, mutable state during the translation of a pipeline and omitted afterwards.
|
static interface |
PipelineTranslator.UnresolvedTranslation<InT,T>
Unresolved translation, allowing to optimize the generated Spark DAG.
|
| Constructor and Description |
|---|
PipelineTranslator() |
| Modifier and Type | Method and Description |
|---|---|
static void |
detectStreamingMode(Pipeline pipeline,
StreamingOptions options)
Analyse the pipeline to determine if we have to switch to streaming mode for the pipeline
translation and update
StreamingOptions accordingly. |
protected abstract <InT extends PInput,OutT extends POutput,TransformT extends PTransform<InT,OutT>> |
getTransformTranslator(TransformT transform)
Returns a
TransformTranslator for the given PTransform if known. |
static void |
replaceTransforms(Pipeline pipeline,
StreamingOptions options) |
EvaluationContext |
translate(Pipeline pipeline,
org.apache.spark.sql.SparkSession session,
SparkCommonPipelineOptions options)
Translates a Beam pipeline into its Spark correspondence using the Spark SQL / Dataset API.
|
public static void replaceTransforms(Pipeline pipeline, StreamingOptions options)
public static void detectStreamingMode(Pipeline pipeline, StreamingOptions options)
StreamingOptions accordingly.@Nullable protected abstract <InT extends PInput,OutT extends POutput,TransformT extends PTransform<InT,OutT>> TransformTranslator<InT,OutT,TransformT> getTransformTranslator(TransformT transform)
TransformTranslator for the given PTransform if known.public EvaluationContext translate(Pipeline pipeline, org.apache.spark.sql.SparkSession session, SparkCommonPipelineOptions options)
Note, in some cases this involves the early evaluation of some parts of the pipeline. For
example, in order to use a side-input PCollectionView in a translation the corresponding Spark Dataset might have to be collected and
broadcasted to be able to continue with the translation.
EvaluationContext that can trigger the
evaluation of the Spark pipeline.