@Internal public abstract class PipelineTranslator extends java.lang.Object
Pipeline
into a Spark correspondence, that can
then be evaluated.
The translation involves traversing the hierarchy of a pipeline multiple times:
streaming
mode is required.
PTransform
that is known
and supported
into
its Spark correspondence. If a composite is not supported, it will be expanded further into
its parts and translated then.
Modifier and Type | Class and Description |
---|---|
static interface |
PipelineTranslator.TranslationState
Shared, mutable state during the translation of a pipeline and omitted afterwards.
|
static interface |
PipelineTranslator.UnresolvedTranslation<InT,T>
Unresolved translation, allowing to optimize the generated Spark DAG.
|
Constructor and Description |
---|
PipelineTranslator() |
Modifier and Type | Method and Description |
---|---|
static void |
detectStreamingMode(Pipeline pipeline,
StreamingOptions options)
Analyse the pipeline to determine if we have to switch to streaming mode for the pipeline
translation and update
StreamingOptions accordingly. |
protected abstract <InT extends PInput,OutT extends POutput,TransformT extends PTransform<InT,OutT>> |
getTransformTranslator(TransformT transform)
Returns a
TransformTranslator for the given PTransform if known. |
static void |
replaceTransforms(Pipeline pipeline,
StreamingOptions options) |
EvaluationContext |
translate(Pipeline pipeline,
org.apache.spark.sql.SparkSession session,
SparkCommonPipelineOptions options)
Translates a Beam pipeline into its Spark correspondence using the Spark SQL / Dataset API.
|
public static void replaceTransforms(Pipeline pipeline, StreamingOptions options)
public static void detectStreamingMode(Pipeline pipeline, StreamingOptions options)
StreamingOptions
accordingly.@Nullable protected abstract <InT extends PInput,OutT extends POutput,TransformT extends PTransform<InT,OutT>> TransformTranslator<InT,OutT,TransformT> getTransformTranslator(TransformT transform)
TransformTranslator
for the given PTransform
if known.public EvaluationContext translate(Pipeline pipeline, org.apache.spark.sql.SparkSession session, SparkCommonPipelineOptions options)
Note, in some cases this involves the early evaluation of some parts of the pipeline. For
example, in order to use a side-input PCollectionView
in a translation the corresponding Spark Dataset
might have to be collected and
broadcasted to be able to continue with the translation.
EvaluationContext
that can trigger the
evaluation of the Spark pipeline.