Class SparkTranslationContext

java.lang.Object
org.apache.beam.runners.spark.translation.SparkTranslationContext
Direct Known Subclasses:
SparkStreamingTranslationContext

public class SparkTranslationContext extends Object
Translation context used to lazily store Spark data sets during portable pipeline translation and compute them after translation.
  • Constructor Details

    • SparkTranslationContext

      public SparkTranslationContext(org.apache.spark.api.java.JavaSparkContext jsc, PipelineOptions options, JobInfo jobInfo)
  • Method Details

    • getSparkContext

      public org.apache.spark.api.java.JavaSparkContext getSparkContext()
    • getSerializableOptions

      public org.apache.beam.runners.core.construction.SerializablePipelineOptions getSerializableOptions()
    • pushDataset

      public void pushDataset(String pCollectionId, Dataset dataset)
      Add the output of a transform to the context.
    • popDataset

      public Dataset popDataset(String pCollectionId)
      Retrieve the dataset for the pCollection id and remove it from the DAG's leaves.
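The pushDataset/popDataset pair amounts to a keyed store of lazily evaluated datasets, where datasets that no downstream transform has popped remain leaves of the DAG. A minimal sketch of that bookkeeping, using plain Java collections in place of Beam's Dataset type (the LazyDataset interface and MiniTranslationContext class here are hypothetical illustrations, not Beam API):

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for Beam's Dataset; holds a deferred computation.
interface LazyDataset {
    void compute();
}

// Sketch of the context's dataset bookkeeping.
class MiniTranslationContext {
    private final Map<String, LazyDataset> datasets = new HashMap<>();
    // Ids of datasets that no downstream transform has consumed yet.
    private final Set<String> leaves = new LinkedHashSet<>();

    // Register a transform's output and mark it as a DAG leaf.
    void pushDataset(String pCollectionId, LazyDataset dataset) {
        datasets.put(pCollectionId, dataset);
        leaves.add(pCollectionId);
    }

    // A consuming transform claims the dataset, so it is no longer a leaf.
    LazyDataset popDataset(String pCollectionId) {
        leaves.remove(pCollectionId);
        return datasets.get(pCollectionId);
    }

    // Force evaluation of every dataset that was never consumed downstream.
    void computeOutputs() {
        for (String id : leaves) {
            datasets.get(id).compute();
        }
    }
}
```

In this sketch, computeOutputs triggers only the datasets still in the leaves set, mirroring how translation first builds the full lazy DAG and only then evaluates its terminal outputs.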
    • computeOutputs

      public void computeOutputs()
      Compute the outputs for all RDDs that are leaves in the DAG.
    • nextSinkId

      public int nextSinkId()
      Generate a unique pCollection id number to identify runner-generated sinks.
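The role of nextSinkId, handing out fresh ids so that sinks the runner inserts itself never collide with user PCollection ids, can be sketched as a simple counter (the SinkIds class is a hypothetical illustration, not Beam API):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: generate unique id numbers for runner-generated sinks.
class SinkIds {
    private final AtomicInteger sinkId = new AtomicInteger();

    // Each call returns a fresh id, e.g. for naming a synthetic sink PCollection.
    int nextSinkId() {
        return sinkId.getAndIncrement();
    }
}
```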