org.apache.beam.sdk.PipelineRunner<SparkPipelineResult>

org.apache.beam.runners.spark.SparkRunner

public final class SparkRunner extends PipelineRunner<SparkPipelineResult>

The SparkRunner translates operations defined on a pipeline into a representation executable by Spark, and then submits the job to Spark for execution. To run a Beam pipeline with the default options, on a single-threaded Spark instance in local mode, we would do the following:

    Pipeline p = [logic for pipeline creation]
    SparkPipelineResult result = (SparkPipelineResult) p.run();

To create a pipeline runner that runs against a different Spark cluster, with a custom master URL, we would do the following:

    Pipeline p = [logic for pipeline creation]
    SparkPipelineOptions options = SparkPipelineOptionsFactory.create();
    options.setSparkMaster("spark://host:port");
    SparkPipelineResult result = (SparkPipelineResult) p.run();
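The master URL can also be supplied as a command-line flag instead of being set in code. The sketch below only builds the flag array and does not depend on Beam. It assumes the flag names `--runner` and `--sparkMaster`, following Beam's convention of deriving a flag from the option's setter name. The class `SparkArgsSketch` and the helper `sparkArgs` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public final class SparkArgsSketch {
    /**
     * Builds Beam-style command-line flags that select the Spark runner.
     * The --sparkMaster flag is added only when a master URL is given, so
     * the runner's default master is used otherwise.
     */
    public static String[] sparkArgs(String masterUrl) {
        List<String> args = new ArrayList<>();
        args.add("--runner=SparkRunner");
        if (masterUrl != null && !masterUrl.isEmpty()) {
            args.add("--sparkMaster=" + masterUrl);
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        String[] args = sparkArgs("spark://host:port");
        // With Beam on the classpath, these flags would typically be parsed with
        //   PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class)
        // and the resulting options passed to Pipeline.create(options).
        System.out.println(String.join(" ", args));
    }
}
```

Keeping the master URL out of the code lets the same pipeline jar run in local mode or against a cluster without recompiling.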

Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static class | SparkRunner.Evaluator | Evaluator on the pipeline. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static SparkRunner | create() | Creates and returns a new SparkRunner with default options. |
| static SparkRunner | create(SparkPipelineOptions options) | Creates and returns a new SparkRunner with specified options. |
| static SparkRunner | fromOptions(PipelineOptions options) | Creates and returns a new SparkRunner with specified options. |
| static void | initAccumulators(SparkPipelineOptions opts, org.apache.spark.api.java.JavaSparkContext jsc) | Initializes the Metrics/Aggregators accumulators. |
| SparkPipelineResult | run(Pipeline pipeline) | Processes the given Pipeline, potentially asynchronously, returning a runner-specific type of result. |
| static void | updateCacheCandidates(Pipeline pipeline, SparkPipelineTranslator translator, EvaluationContext evaluationContext) | Evaluator that updates/populates the cache candidates. |
| static void | updateDependentTransforms(Pipeline pipeline, SparkPipelineTranslator translator, EvaluationContext evaluationContext) | Evaluator that updates/populates information about dependent transforms for PCollections. |

Methods inherited from class org.apache.beam.sdk.PipelineRunner
run, run

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
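The class signature parameterizes PipelineRunner with the result type, which is why run(Pipeline) can return a runner-specific result, potentially before the job has finished. The following is a toy model of that design using only the standard library, not Beam's actual classes. Every name in it (`ToyRunner`, `ToyResult`, `ToySparkRunner`, `waitUntilFinish`) is a hypothetical stand-in.

```java
import java.util.concurrent.CompletableFuture;

public final class RunnerResultSketch {
    /** Hypothetical stand-in for a runner base class parameterized by its result type. */
    abstract static class ToyRunner<ResultT> {
        abstract ResultT run(String pipelineName);
    }

    /** Hypothetical runner-specific result that wraps an asynchronous job. */
    static final class ToyResult {
        private final CompletableFuture<String> job;

        ToyResult(CompletableFuture<String> job) {
            this.job = job;
        }

        /** Blocks until the job finishes and returns its final state. */
        String waitUntilFinish() {
            return job.join();
        }
    }

    /** Concrete runner: the type argument fixes what run() returns. */
    static final class ToySparkRunner extends ToyRunner<ToyResult> {
        @Override
        ToyResult run(String pipelineName) {
            // Submit the work asynchronously and return a handle immediately.
            return new ToyResult(
                CompletableFuture.supplyAsync(() -> pipelineName + ":DONE"));
        }
    }

    public static void main(String[] args) {
        // Calling the runner directly yields the specific result type, no cast needed.
        ToyResult result = new ToySparkRunner().run("wordcount");
        System.out.println(result.waitUntilFinish()); // prints wordcount:DONE
    }
}
```

In this model, a caller holding only the base type would see the generic result, which mirrors why the usage examples at the top of this page cast the value returned by p.run().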

Method Details
- create
  
  public static SparkRunner create()
  
  Creates and returns a new SparkRunner with default options; in particular, one that runs against a Spark instance in local mode.
  
  Returns:
  
  A pipeline runner with default options.
- create
  
  public static SparkRunner create(SparkPipelineOptions options)
  
  Creates and returns a new SparkRunner with specified options.
  
  Parameters:
  
  options - The SparkPipelineOptions to use when executing the job.
  
  Returns:
  
  A pipeline runner that will execute with specified options.
- fromOptions
  
  public static SparkRunner fromOptions(PipelineOptions options)
  
  Creates and returns a new SparkRunner with specified options.
  
  Parameters:
  
  options - The PipelineOptions to use when executing the job.
  
  Returns:
  
  A pipeline runner that will execute with specified options.
- run
  
  public SparkPipelineResult run(Pipeline pipeline)
  
  Description copied from class: PipelineRunner
  
  Processes the given Pipeline, potentially asynchronously, returning a runner-specific type of result.
  
  Specified by:
  
  run in class PipelineRunner<SparkPipelineResult>
- initAccumulators
  
  public static void initAccumulators(SparkPipelineOptions opts, org.apache.spark.api.java.JavaSparkContext jsc)
  
  Initializes the Metrics/Aggregators accumulators. This method is idempotent.
- updateCacheCandidates
  
  public static void updateCacheCandidates(Pipeline pipeline, SparkPipelineTranslator translator, EvaluationContext evaluationContext)
  
  Evaluator that updates/populates the cache candidates.
- updateDependentTransforms
  
  public static void updateDependentTransforms(Pipeline pipeline, SparkPipelineTranslator translator, EvaluationContext evaluationContext)
  
  Evaluator that updates/populates information about dependent transforms for PCollections.
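
initAccumulators is documented as idempotent: calling it repeatedly has the same effect as calling it once. The sketch below shows one common way to get that property, a guarded one-time initialization. It is not SparkRunner's implementation. The class `IdempotentInitSketch`, its `init` method, and the `accumulator` field are hypothetical, and a plain counter stands in for the real accumulators.

```java
import java.util.concurrent.atomic.AtomicInteger;

public final class IdempotentInitSketch {
    private static final Object LOCK = new Object();
    // Hypothetical shared state standing in for the runner's accumulators.
    private static volatile AtomicInteger accumulator;
    // Counts how many times the shared state was actually created.
    private static final AtomicInteger CREATIONS = new AtomicInteger();

    /** Creates the shared state on the first call; later calls are no-ops. */
    public static void init() {
        if (accumulator == null) {
            synchronized (LOCK) {
                // Re-check under the lock so concurrent callers create it only once.
                if (accumulator == null) {
                    accumulator = new AtomicInteger();
                    CREATIONS.incrementAndGet();
                }
            }
        }
    }

    /** Returns how many times the shared state has been created. */
    public static int creations() {
        return CREATIONS.get();
    }

    public static void main(String[] args) {
        init();
        init();
        init();
        System.out.println(creations()); // prints 1
    }
}
```

Callers in this sketch do not need to track whether initialization has already happened, which is the practical benefit of an idempotent setup method.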
