Package org.apache.beam.runners.spark
Class SparkRunner
The SparkRunner translates operations defined on a pipeline into a representation executable by
 Spark, and then submits the job to Spark for execution. To run a Beam pipeline
 with the default options of a single-threaded Spark instance in local mode, we would do the
 following:
 
Pipeline p = [logic for pipeline creation]
SparkPipelineResult result = (SparkPipelineResult) p.run();
 
To create a pipeline runner that runs against a different Spark cluster, with a custom master URL, we would do the following:
Pipeline p = [logic for pipeline creation]
SparkPipelineOptions options = SparkPipelineOptionsFactory.create();
options.setSparkMaster("spark://host:port");
SparkPipelineResult result = (SparkPipelineResult) p.run();

Nested Class Summary
Nested Classes
Method Summary
Modifier and Type   Method   Description

static SparkRunner   create()
    Creates and returns a new SparkRunner with default options.

static SparkRunner   create(SparkPipelineOptions options)
    Creates and returns a new SparkRunner with specified options.

static SparkRunner   fromOptions(PipelineOptions options)
    Creates and returns a new SparkRunner with specified options.

static void   initAccumulators(SparkPipelineOptions opts, org.apache.spark.api.java.JavaSparkContext jsc)
    Initializes the Metrics/Aggregators accumulators.

SparkPipelineResult   run(Pipeline pipeline)
    Processes the given Pipeline, potentially asynchronously, returning a runner-specific type of result.

static void   updateCacheCandidates(Pipeline pipeline, SparkPipelineTranslator translator, EvaluationContext evaluationContext)
    Evaluator that updates/populates the cache candidates.

static void   updateDependentTransforms(Pipeline pipeline, SparkPipelineTranslator translator, EvaluationContext evaluationContext)
    Evaluator that updates/populates information about dependent transforms for PCollections.

Methods inherited from class org.apache.beam.sdk.PipelineRunner
run, run 

Method Details

create
Creates and returns a new SparkRunner with default options, that is, one that runs against a Spark instance in local mode.
Returns:
 A pipeline runner with default options.

create
Creates and returns a new SparkRunner with specified options.
Parameters:
 options - The SparkPipelineOptions to use when executing the job.
Returns:
 A pipeline runner that will execute with specified options.

fromOptions
Creates and returns a new SparkRunner with specified options.
Parameters:
 options - The PipelineOptions to use when executing the job.
Returns:
 A pipeline runner that will execute with specified options.

run
Description copied from class: PipelineRunner
Processes the given Pipeline, potentially asynchronously, returning a runner-specific type of result.
Specified by:
 run in class PipelineRunner<SparkPipelineResult>

initAccumulators
public static void initAccumulators(SparkPipelineOptions opts, org.apache.spark.api.java.JavaSparkContext jsc)
Initializes the Metrics/Aggregators accumulators. This method is idempotent.

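The idempotency noted for initAccumulators means that calling it more than once has the same effect as calling it once. As a rough illustration of that property only (this is not Beam's actual implementation), the following minimal sketch uses plain Java with a hypothetical Accumulator stand-in class and a hypothetical init method guarded by double-checked locking. Every name in it is an assumption made for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class IdempotentInitSketch {

    // Hypothetical stand-in for an accumulator; not a Beam or Spark type.
    public static final class Accumulator {
        public final String name;

        Accumulator(String name) {
            this.name = name;
        }
    }

    // Shared instance, created at most once.
    private static volatile Accumulator instance;

    // Counts how many times the accumulator was actually constructed.
    public static final AtomicInteger creations = new AtomicInteger();

    // Idempotent initializer: only the first call constructs the accumulator,
    // later calls return the existing instance and change nothing.
    public static Accumulator init(String name) {
        Accumulator local = instance;
        if (local == null) {
            synchronized (IdempotentInitSketch.class) {
                local = instance;
                if (local == null) {
                    local = new Accumulator(name);
                    creations.incrementAndGet();
                    instance = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        Accumulator first = init("metrics");
        Accumulator second = init("metrics");
        System.out.println("same instance: " + (first == second));
        System.out.println("constructions: " + creations.get());
    }
}
```

Under these assumptions, callers do not need to track whether initialization has already happened, because a repeated call is harmless.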
updateCacheCandidates
public static void updateCacheCandidates(Pipeline pipeline, SparkPipelineTranslator translator, EvaluationContext evaluationContext)
Evaluator that updates/populates the cache candidates.

updateDependentTransforms
public static void updateDependentTransforms(Pipeline pipeline, SparkPipelineTranslator translator, EvaluationContext evaluationContext)
Evaluator that updates/populates information about dependent transforms for PCollections.