Package org.apache.beam.runners.spark
Class SparkRunner
The SparkRunner translate operations defined on a pipeline to a representation executable by
Spark, and then submitting the job to Spark to be executed. If we wanted to run a Beam pipeline
with the default options of a single threaded spark instance in local mode, we would do the
following:
Pipeline p = [logic for pipeline creation] SparkPipelineResult result =
(SparkPipelineResult) p.run();
To create a pipeline runner to run against a different spark cluster, with a custom master url we would do the following:
Pipeline p = [logic for pipeline creation] SparkPipelineOptions options =
SparkPipelineOptionsFactory.create(); options.setSparkMaster("spark://host:port");
SparkPipelineResult result = (SparkPipelineResult) p.run();
-
Nested Class Summary
Nested Classes -
Method Summary
Modifier and TypeMethodDescriptionstatic SparkRunner
create()
Creates and returns a new SparkRunner with default options.static SparkRunner
create
(SparkPipelineOptions options) Creates and returns a new SparkRunner with specified options.static SparkRunner
fromOptions
(PipelineOptions options) Creates and returns a new SparkRunner with specified options.static void
initAccumulators
(SparkPipelineOptions opts, org.apache.spark.api.java.JavaSparkContext jsc) Init Metrics/Aggregators accumulators.Processes the givenPipeline
, potentially asynchronously, returning a runner-specific type of result.static void
updateCacheCandidates
(Pipeline pipeline, SparkPipelineTranslator translator, EvaluationContext evaluationContext) Evaluator that update/populate the cache candidates.static void
updateDependentTransforms
(Pipeline pipeline, SparkPipelineTranslator translator, EvaluationContext evaluationContext) Evaluator that update/populate information about dependent transforms for pCollections.Methods inherited from class org.apache.beam.sdk.PipelineRunner
run, run
-
Method Details
-
create
Creates and returns a new SparkRunner with default options. In particular, against a spark instance running in local mode.- Returns:
- A pipeline runner with default options.
-
create
Creates and returns a new SparkRunner with specified options.- Parameters:
options
- The SparkPipelineOptions to use when executing the job.- Returns:
- A pipeline runner that will execute with specified options.
-
fromOptions
Creates and returns a new SparkRunner with specified options.- Parameters:
options
- The PipelineOptions to use when executing the job.- Returns:
- A pipeline runner that will execute with specified options.
-
run
Description copied from class:PipelineRunner
Processes the givenPipeline
, potentially asynchronously, returning a runner-specific type of result.- Specified by:
run
in classPipelineRunner<SparkPipelineResult>
-
initAccumulators
public static void initAccumulators(SparkPipelineOptions opts, org.apache.spark.api.java.JavaSparkContext jsc) Init Metrics/Aggregators accumulators. This method is idempotent. -
updateCacheCandidates
public static void updateCacheCandidates(Pipeline pipeline, SparkPipelineTranslator translator, EvaluationContext evaluationContext) Evaluator that update/populate the cache candidates. -
updateDependentTransforms
public static void updateDependentTransforms(Pipeline pipeline, SparkPipelineTranslator translator, EvaluationContext evaluationContext) Evaluator that update/populate information about dependent transforms for pCollections.
-