Class SparkStructuredStreamingRunner
java.lang.Object
org.apache.beam.sdk.PipelineRunner<SparkStructuredStreamingPipelineResult>
org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner
public final class SparkStructuredStreamingRunner
extends PipelineRunner<SparkStructuredStreamingPipelineResult>
A Spark runner build on top of Spark's SQL Engine (Structured
Streaming framework).
This runner is experimental, its coverage of the Beam model is still partial. Due to limitations of the Structured Streaming framework (e.g. lack of support for multiple stateful operators), streaming mode is not yet supported by this runner.
The runner translates transforms defined on a Beam pipeline to Spark `Dataset` transformations (leveraging the high level Dataset API) and then submits these to Spark to be executed.
To run a Beam pipeline with the default options using Spark's local mode, we would do the following:
Pipeline p = [logic for pipeline creation]
PipelineResult result = p.run();
To create a pipeline runner to run against a different spark cluster, with a custom master url we would do the following:
Pipeline p = [logic for pipeline creation]
SparkCommonPipelineOptions options = p.getOptions.as(SparkCommonPipelineOptions.class);
options.setSparkMaster("spark://host:port");
PipelineResult result = p.run();
-
Method Summary
Modifier and TypeMethodDescriptioncreate()
Creates and returns a new SparkStructuredStreamingRunner with default options.Creates and returns a new SparkStructuredStreamingRunner with specified options.fromOptions
(PipelineOptions options) Creates and returns a new SparkStructuredStreamingRunner with specified options.Processes the givenPipeline
, potentially asynchronously, returning a runner-specific type of result.Methods inherited from class org.apache.beam.sdk.PipelineRunner
run, run
-
Method Details
-
create
Creates and returns a new SparkStructuredStreamingRunner with default options. In particular, against a spark instance running in local mode.- Returns:
- A pipeline runner with default options.
-
create
public static SparkStructuredStreamingRunner create(SparkStructuredStreamingPipelineOptions options) Creates and returns a new SparkStructuredStreamingRunner with specified options.- Parameters:
options
- The SparkStructuredStreamingPipelineOptions to use when executing the job.- Returns:
- A pipeline runner that will execute with specified options.
-
fromOptions
Creates and returns a new SparkStructuredStreamingRunner with specified options.- Parameters:
options
- The PipelineOptions to use when executing the job.- Returns:
- A pipeline runner that will execute with specified options.
-
run
Description copied from class:PipelineRunner
Processes the givenPipeline
, potentially asynchronously, returning a runner-specific type of result.- Specified by:
run
in classPipelineRunner<SparkStructuredStreamingPipelineResult>
-