Class SparkStructuredStreamingRunner

java.lang.Object
org.apache.beam.sdk.PipelineRunner<SparkStructuredStreamingPipelineResult>
org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner

public final class SparkStructuredStreamingRunner extends PipelineRunner<SparkStructuredStreamingPipelineResult>
A Spark runner built on top of Spark's SQL Engine (Structured Streaming framework).

This runner is experimental; its coverage of the Beam model is still partial. Because of limitations of the Structured Streaming framework (e.g. lack of support for multiple stateful operators), this runner does not yet support streaming mode.

The runner translates the transforms defined on a Beam pipeline into Spark `Dataset` transformations (using the high-level Dataset API) and then submits them to Spark for execution.

To run a Beam pipeline with the default options using Spark's local mode, we would do the following:


 Pipeline p = [logic for pipeline creation]
 PipelineResult result = p.run();
 

To create a pipeline runner to run against a different Spark cluster with a custom master URL, we would do the following:


 Pipeline p = [logic for pipeline creation]
 SparkCommonPipelineOptions options = p.getOptions().as(SparkCommonPipelineOptions.class);
 options.setSparkMaster("spark://host:port");
 PipelineResult result = p.run();
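
The snippets above leave pipeline creation elided. As a minimal sketch of how the pieces might fit together, the example below builds the options first, selects this runner explicitly, and then creates and runs a small batch pipeline. The class name `StructuredStreamingRunnerExample`, the helper `buildOptions`, the master value `local[2]`, and the toy `Create`/`Count` pipeline are illustrative assumptions rather than part of this class's documentation, and the sketch assumes the Beam Spark runner and a compatible Spark distribution are on the classpath.

```java
import org.apache.beam.runners.spark.SparkCommonPipelineOptions;
import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;

// Hypothetical example class; not part of the Beam API.
public class StructuredStreamingRunnerExample {

  // Hypothetical helper: builds options that select this runner and
  // point it at the given Spark master URL.
  static SparkCommonPipelineOptions buildOptions(String sparkMaster) {
    SparkCommonPipelineOptions options =
        PipelineOptionsFactory.create().as(SparkCommonPipelineOptions.class);
    options.setRunner(SparkStructuredStreamingRunner.class);
    options.setSparkMaster(sparkMaster);
    return options;
  }

  public static void main(String[] args) {
    // "local[2]" is an illustrative value; a standalone cluster would use
    // a URL of the form "spark://host:port" instead.
    SparkCommonPipelineOptions options = buildOptions("local[2]");

    // Create the pipeline with the configured options.
    Pipeline p = Pipeline.create(options);

    // Toy bounded (batch) pipeline: count occurrences of each element.
    p.apply(Create.of("a", "b", "a")).apply(Count.perElement());

    // Submit to Spark and block until the batch job completes.
    PipelineResult result = p.run();
    result.waitUntilFinish();
    System.out.println("Final state: " + result.getState());
  }
}
```

Setting the master before calling `Pipeline.create` keeps all configuration in one place. The input here is bounded, in line with the note above that streaming mode is not yet supported.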