public final class SparkStructuredStreamingRunner extends PipelineRunner<SparkStructuredStreamingPipelineResult>
This runner is experimental, and its coverage of the Beam model is still partial. Due to limitations of the Structured Streaming framework (e.g. lack of support for multiple stateful operators), this runner does not yet support streaming mode.
The runner translates the transforms defined on a Beam pipeline into Spark `Dataset` transformations (using the high-level Dataset API) and then submits them to Spark for execution.
To run a Beam pipeline with the default options using Spark's local mode, we would do the following:
Pipeline p = [logic for pipeline creation]
PipelineResult result = p.run();
To run against a different Spark cluster with a custom master URL, we would do the following:
Pipeline p = [logic for pipeline creation]
SparkCommonPipelineOptions options = p.getOptions().as(SparkCommonPipelineOptions.class);
options.setSparkMaster("spark://host:port");
PipelineResult result = p.run();
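The runner and master URL can also be supplied as command-line style arguments, which Beam's `PipelineOptionsFactory.fromArgs` parses into pipeline options. The following is a minimal sketch of that approach, not the runner's own code. It assumes the usual `--runner` and `--sparkMaster` flag names (derived from the option getters), and `RunnerArgs` and `forMaster` are hypothetical names introduced here for illustration. The Beam calls appear only in a comment so that the snippet has no dependencies beyond the JDK.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Beam) that assembles pipeline arguments. */
public class RunnerArgs {

  /**
   * Builds --name=value arguments that select this runner and, when a
   * non-empty master URL is given, point it at that Spark master.
   */
  public static String[] forMaster(String sparkMaster) {
    List<String> args = new ArrayList<>();
    // Assumption: the runner is resolvable by its simple class name.
    args.add("--runner=SparkStructuredStreamingRunner");
    if (sparkMaster != null && !sparkMaster.isEmpty()) {
      args.add("--sparkMaster=" + sparkMaster);
    }
    return args.toArray(new String[0]);
  }

  public static void main(String[] unused) {
    String[] args = forMaster("spark://host:port");
    // With Beam on the classpath, these would typically be consumed as:
    //   PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    //   Pipeline p = Pipeline.create(options);
    System.out.println(String.join(" ", args));
  }
}
```

Passing `null` or an empty string to `forMaster` omits the `--sparkMaster` flag, leaving the runner on its default (local) master.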
Modifier and Type | Method and Description |
---|---|
static SparkStructuredStreamingRunner | `create()` Creates and returns a new SparkStructuredStreamingRunner with default options. |
static SparkStructuredStreamingRunner | `create(SparkStructuredStreamingPipelineOptions options)` Creates and returns a new SparkStructuredStreamingRunner with specified options. |
static SparkStructuredStreamingRunner | `fromOptions(PipelineOptions options)` Creates and returns a new SparkStructuredStreamingRunner with specified options. |
SparkStructuredStreamingPipelineResult | `run(Pipeline pipeline)` Processes the given Pipeline, potentially asynchronously, returning a runner-specific type of result. |
Methods inherited from class PipelineRunner: run, run
public static SparkStructuredStreamingRunner create()

public static SparkStructuredStreamingRunner create(SparkStructuredStreamingPipelineOptions options)
Parameters: options - The SparkStructuredStreamingPipelineOptions to use when executing the job.

public static SparkStructuredStreamingRunner fromOptions(PipelineOptions options)
Parameters: options - The PipelineOptions to use when executing the job.

public SparkStructuredStreamingPipelineResult run(Pipeline pipeline)
Description copied from class: PipelineRunner
Processes the given Pipeline, potentially asynchronously, returning a runner-specific type of result.
Specified by: run in class PipelineRunner<SparkStructuredStreamingPipelineResult>