org.apache.beam.sdk.PipelineRunner<SparkStructuredStreamingPipelineResult>

org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner

public final class SparkStructuredStreamingRunner extends PipelineRunner<SparkStructuredStreamingPipelineResult>

A Spark runner build on top of Spark's SQL Engine (Structured Streaming framework).

This runner is experimental, its coverage of the Beam model is still partial. Due to limitations of the Structured Streaming framework (e.g. lack of support for multiple stateful operators), streaming mode is not yet supported by this runner.

The runner translates transforms defined on a Beam pipeline to Spark `Dataset` transformations (leveraging the high level Dataset API) and then submits these to Spark to be executed.

To run a Beam pipeline with the default options using Spark's local mode, we would do the following:


 Pipeline p = [logic for pipeline creation]
 PipelineResult result = p.run();

To create a pipeline runner to run against a different spark cluster, with a custom master url we would do the following:


 Pipeline p = [logic for pipeline creation]
 SparkCommonPipelineOptions options = p.getOptions.as(SparkCommonPipelineOptions.class);
 options.setSparkMaster("spark://host:port");
 PipelineResult result = p.run();

Method Summary

Modifier and Type

Method

Description

static SparkStructuredStreamingRunner

create()

Creates and returns a new SparkStructuredStreamingRunner with default options.

static SparkStructuredStreamingRunner

create(SparkStructuredStreamingPipelineOptions options)

Creates and returns a new SparkStructuredStreamingRunner with specified options.

static SparkStructuredStreamingRunner

fromOptions(PipelineOptions options)

Creates and returns a new SparkStructuredStreamingRunner with specified options.

SparkStructuredStreamingPipelineResult

run(Pipeline pipeline)

Processes the given Pipeline, potentially asynchronously, returning a runner-specific type of result.

Methods inherited from class org.apache.beam.sdk.PipelineRunner
run, run

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- create
  
  public static SparkStructuredStreamingRunner create()
  
  Creates and returns a new SparkStructuredStreamingRunner with default options. In particular, against a spark instance running in local mode.
  
  Returns:
  
  A pipeline runner with default options.
- create
  
  public static SparkStructuredStreamingRunner create(SparkStructuredStreamingPipelineOptions options)
  
  Creates and returns a new SparkStructuredStreamingRunner with specified options.
  
  Parameters:
  
  options - The SparkStructuredStreamingPipelineOptions to use when executing the job.
  
  Returns:
  
  A pipeline runner that will execute with specified options.
- fromOptions
  
  public static SparkStructuredStreamingRunner fromOptions(PipelineOptions options)
  
  Creates and returns a new SparkStructuredStreamingRunner with specified options.
  
  Parameters:
  
  options - The PipelineOptions to use when executing the job.
  
  Returns:
  
  A pipeline runner that will execute with specified options.
- run
  
  public SparkStructuredStreamingPipelineResult run(Pipeline pipeline)
  
  Description copied from class: PipelineRunner
  
  Processes the given Pipeline, potentially asynchronously, returning a runner-specific type of result.
  
  Specified by:
  
  run in class PipelineRunner<SparkStructuredStreamingPipelineResult>

Class SparkStructuredStreamingRunner

Method Summary

Methods inherited from class org.apache.beam.sdk.PipelineRunner

Methods inherited from class java.lang.Object

Method Details

create

create

fromOptions

run