Package org.apache.beam.runners.spark
Internal implementation of the Beam runner for Apache Spark.
Class descriptions
Spark runner PipelineOptions; handles Spark execution-related configuration, such as the master address, and other user-related knobs.
Returns Spark's default storage level for the Dataset or RDD API based on the respective runner.
Returns the default checkpoint directory of /tmp/${job.name}.
A custom PipelineOptions to work with properties related to JavaSparkContext.
Returns an empty list, to avoid handling null.
Creates a job invocation to manage the Spark runner's execution of a portable pipeline.
Driver program that starts a job server for the Spark runner.
Spark runner-specific Configuration for the jobServer.
Pipeline visitor for translating a Beam pipeline into equivalent Spark operations.
Spark runner PipelineOptions; handles Spark execution-related configuration, such as the master address, batch-interval, and other user-related knobs.
Represents a Spark pipeline execution result.
Runs a portable pipeline on Apache Spark.
Pipeline options specific to the Spark portable runner running a streaming job.
The SparkRunner translates operations defined on a pipeline into a representation executable by Spark, and then submits the job to Spark to be executed.
Evaluator on the pipeline.
Pipeline runner which translates a Beam pipeline into equivalent Spark operations, without running them.
PipelineResult of running a Pipeline using SparkRunnerDebugger. Use SparkRunnerDebugger.DebugSparkPipelineResult.getDebugString() to get a String representation of the Pipeline translated into Spark native operations.
Registers the SparkPipelineOptions.
Registers the SparkRunner.
PTransform overrides for Spark runner.
A SparkPipelineOptions for tests.
A factory to provide the default watermark to stop a pipeline that reads from an unbounded source.
The SparkRunner translates operations defined on a pipeline into a representation executable by Spark, and then submits the job to Spark to be executed.
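
Two of the defaults described above, the checkpoint directory of /tmp/${job.name} and the empty list returned instead of null, can be illustrated with a minimal standalone sketch. It uses only the Java standard library. The class and method names (SparkDefaultsSketch, defaultCheckpointDir, defaultListeners) are hypothetical and are not part of the Beam API. The sketch only shows the documented behaviour under the assumption that the job name is substituted into the path as-is.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these names are hypothetical and are not Beam classes.
final class SparkDefaultsSketch {

    // Mirrors the documented default: the checkpoint directory is /tmp/${job.name}.
    // Assumption: the job name is appended unchanged.
    static String defaultCheckpointDir(String jobName) {
        return "/tmp/" + jobName;
    }

    // Mirrors the documented "empty list, to avoid handling null" default,
    // so callers can iterate without a null check.
    static List<Object> defaultListeners() {
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        System.out.println(defaultCheckpointDir("wordcount")); // prints /tmp/wordcount
        System.out.println(defaultListeners().isEmpty());      // prints true
    }
}
```

In the real runner these defaults are supplied through the pipeline options rather than static helpers. The sketch is meant only to make the resulting values concrete.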