Package org.apache.beam.runners.spark
package org.apache.beam.runners.spark
Internal implementation of the Beam runner for Apache Spark.
-
ClassDescriptionSpark runner
PipelineOptionshandles Spark execution-related configurations, such as the master address, and other user-related knobs.Returns Spark's default storage level for the Dataset or RDD API based on the respective runner.Returns the default checkpoint directory of /tmp/${job.name}.A customPipelineOptionsto work with properties related toJavaSparkContext.Returns an empty list, to avoid handling null.Creates a job invocation to manage the Spark runner's execution of a portable pipeline.Driver program that starts a job server for the Spark runner.Spark runner-specific Configuration for the jobServer.Pipeline visitor for translating a Beam pipeline into equivalent Spark operations.Spark runnerPipelineOptionshandles Spark execution-related configurations, such as the master address, batch-interval, and other user-related knobs.Represents a Spark pipeline execution result.Runs a portable pipeline on Apache Spark.Pipeline options specific to the Spark portable runner running a streaming job.The SparkRunner translate operations defined on a pipeline to a representation executable by Spark, and then submitting the job to Spark to be executed.Evaluator on the pipeline.Pipeline runner which translates a Beam pipeline into equivalent Spark operations, without running them.PipelineResult of running aPipelineusingSparkRunnerDebuggerUseSparkRunnerDebugger.DebugSparkPipelineResult.getDebugString()to get aStringrepresentation of thePipelinetranslated into Spark native operations.Registers theSparkPipelineOptions.Registers theSparkRunner.PTransformoverrides for Spark runner.ASparkPipelineOptionsfor tests.A factory to provide the default watermark to stop a pipeline that reads from an unbounded source.The SparkRunner translate operations defined on a pipeline to a representation executable by Spark, and then submitting the job to Spark to be executed.