org.apache.beam.runners.spark (Apache Beam 2.67.0)

package org.apache.beam.runners.spark

Internal implementation of the Beam runner for Apache Spark.

Related Packages

Package

Description

org.apache.beam.runners.spark.coders

Beam coders and coder-related utilities for running on Apache Spark.

org.apache.beam.runners.spark.io

Spark-specific transforms for I/O.

org.apache.beam.runners.spark.metrics

Provides internal utilities for implementing Beam metrics using Spark accumulators.

org.apache.beam.runners.spark.stateful

Spark-specific stateful operators.

org.apache.beam.runners.spark.structuredstreaming

Internal implementation of the Beam runner for Apache Spark.

org.apache.beam.runners.spark.translation

Internal translators for running Beam pipelines on Spark.

org.apache.beam.runners.spark.util

Internal utilities to translate Beam pipelines to Spark.

Class

Description

SparkCommonPipelineOptions

Spark runner PipelineOptions that handle Spark execution-related configuration, such as the master address, along with other user-tunable settings.

SparkCommonPipelineOptions.StorageLevelFactory

Returns Spark's default storage level for the Dataset or RDD API based on the respective runner.

SparkCommonPipelineOptions.TmpCheckpointDirFactory

Returns the default checkpoint directory of /tmp/${job.name}.

SparkContextOptions

Custom PipelineOptions for working with properties related to JavaSparkContext.

SparkContextOptions.EmptyListenersList

Returns an empty list, to avoid handling null.

SparkJobInvoker

Creates a job invocation to manage the Spark runner's execution of a portable pipeline.

SparkJobServerDriver

Driver program that starts a job server for the Spark runner.

SparkJobServerDriver.SparkServerConfiguration

Spark runner-specific configuration for the job server.

SparkNativePipelineVisitor

Pipeline visitor for translating a Beam pipeline into equivalent Spark operations.

SparkPipelineOptions

Spark runner PipelineOptions that handle Spark execution-related configuration, such as the master address and batch interval, along with other user-tunable settings.

SparkPipelineResult

Represents a Spark pipeline execution result.

SparkPipelineRunner

Runs a portable pipeline on Apache Spark.

SparkPortableStreamingPipelineOptions

Pipeline options specific to the Spark portable runner running a streaming job.

SparkRunner

The SparkRunner translates operations defined on a pipeline into a representation executable by Spark, and then submits the job to Spark for execution.

SparkRunner.Evaluator

Evaluator on the pipeline.

SparkRunnerDebugger

Pipeline runner which translates a Beam pipeline into equivalent Spark operations, without running them.

SparkRunnerDebugger.DebugSparkPipelineResult

PipelineResult of running a Pipeline using SparkRunnerDebugger. Use SparkRunnerDebugger.DebugSparkPipelineResult.getDebugString() to get a String representation of the Pipeline translated into Spark native operations.

SparkRunnerRegistrar

Contains the PipelineRunnerRegistrar and PipelineOptionsRegistrar for the SparkRunner.

SparkRunnerRegistrar.Options

Registers the SparkPipelineOptions.

SparkRunnerRegistrar.Runner

Registers the SparkRunner.

SparkTransformOverrides

PTransform overrides for the Spark runner.

TestSparkPipelineOptions

A SparkPipelineOptions for tests.

TestSparkPipelineOptions.DefaultStopPipelineWatermarkFactory

A factory to provide the default watermark to stop a pipeline that reads from an unbounded source.

TestSparkRunner

The SparkRunner translates operations defined on a pipeline into a representation executable by Spark, and then submits the job to Spark for execution.
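
Two entries in the class table describe default values: SparkCommonPipelineOptions.TmpCheckpointDirFactory (a checkpoint directory of /tmp/${job.name}) and SparkContextOptions.EmptyListenersList (an empty list instead of null). The plain-Java sketch below illustrates only that default-value pattern. It does not use Beam, and the names DefaultsSketch, JobOptions, tmpCheckpointDir, and emptyListeners are hypothetical stand-ins chosen for illustration, not Beam's API.

```java
import java.util.ArrayList;
import java.util.List;

final class DefaultsSketch {
    // Hypothetical stand-in for an options object that exposes a job name.
    // This is NOT a Beam type.
    interface JobOptions {
        String getJobName();
    }

    // Mirrors the documented default checkpoint directory: /tmp/${job.name}.
    static String tmpCheckpointDir(JobOptions options) {
        return "/tmp/" + options.getJobName();
    }

    // Returns an empty list so that callers can iterate without a null check.
    static List<Object> emptyListeners() {
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        // Assumed job name, used only for this illustration.
        JobOptions options = () -> "wordcount-0101";

        // prints /tmp/wordcount-0101
        System.out.println(tmpCheckpointDir(options));

        // The loop body never runs, but no null check is needed either.
        for (Object listener : emptyListeners()) {
            System.out.println(listener);
        }
    }
}
```

Returning an empty collection rather than null keeps calling code free of defensive checks, which is the motivation the EmptyListenersList description gives.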
