apache_beam.runners.runner module

PipelineRunner, an abstract base runner object.

class apache_beam.runners.runner.PipelineRunner[source]

Bases: object

A runner of a pipeline object.

The base runner provides a run() method for visiting every node in the pipeline’s DAG and executing the transforms that compute the PValues in those nodes.

A custom runner will typically provide implementations for some of the transform methods (ParDo, GroupByKey, Create, etc.). It may also provide a new implementation for clear_pvalue(), which is used to wipe out materialized values in order to reduce footprint.
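
For orientation, a minimal sketch of handing a pipeline to a concrete runner (here the built-in DirectRunner); Pipeline.run() ultimately delegates execution to the runner:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.runners.direct.direct_runner import DirectRunner

    options = PipelineOptions()
    # Exiting the `with` block calls pipeline.run() and waits for completion.
    with beam.Pipeline(runner=DirectRunner(), options=options) as p:
        (p
         | beam.Create([1, 2, 3])
         | beam.Map(lambda x: x * x)
         | beam.Map(print))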

run(transform, options=None)[source]

Run the given transform or callable with this runner.

Blocks until the pipeline is complete. See also PipelineRunner.run_async.

run_async(transform, options=None)[source]

Run the given transform or callable with this runner.

May return immediately, executing the pipeline in the background. The returned result object can be queried for progress, and wait_until_finish may be called to block until completion.
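
A hedged sketch of both entry points, again using DirectRunner; a standalone PTransform (or a chain of transforms) can be passed directly, and whether run_async() actually returns before completion depends on the runner:

    from apache_beam import Create, Map
    from apache_beam.runners.direct.direct_runner import DirectRunner

    # run() blocks until the pipeline finishes.
    DirectRunner().run(Create([1, 2, 3]) | Map(print))

    # run_async() may return while the pipeline is still executing;
    # block explicitly via the returned result when needed.
    result = DirectRunner().run_async(Create(['a', 'b']) | Map(print))
    result.wait_until_finish()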

run_portable_pipeline(pipeline: org.apache.beam.model.pipeline.v1.beam_runner_api_pb2.Pipeline, options: apache_beam.options.pipeline_options.PipelineOptions) → apache_beam.runners.runner.PipelineResult[source]

Execute the entire pipeline.

Runners should override this method.
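
A minimal sketch of the override a custom runner might provide; the class name is hypothetical and the body does no real work:

    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.portability.api import beam_runner_api_pb2
    from apache_beam.runners.runner import (
        PipelineResult, PipelineRunner, PipelineState)


    class NoopRunner(PipelineRunner):
        """Illustrative only: accepts any pipeline and reports it as DONE."""

        def run_portable_pipeline(
                self,
                pipeline: beam_runner_api_pb2.Pipeline,
                options: PipelineOptions) -> PipelineResult:
            # A real runner would translate the pipeline proto and submit it
            # to its execution backend here.
            return PipelineResult(PipelineState.DONE)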

default_environment(options: apache_beam.options.pipeline_options.PipelineOptions) → apache_beam.transforms.environments.Environment[source]

Returns the default environment that should be used for this runner.

Runners may override this method to provide alternative environments.
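
For example, a runner might pin a specific worker container image. This is a sketch only: the runner class and image name are hypothetical, and it assumes DockerEnvironment.from_container_image is available in the installed SDK version:

    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.runners.runner import PipelineRunner
    from apache_beam.transforms import environments


    class PinnedImageRunner(PipelineRunner):  # hypothetical name
        def default_environment(
                self, options: PipelineOptions) -> environments.Environment:
            # Use a fixed container image instead of the SDK default
            # (the image name below is a placeholder).
            return environments.DockerEnvironment.from_container_image(
                'gcr.io/my-project/my-beam-worker:latest')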

run_pipeline(pipeline, options)[source]

Execute the entire pipeline or the sub-DAG reachable from a node.

apply(transform, input, options)[source]
apply_PTransform(transform, input, options)[source]
is_fnapi_compatible()[source]

Whether to enable the beam_fn_api experiment by default.

check_requirements(pipeline_proto: org.apache.beam.model.pipeline.v1.beam_runner_api_pb2.Pipeline, supported_requirements: Iterable[str])[source]

Check that this runner can satisfy all pipeline requirements.

class apache_beam.runners.runner.PipelineState[source]

Bases: object

State of the Pipeline, as returned by PipelineResult.state.

This is meant to be the union of all the states any runner can put a pipeline in. Currently, it represents the values of the Dataflow API JobState enum.

UNKNOWN = 'UNKNOWN'
STARTING = 'STARTING'
STOPPED = 'STOPPED'
RUNNING = 'RUNNING'
DONE = 'DONE'
FAILED = 'FAILED'
CANCELLED = 'CANCELLED'
UPDATED = 'UPDATED'
DRAINING = 'DRAINING'
DRAINED = 'DRAINED'
PENDING = 'PENDING'
CANCELLING = 'CANCELLING'
RESOURCE_CLEANING_UP = 'RESOURCE_CLEANING_UP'
UNRECOGNIZED = 'UNRECOGNIZED'
classmethod is_terminal(state)[source]

Returns True if the given state is terminal, i.e. the pipeline cannot transition out of it.
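
For example, a simple polling loop might use it to decide when to stop checking a result’s state (sketch; result is assumed to be a PipelineResult obtained from a prior run):

    import time

    from apache_beam.runners.runner import PipelineState


    def wait_by_polling(result, poll_seconds=5.0):
        # Poll PipelineResult.state until the runner reports a terminal state.
        while not PipelineState.is_terminal(result.state):
            time.sleep(poll_seconds)
        return result.state
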
class apache_beam.runners.runner.PipelineResult(state)[source]

Bases: object

A PipelineResult provides access to info about a pipeline.

state

Return the current state of the pipeline execution.

wait_until_finish(duration=None)[source]

Waits until the pipeline finishes and returns the final status.

Parameters:

duration (int) – The time to wait (in milliseconds) for the job to finish. If set to None, it will wait indefinitely until the job finishes.

Raises:
  • IOError – If there is a persistent problem getting job information.
  • NotImplementedError – If the runner does not support this operation.
Returns:

The final state of the pipeline, or None on timeout.
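
A hedged usage sketch, assuming result was returned by a prior pipeline.run():

    from apache_beam.runners.runner import PipelineState

    # duration is in milliseconds; wait at most 10 minutes.
    final_state = result.wait_until_finish(duration=10 * 60 * 1000)
    if final_state is None:
        print('Pipeline still running after the timeout.')
    elif final_state == PipelineState.DONE:
        print('Pipeline finished successfully.')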

cancel()[source]

Cancels the pipeline execution.

Raises:
  • IOError – If there is a persistent problem getting job information.
  • NotImplementedError – If the runner does not support this operation.
Returns:

The final state of the pipeline.
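
A sketch of cancelling a long-running (for example, streaming) job, assuming result came from pipeline.run(); not every runner implements cancellation:

    try:
        final_state = result.cancel()
        print('Pipeline ended in state:', final_state)
    except NotImplementedError:
        print('This runner does not support cancellation.')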

metrics()[source]

Returns MetricResults object to query metrics from the runner.

Raises:
  • NotImplementedError – If the runner does not support this operation.
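
A sketch of querying a user counter after the pipeline has run; the counter name 'processed' is a placeholder for a metric defined in your own DoFns:

    from apache_beam.metrics.metric import MetricsFilter

    metric_results = result.metrics().query(
        MetricsFilter().with_name('processed'))
    for counter in metric_results['counters']:
        print(counter.key.metric.name, counter.committed)
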
aggregated_values(aggregator_or_name)[source]

Return a dict mapping step names to values of the Aggregator.