apache_beam.runners.runner module¶

PipelineRunner, an abstract base runner object.

class apache_beam.runners.runner.PipelineRunner[source]¶

Bases: object

A runner of a pipeline object.

The base runner provides a run() method for visiting every node in the pipeline’s DAG and executing the transforms computing the PValue in the node.

A custom runner will typically provide implementations for some of the transform methods (ParDo, GroupByKey, Create, etc.). It may also provide a new implementation for clear_pvalue(), which is used to wipe out materialized values in order to reduce footprint.

run(transform: PTransform, options: Optional[apache_beam.options.pipeline_options.PipelineOptions] = None) → PipelineResult[source]¶

Run the given transform or callable with this runner.

Blocks until the pipeline is complete. See also PipelineRunner.run_async.

run_async(transform: PTransform, options: Optional[apache_beam.options.pipeline_options.PipelineOptions] = None) → PipelineResult[source]¶

Run the given transform or callable with this runner.

May return immediately, executing the pipeline in the background. The returned result object can be queried for progress, and wait_until_finish may be called to block until completion.

run_portable_pipeline(pipeline: org.apache.beam.model.pipeline.v1.beam_runner_api_pb2.Pipeline, options: apache_beam.options.pipeline_options.PipelineOptions) → apache_beam.runners.runner.PipelineResult[source]¶

Execute the entire pipeline.

Runners should override this method.

default_environment(options: apache_beam.options.pipeline_options.PipelineOptions) → apache_beam.transforms.environments.Environment[source]¶

Returns the default environment that should be used for this runner.

Runners may override this method to provide alternative environments.

run_pipeline(pipeline: Pipeline, options: apache_beam.options.pipeline_options.PipelineOptions) → PipelineResult[source]¶: Execute the entire pipeline or the sub-DAG reachable from a node.

apply(transform: PTransform, input: Optional[pvalue.PValue], options: apache_beam.options.pipeline_options.PipelineOptions)[source]¶

apply_PTransform(transform, input, options)[source]¶

is_fnapi_compatible()[source]¶: Whether to enable the beam_fn_api experiment by default.

check_requirements(pipeline_proto: org.apache.beam.model.pipeline.v1.beam_runner_api_pb2.Pipeline, supported_requirements: Iterable[str])[source]¶: Check that this runner can satisfy all pipeline requirements.

class apache_beam.runners.runner.PipelineState[source]¶

Bases: object

State of the Pipeline, as returned by PipelineResult.state.

This is meant to be the union of all the states any runner can put a pipeline in. Currently, it represents the values of the dataflow API JobState enum.

UNKNOWN = 'UNKNOWN'¶

STARTING = 'STARTING'¶

STOPPED = 'STOPPED'¶

RUNNING = 'RUNNING'¶

DONE = 'DONE'¶

FAILED = 'FAILED'¶

CANCELLED = 'CANCELLED'¶

UPDATED = 'UPDATED'¶

DRAINING = 'DRAINING'¶

DRAINED = 'DRAINED'¶

PENDING = 'PENDING'¶

CANCELLING = 'CANCELLING'¶

RESOURCE_CLEANING_UP = 'RESOURCE_CLEANING_UP'¶

UNRECOGNIZED = 'UNRECOGNIZED'¶

classmethod is_terminal(state)[source]¶

class apache_beam.runners.runner.PipelineResult(state)[source]¶

Bases: object

A PipelineResult provides access to info about a pipeline.

state¶: Return the current state of the pipeline execution.

wait_until_finish(duration=None)[source]¶

Waits until the pipeline finishes and returns the final status.

Parameters:	duration (int) – The time to wait (in milliseconds) for job to finish. If it is set to `None`, it will wait indefinitely until the job is finished.
Raises:	`IOError` – If there is a persistent problem getting job information. `NotImplementedError` – If the runner does not support this operation.
Returns:	The final state of the pipeline, or `None` on timeout.

cancel()[source]¶

Cancels the pipeline execution.

Raises:	`IOError` – If there is a persistent problem getting job information. `NotImplementedError` – If the runner does not support this operation.
Returns:	The final state of the pipeline.

metrics()[source]¶

Returns MetricResults object to query metrics from the runner.

Raises:	`NotImplementedError` – If the runner does not support this operation.

aggregated_values(aggregator_or_name)[source]¶: Return a dict of step names to values of the Aggregator.