apache_beam.runners.runner module¶

PipelineRunner, an abstract base runner object.

class apache_beam.runners.runner.PipelineRunner[source]¶

Bases: future.types.newobject.newobject

A runner of a pipeline object.

The base runner provides a run() method for visiting every node in the pipeline’s DAG and executing the transforms computing the PValue in the node.

A custom runner will typically provide implementations for some of the transform methods (ParDo, GroupByKey, Create, etc.). It may also provide a new implementation for clear_pvalue(), which is used to wipe out materialized values in order to reduce footprint.

run(transform, options=None)[source]¶

Run the given transform or callable with this runner.

Blocks until the pipeline is complete. See also PipelineRunner.run_async.

run_async(transform, options=None)[source]¶

Run the given transform or callable with this runner.

May return immediately, executing the pipeline in the background. The returned result object can be queried for progress, and wait_until_finish may be called to block until completion.

run_pipeline(pipeline)[source]¶

Execute the entire pipeline or the sub-DAG reachable from a node.

Runners should override this method.

apply(transform, input)[source]¶

Runner callback for a pipeline.apply call.

Parameters:	transform – the transform to apply. input – transform’s input (typically a PCollection).

A concrete implementation of the Runner class may want to do custom pipeline construction for a given transform. To override the behavior for a transform class Xyz, implement an apply_Xyz method with this same signature.

apply_PTransform(transform, input)[source]¶

run_transform(transform_node)[source]¶

Runner callback for a pipeline.run call.

Parameters:	transform_node – transform node for the transform to run.

A concrete implementation of the Runner class must implement run_Abc for some class Abc in the method resolution order for every non-composite transform Xyz in the pipeline.

class apache_beam.runners.runner.PipelineState[source]¶

Bases: future.types.newobject.newobject

State of the Pipeline, as returned by PipelineResult.state.

This is meant to be the union of all the states any runner can put a pipeline in. Currently, it represents the values of the dataflow API JobState enum.

UNKNOWN = 'UNKNOWN'¶

STARTING = 'STARTING'¶

STOPPED = 'STOPPED'¶

RUNNING = 'RUNNING'¶

DONE = 'DONE'¶

FAILED = 'FAILED'¶

CANCELLED = 'CANCELLED'¶

UPDATED = 'UPDATED'¶

DRAINING = 'DRAINING'¶

DRAINED = 'DRAINED'¶

PENDING = 'PENDING'¶

CANCELLING = 'CANCELLING'¶

classmethod is_terminal(state)[source]¶

class apache_beam.runners.runner.PipelineResult(state)[source]¶

Bases: future.types.newobject.newobject

A PipelineResult provides access to info about a pipeline.

state¶: Return the current state of the pipeline execution.

wait_until_finish(duration=None)[source]¶

Waits until the pipeline finishes and returns the final status.

Parameters:	duration (int) – The time to wait (in milliseconds) for job to finish. If it is set to `None`, it will wait indefinitely until the job is finished.
Raises:	`IOError` – If there is a persistent problem getting job information. `NotImplementedError` – If the runner does not support this operation.
Returns:	The final state of the pipeline, or `None` on timeout.

cancel()[source]¶

Cancels the pipeline execution.

Raises:	`IOError` – If there is a persistent problem getting job information. `NotImplementedError` – If the runner does not support this operation.
Returns:	The final state of the pipeline.

metrics()[source]¶

Returns MetricResults object to query metrics from the runner.

Raises:	`NotImplementedError` – If the runner does not support this operation.

aggregated_values(aggregator_or_name)[source]¶: Return a dict of step names to values of the Aggregator.