apache_beam.runners.runner module

PipelineRunner, an abstract base runner object.

class apache_beam.runners.runner.PipelineRunner[source]

Bases: object

A runner of a pipeline object.

The base runner provides a run() method for visiting every node in the pipeline’s DAG and executing the transforms that compute the PValues in those nodes.

A custom runner will typically provide implementations for some of the transform methods (ParDo, GroupByKey, Create, etc.). It may also provide a new implementation for clear_pvalue(), which is used to wipe out materialized values in order to reduce footprint.
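
For orientation, a minimal sketch of handing a pipeline to a concrete runner (here the built-in DirectRunner); Pipeline.run() ultimately delegates execution to the runner:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.runners.direct.direct_runner import DirectRunner

    options = PipelineOptions()
    # Exiting the `with` block calls pipeline.run() and waits for completion.
    with beam.Pipeline(runner=DirectRunner(), options=options) as p:
        (p
         | beam.Create([1, 2, 3])
         | beam.Map(lambda x: x * x)
         | beam.Map(print))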

run(transform, options=None)[source]

Run the given transform or callable with this runner.

Blocks until the pipeline is complete. See also PipelineRunner.run_async.

run_async(transform, options=None)[source]

Run the given transform or callable with this runner.

May return immediately, executing the pipeline in the background. The returned result object can be queried for progress, and wait_until_finish may be called to block until completion.
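
A hedged sketch of both entry points, again using DirectRunner; a standalone PTransform (or a chain of transforms) can be passed directly, and whether run_async() actually returns before completion depends on the runner:

    from apache_beam import Create, Map
    from apache_beam.runners.direct.direct_runner import DirectRunner

    # run() blocks until the pipeline finishes.
    DirectRunner().run(Create([1, 2, 3]) | Map(print))

    # run_async() may return while the pipeline is still executing;
    # block explicitly via the returned result when needed.
    result = DirectRunner().run_async(Create(['a', 'b']) | Map(print))
    result.wait_until_finish()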

run_portable_pipeline(pipeline: org.apache.beam.model.pipeline.v1.beam_runner_api_pb2.Pipeline, options: apache_beam.options.pipeline_options.PipelineOptions) → apache_beam.runners.runner.PipelineResult[source]

Execute the entire pipeline.

Runners should override this method.
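
A minimal sketch of the override a custom runner might provide; the class name is hypothetical and the body does no real work:

    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.portability.api import beam_runner_api_pb2
    from apache_beam.runners.runner import (
        PipelineResult, PipelineRunner, PipelineState)


    class NoopRunner(PipelineRunner):
        """Illustrative only: accepts any pipeline and reports it as DONE."""

        def run_portable_pipeline(
                self,
                pipeline: beam_runner_api_pb2.Pipeline,
                options: PipelineOptions) -> PipelineResult:
            # A real runner would translate the pipeline proto and submit it
            # to its execution backend here.
            return PipelineResult(PipelineState.DONE)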

default_environment(options: apache_beam.options.pipeline_options.PipelineOptions) → apache_beam.transforms.environments.Environment[source]

Returns the default environment that should be used for this runner.

Runners may override this method to provide alternative environments.
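
For example, a runner might pin a specific worker container image. This is a sketch only: the runner class and image name are hypothetical, and it assumes DockerEnvironment.from_container_image is available in the installed SDK version:

    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.runners.runner import PipelineRunner
    from apache_beam.transforms import environments


    class PinnedImageRunner(PipelineRunner):  # hypothetical name
        def default_environment(
                self, options: PipelineOptions) -> environments.Environment:
            # Use a fixed container image instead of the SDK default
            # (the image name below is a placeholder).
            return environments.DockerEnvironment.from_container_image(
                'gcr.io/my-project/my-beam-worker:latest')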

run_pipeline(pipeline, options)[source]

Execute the entire pipeline or the sub-DAG reachable from a node.

apply(transform, input, options)[source]
apply_PTransform(transform, input, options)[source]
is_fnapi_compatible()[source]

Whether to enable the beam_fn_api experiment by default.

check_requirements(pipeline_proto: org.apache.beam.model.pipeline.v1.beam_runner_api_pb2.Pipeline, supported_requirements: Iterable[str])[source]

Check that this runner can satisfy all pipeline requirements.

class apache_beam.runners.runner.PipelineState[source]

Bases: object

State of the Pipeline, as returned by PipelineResult.state.

This is meant to be the union of all the states any runner can put a pipeline in. Currently, it represents the values of the Dataflow API JobState enum.

UNKNOWN = 'UNKNOWN'
STARTING = 'STARTING'
STOPPED = 'STOPPED'
RUNNING = 'RUNNING'
DONE = 'DONE'
FAILED = 'FAILED'
CANCELLED = 'CANCELLED'
UPDATED = 'UPDATED'
DRAINING = 'DRAINING'
DRAINED = 'DRAINED'
PENDING = 'PENDING'
CANCELLING = 'CANCELLING'
RESOURCE_CLEANING_UP = 'RESOURCE_CLEANING_UP'
UNRECOGNIZED = 'UNRECOGNIZED'
classmethod is_terminal(state)[source]

Returns True if the given state is terminal, i.e. the pipeline cannot transition out of it.
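
For example, a simple polling loop might use it to decide when to stop checking a result’s state (sketch; result is assumed to be a PipelineResult obtained from a prior run):

    import time

    from apache_beam.runners.runner import PipelineState


    def wait_by_polling(result, poll_seconds=5.0):
        # Poll PipelineResult.state until the runner reports a terminal state.
        while not PipelineState.is_terminal(result.state):
            time.sleep(poll_seconds)
        return result.state
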
class apache_beam.runners.runner.PipelineResult(state)[source]

Bases: object

A PipelineResult provides access to info about a pipeline.

state

Return the current state of the pipeline execution.

wait_until_finish(duration=None)[source]

Waits until the pipeline finishes and returns the final status.

Parameters:

duration (int) – The time to wait (in milliseconds) for the job to finish. If set to None, it will wait indefinitely until the job finishes.

Raises:
  • IOError – If there is a persistent problem getting job information.
  • NotImplementedError – If the runner does not support this operation.
Returns:

The final state of the pipeline, or None on timeout.
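
A hedged usage sketch, assuming result was returned by a prior pipeline.run():

    from apache_beam.runners.runner import PipelineState

    # duration is in milliseconds; wait at most 10 minutes.
    final_state = result.wait_until_finish(duration=10 * 60 * 1000)
    if final_state is None:
        print('Pipeline still running after the timeout.')
    elif final_state == PipelineState.DONE:
        print('Pipeline finished successfully.')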

cancel()[source]

Cancels the pipeline execution.

Raises:
  • IOError – If there is a persistent problem getting job information.
  • NotImplementedError – If the runner does not support this operation.
Returns:

The final state of the pipeline.
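
A sketch of cancelling a long-running (for example, streaming) job, assuming result came from pipeline.run(); not every runner implements cancellation:

    try:
        final_state = result.cancel()
        print('Pipeline ended in state:', final_state)
    except NotImplementedError:
        print('This runner does not support cancellation.')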

metrics()[source]

Returns MetricResults object to query metrics from the runner.

Raises:
  • NotImplementedError – If the runner does not support this operation.
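
A sketch of querying a user counter after the pipeline has run; the counter name 'processed' is a placeholder for a metric defined in your own DoFns:

    from apache_beam.metrics.metric import MetricsFilter

    metric_results = result.metrics().query(
        MetricsFilter().with_name('processed'))
    for counter in metric_results['counters']:
        print(counter.key.metric.name, counter.committed)
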
aggregated_values(aggregator_or_name)[source]

Return a dict mapping step names to values of the Aggregator.