apache_beam.runners.runner module¶
PipelineRunner, an abstract base runner object.
- 
class 
apache_beam.runners.runner.PipelineRunner[source]¶ Bases:
objectA runner of a pipeline object.
The base runner provides a run() method for visiting every node in the pipeline’s DAG and executing the transforms computing the PValue in the node.
A custom runner will typically provide implementations for some of the transform methods (ParDo, GroupByKey, Create, etc.). It may also provide a new implementation for clear_pvalue(), which is used to wipe out materialized values in order to reduce footprint.
- 
run(transform, options=None)[source]¶ Run the given transform or callable with this runner.
Blocks until the pipeline is complete. See also PipelineRunner.run_async.
- 
run_async(transform, options=None)[source]¶ Run the given transform or callable with this runner.
May return immediately, executing the pipeline in the background. The returned result object can be queried for progress, and wait_until_finish may be called to block until completion.
- 
run_portable_pipeline(pipeline: org.apache.beam.model.pipeline.v1.beam_runner_api_pb2.Pipeline, options: apache_beam.options.pipeline_options.PipelineOptions) → apache_beam.runners.runner.PipelineResult[source]¶ Execute the entire pipeline.
Runners should override this method.
- 
default_environment(options: apache_beam.options.pipeline_options.PipelineOptions) → apache_beam.transforms.environments.Environment[source]¶ Returns the default environment that should be used for this runner.
Runners may override this method to provide alternative environments.
- 
 
- 
class 
apache_beam.runners.runner.PipelineState[source]¶ Bases:
objectState of the Pipeline, as returned by
PipelineResult.state.This is meant to be the union of all the states any runner can put a pipeline in. Currently, it represents the values of the dataflow API JobState enum.
- 
UNKNOWN= 'UNKNOWN'¶ 
- 
STARTING= 'STARTING'¶ 
- 
STOPPED= 'STOPPED'¶ 
- 
RUNNING= 'RUNNING'¶ 
- 
DONE= 'DONE'¶ 
- 
FAILED= 'FAILED'¶ 
- 
CANCELLED= 'CANCELLED'¶ 
- 
UPDATED= 'UPDATED'¶ 
- 
DRAINING= 'DRAINING'¶ 
- 
DRAINED= 'DRAINED'¶ 
- 
PENDING= 'PENDING'¶ 
- 
CANCELLING= 'CANCELLING'¶ 
- 
RESOURCE_CLEANING_UP= 'RESOURCE_CLEANING_UP'¶ 
- 
UNRECOGNIZED= 'UNRECOGNIZED'¶ 
- 
 
- 
class 
apache_beam.runners.runner.PipelineResult(state)[source]¶ Bases:
objectA
PipelineResultprovides access to info about a pipeline.- 
state¶ Return the current state of the pipeline execution.
- 
wait_until_finish(duration=None)[source]¶ Waits until the pipeline finishes and returns the final status.
Parameters: duration (int) – The time to wait (in milliseconds) for job to finish. If it is set to
None, it will wait indefinitely until the job is finished.Raises: IOError– If there is a persistent problem getting job information.NotImplementedError– If the runner does not support this operation.
Returns: The final state of the pipeline, or
Noneon timeout.
- 
cancel()[source]¶ Cancels the pipeline execution.
Raises: IOError– If there is a persistent problem getting job information.NotImplementedError– If the runner does not support this operation.
Returns: The final state of the pipeline.
- 
metrics()[source]¶ Returns
MetricResultsobject to query metrics from the runner.Raises: NotImplementedError– If the runner does not support this operation.
-