apache_beam.runners.runner module¶
PipelineRunner, an abstract base runner object.
- 
class 
apache_beam.runners.runner.PipelineRunner[source]¶ Bases:
objectA runner of a pipeline object.
The base runner provides a run() method for visiting every node in the pipeline’s DAG and executing the transforms computing the PValue in the node.
A custom runner will typically provide implementations for some of the transform methods (ParDo, GroupByKey, Create, etc.). It may also provide a new implementation for clear_pvalue(), which is used to wipe out materialized values in order to reduce footprint.
- 
run_pipeline(pipeline)[source]¶ Execute the entire pipeline or the sub-DAG reachable from a node.
Runners should override this method.
- 
apply(transform, input)[source]¶ Runner callback for a pipeline.apply call.
Parameters: - transform – the transform to apply.
 - input – transform’s input (typically a PCollection).
 
A concrete implementation of the Runner class may want to do custom pipeline construction for a given transform. To override the behavior for a transform class Xyz, implement an apply_Xyz method with this same signature.
- 
run_transform(transform_node)[source]¶ Runner callback for a pipeline.run call.
Parameters: transform_node – transform node for the transform to run. A concrete implementation of the Runner class must implement run_Abc for some class Abc in the method resolution order for every non-composite transform Xyz in the pipeline.
- 
 
- 
class 
apache_beam.runners.runner.PipelineState[source]¶ Bases:
objectState of the Pipeline, as returned by
PipelineResult.state.This is meant to be the union of all the states any runner can put a pipeline in. Currently, it represents the values of the dataflow API JobState enum.
- 
UNKNOWN= 'UNKNOWN'¶ 
- 
STARTING= 'STARTING'¶ 
- 
STOPPED= 'STOPPED'¶ 
- 
RUNNING= 'RUNNING'¶ 
- 
DONE= 'DONE'¶ 
- 
FAILED= 'FAILED'¶ 
- 
CANCELLED= 'CANCELLED'¶ 
- 
UPDATED= 'UPDATED'¶ 
- 
DRAINING= 'DRAINING'¶ 
- 
DRAINED= 'DRAINED'¶ 
- 
PENDING= 'PENDING'¶ 
- 
CANCELLING= 'CANCELLING'¶ 
- 
 
- 
class 
apache_beam.runners.runner.PipelineResult(state)[source]¶ Bases:
objectA
PipelineResultprovides access to info about a pipeline.- 
state¶ Return the current state of the pipeline execution.
- 
wait_until_finish(duration=None)[source]¶ Waits until the pipeline finishes and returns the final status.
Parameters: duration (int) – The time to wait (in milliseconds) for job to finish. If it is set to
None, it will wait indefinitely until the job is finished.Raises: IOError– If there is a persistent problem getting job information.NotImplementedError– If the runner does not support this operation.
Returns: The final state of the pipeline, or
Noneon timeout.
- 
cancel()[source]¶ Cancels the pipeline execution.
Raises: IOError– If there is a persistent problem getting job information.NotImplementedError– If the runner does not support this operation.
Returns: The final state of the pipeline.
- 
metrics()[source]¶ Returns
MetricResultsobject to query metrics from the runner.Raises: NotImplementedError– If the runner does not support this operation.
-