apache_beam.runners.runner module¶
PipelineRunner, an abstract base runner object.
- 
class apache_beam.runners.runner.PipelineRunner[source]¶
- Bases: - future.types.newobject.newobject- A runner of a pipeline object. - The base runner provides a run() method for visiting every node in the pipeline’s DAG and executing the transforms computing the PValue in the node. - A custom runner will typically provide implementations for some of the transform methods (ParDo, GroupByKey, Create, etc.). It may also provide a new implementation for clear_pvalue(), which is used to wipe out materialized values in order to reduce footprint. - 
run(transform, options=None)[source]¶
- Run the given transform or callable with this runner. - Blocks until the pipeline is complete. See also PipelineRunner.run_async. 
 - 
run_async(transform, options=None)[source]¶
- Run the given transform or callable with this runner. - May return immediately, executing the pipeline in the background. The returned result object can be queried for progress, and wait_until_finish may be called to block until completion. 
 - 
run_pipeline(pipeline)[source]¶
- Execute the entire pipeline or the sub-DAG reachable from a node. - Runners should override this method. 
 - 
apply(transform, input)[source]¶
- Runner callback for a pipeline.apply call. - Parameters: - transform – the transform to apply.
- input – transform’s input (typically a PCollection).
 - A concrete implementation of the Runner class may want to do custom pipeline construction for a given transform. To override the behavior for a transform class Xyz, implement an apply_Xyz method with this same signature. 
 - 
run_transform(transform_node)[source]¶
- Runner callback for a pipeline.run call. - Parameters: - transform_node – transform node for the transform to run. - A concrete implementation of the Runner class must implement run_Abc for some class Abc in the method resolution order for every non-composite transform Xyz in the pipeline. 
 
- 
- 
class apache_beam.runners.runner.PipelineState[source]¶
- Bases: - future.types.newobject.newobject- State of the Pipeline, as returned by - PipelineResult.state.- This is meant to be the union of all the states any runner can put a pipeline in. Currently, it represents the values of the dataflow API JobState enum. - 
UNKNOWN= 'UNKNOWN'¶
 - 
STARTING= 'STARTING'¶
 - 
STOPPED= 'STOPPED'¶
 - 
RUNNING= 'RUNNING'¶
 - 
DONE= 'DONE'¶
 - 
FAILED= 'FAILED'¶
 - 
CANCELLED= 'CANCELLED'¶
 - 
UPDATED= 'UPDATED'¶
 - 
DRAINING= 'DRAINING'¶
 - 
DRAINED= 'DRAINED'¶
 - 
PENDING= 'PENDING'¶
 - 
CANCELLING= 'CANCELLING'¶
 
- 
- 
class apache_beam.runners.runner.PipelineResult(state)[source]¶
- Bases: - future.types.newobject.newobject- A - PipelineResultprovides access to info about a pipeline.- 
state¶
- Return the current state of the pipeline execution. 
 - 
wait_until_finish(duration=None)[source]¶
- Waits until the pipeline finishes and returns the final status. - Parameters: - duration (int) – The time to wait (in milliseconds) for job to finish. If it is set to - None, it will wait indefinitely until the job is finished.- Raises: - IOError– If there is a persistent problem getting job information.
- NotImplementedError– If the runner does not support this operation.
 - Returns: - The final state of the pipeline, or - Noneon timeout.
 - 
cancel()[source]¶
- Cancels the pipeline execution. - Raises: - IOError– If there is a persistent problem getting job information.
- NotImplementedError– If the runner does not support this operation.
 - Returns: - The final state of the pipeline. 
 - 
metrics()[source]¶
- Returns - MetricResultsobject to query metrics from the runner.- Raises: - NotImplementedError– If the runner does not support this operation.
 
-