apache_beam.runners.runner module¶
PipelineRunner, an abstract base runner object.
-
class
apache_beam.runners.runner.
PipelineRunner
[source]¶ Bases:
object
A runner of a pipeline object.
The base runner provides a run() method for visiting every node in the pipeline’s DAG and executing the transforms computing the PValue in the node.
A custom runner will typically provide implementations for some of the transform methods (ParDo, GroupByKey, Create, etc.). It may also provide a new implementation for clear_pvalue(), which is used to wipe out materialized values in order to reduce footprint.
-
run
(transform: PTransform, options: Optional[apache_beam.options.pipeline_options.PipelineOptions] = None) → PipelineResult[source]¶ Run the given transform or callable with this runner.
Blocks until the pipeline is complete. See also PipelineRunner.run_async.
-
run_async
(transform: PTransform, options: Optional[apache_beam.options.pipeline_options.PipelineOptions] = None) → PipelineResult[source]¶ Run the given transform or callable with this runner.
May return immediately, executing the pipeline in the background. The returned result object can be queried for progress, and wait_until_finish may be called to block until completion.
-
run_portable_pipeline
(pipeline: org.apache.beam.model.pipeline.v1.beam_runner_api_pb2.Pipeline, options: apache_beam.options.pipeline_options.PipelineOptions) → apache_beam.runners.runner.PipelineResult[source]¶ Execute the entire pipeline.
Runners should override this method.
-
default_environment
(options: apache_beam.options.pipeline_options.PipelineOptions) → apache_beam.transforms.environments.Environment[source]¶ Returns the default environment that should be used for this runner.
Runners may override this method to provide alternative environments.
-
run_pipeline
(pipeline: Pipeline, options: apache_beam.options.pipeline_options.PipelineOptions) → PipelineResult[source]¶ Execute the entire pipeline or the sub-DAG reachable from a node.
-
-
class
apache_beam.runners.runner.
PipelineState
[source]¶ Bases:
object
State of the Pipeline, as returned by
PipelineResult.state
.This is meant to be the union of all the states any runner can put a pipeline in. Currently, it represents the values of the dataflow API JobState enum.
-
UNKNOWN
= 'UNKNOWN'¶
-
STARTING
= 'STARTING'¶
-
STOPPED
= 'STOPPED'¶
-
RUNNING
= 'RUNNING'¶
-
DONE
= 'DONE'¶
-
FAILED
= 'FAILED'¶
-
CANCELLED
= 'CANCELLED'¶
-
UPDATED
= 'UPDATED'¶
-
DRAINING
= 'DRAINING'¶
-
DRAINED
= 'DRAINED'¶
-
PENDING
= 'PENDING'¶
-
CANCELLING
= 'CANCELLING'¶
-
RESOURCE_CLEANING_UP
= 'RESOURCE_CLEANING_UP'¶
-
UNRECOGNIZED
= 'UNRECOGNIZED'¶
-
-
class
apache_beam.runners.runner.
PipelineResult
(state)[source]¶ Bases:
object
A
PipelineResult
provides access to info about a pipeline.-
state
¶ Return the current state of the pipeline execution.
-
wait_until_finish
(duration=None)[source]¶ Waits until the pipeline finishes and returns the final status.
Parameters: duration (int) – The time to wait (in milliseconds) for job to finish. If it is set to
None
, it will wait indefinitely until the job is finished.Raises: IOError
– If there is a persistent problem getting job information.NotImplementedError
– If the runner does not support this operation.
Returns: The final state of the pipeline, or
None
on timeout.
-
cancel
()[source]¶ Cancels the pipeline execution.
Raises: IOError
– If there is a persistent problem getting job information.NotImplementedError
– If the runner does not support this operation.
Returns: The final state of the pipeline.
-
metrics
()[source]¶ Returns
MetricResults
object to query metrics from the runner.Raises: NotImplementedError
– If the runner does not support this operation.
-