apache_beam.runners.runner module
PipelineRunner, an abstract base runner object.
- class apache_beam.runners.runner.PipelineRunner[source]
Bases:
object
A runner of a pipeline object.
The base runner provides a run() method for visiting every node in the pipeline’s DAG and executing the transforms computing the PValue in the node.
A custom runner will typically provide implementations for some of the transform methods (ParDo, GroupByKey, Create, etc.). It may also provide a new implementation for clear_pvalue(), which is used to wipe out materialized values in order to reduce footprint.
- run(transform: PTransform, options: PipelineOptions | None = None) PipelineResult [source]
Run the given transform or callable with this runner.
Blocks until the pipeline is complete. See also PipelineRunner.run_async.
- run_async(transform: PTransform, options: PipelineOptions | None = None) PipelineResult [source]
Run the given transform or callable with this runner.
May return immediately, executing the pipeline in the background. The returned result object can be queried for progress, and wait_until_finish may be called to block until completion.
- run_portable_pipeline(pipeline: Pipeline, options: PipelineOptions) PipelineResult [source]
Execute the entire pipeline.
Runners should override this method.
- default_environment(options: PipelineOptions) Environment [source]
Returns the default environment that should be used for this runner.
Runners may override this method to provide alternative environments.
- run_pipeline(pipeline: Pipeline, options: PipelineOptions) PipelineResult [source]
Execute the entire pipeline or the sub-DAG reachable from a node.
- apply(transform: PTransform, input: pvalue.PValue | None, options: PipelineOptions)[source]
- class apache_beam.runners.runner.PipelineState[source]
Bases:
object
State of the Pipeline, as returned by
PipelineResult.state
.This is meant to be the union of all the states any runner can put a pipeline in. Currently, it represents the values of the dataflow API JobState enum.
- UNKNOWN = 'UNKNOWN'
- STARTING = 'STARTING'
- STOPPED = 'STOPPED'
- RUNNING = 'RUNNING'
- DONE = 'DONE'
- FAILED = 'FAILED'
- CANCELLED = 'CANCELLED'
- UPDATED = 'UPDATED'
- DRAINING = 'DRAINING'
- DRAINED = 'DRAINED'
- PENDING = 'PENDING'
- CANCELLING = 'CANCELLING'
- RESOURCE_CLEANING_UP = 'RESOURCE_CLEANING_UP'
- UNRECOGNIZED = 'UNRECOGNIZED'
- class apache_beam.runners.runner.PipelineResult(state)[source]
Bases:
object
A
PipelineResult
provides access to info about a pipeline.- property state
Return the current state of the pipeline execution.
- wait_until_finish(duration=None)[source]
Waits until the pipeline finishes and returns the final status.
- Parameters:
duration (int) – The time to wait (in milliseconds) for job to finish. If it is set to
None
, it will wait indefinitely until the job is finished.- Raises:
IOError – If there is a persistent problem getting job information.
NotImplementedError – If the runner does not support this operation.
- Returns:
The final state of the pipeline, or
None
on timeout.
- cancel()[source]
Cancels the pipeline execution.
- Raises:
IOError – If there is a persistent problem getting job information.
NotImplementedError – If the runner does not support this operation.
- Returns:
The final state of the pipeline.
- metrics()[source]
Returns
MetricResults
object to query metrics from the runner.- Raises:
NotImplementedError – If the runner does not support this operation.