apache_beam.runners package¶
Subpackages¶
- apache_beam.runners.dataflow package
- apache_beam.runners.direct package
- Submodules
- apache_beam.runners.direct.bundle_factory module
- apache_beam.runners.direct.clock module
- apache_beam.runners.direct.consumer_tracking_pipeline_visitor module
- apache_beam.runners.direct.direct_metrics module
- apache_beam.runners.direct.direct_runner module
- apache_beam.runners.direct.evaluation_context module
- apache_beam.runners.direct.executor module
- apache_beam.runners.direct.helper_transforms module
- apache_beam.runners.direct.transform_evaluator module
- apache_beam.runners.direct.transform_result module
- apache_beam.runners.direct.watermark_manager module
- Module contents
Submodules¶
apache_beam.runners.common module¶
Worker operations executor.
For internal use only; no backwards-compatibility guarantees.
-
class
apache_beam.runners.common.
DoFnContext
(label, element=None, state=None)[source]¶ Bases:
object
For internal use only; no backwards-compatibility guarantees.
-
element
¶
-
timestamp
¶
-
windows
¶
-
-
class
apache_beam.runners.common.
DoFnInvoker
(output_processor, signature)[source]¶ Bases:
object
An abstraction that can be used to execute DoFn methods.
A DoFnInvoker describes a particular way for invoking methods of a DoFn represented by a given DoFnSignature.
-
static
create_invoker
(output_processor, signature, context, side_inputs, input_args, input_kwargs)[source]¶ Creates a new DoFnInvoker based on given arguments.
Parameters: - signature – a DoFnSignature for the DoFn being invoked.
- context – Context to be used when invoking the DoFn (deprecated).
- side_inputs – side inputs to be used when invoking th process method.
- input_args – arguments to be used when invoking the process method
- input_kwargs – kwargs to be used when invoking the process method.
-
static
-
class
apache_beam.runners.common.
DoFnMethodWrapper
(do_fn, method_name)[source]¶ Bases:
object
For internal use only; no backwards-compatibility guarantees.
Represents a method of a DoFn object.
-
class
apache_beam.runners.common.
DoFnRunner
(fn, args, kwargs, side_inputs, windowing, context=None, tagged_receivers=None, logger=None, step_name=None, logging_context=None, state=None, scoped_metrics_container=None)[source]¶ Bases:
apache_beam.runners.common.Receiver
For internal use only; no backwards-compatibility guarantees.
A helper class for executing ParDo operations.
-
class
apache_beam.runners.common.
DoFnSignature
(do_fn)[source]¶ Bases:
object
Represents the signature of a given
DoFn
object.Signature of a
DoFn
provides a view of the properties of a givenDoFn
. Among other things, this will give an extensible way for for (1) accessing the structure of theDoFn
including methods and method parameters (2) identifying features that a givenDoFn
support, for example, whether a givenDoFn
is a SplittableDoFn
( https://s.apache.org/splittable-do-fn) (3) validating aDoFn
based on the feature set offered by it.
-
class
apache_beam.runners.common.
DoFnState
(counter_factory)[source]¶ Bases:
object
For internal use only; no backwards-compatibility guarantees.
Keeps track of state that DoFns want, currently, user counters.
-
class
apache_beam.runners.common.
LoggingContext
[source]¶ Bases:
object
For internal use only; no backwards-compatibility guarantees.
-
class
apache_beam.runners.common.
PerWindowInvoker
(output_processor, signature, context, side_inputs, input_args, input_kwargs)[source]¶ Bases:
apache_beam.runners.common.DoFnInvoker
An invoker that processes elements considering windowing information.
-
class
apache_beam.runners.common.
Receiver
[source]¶ Bases:
object
For internal use only; no backwards-compatibility guarantees.
An object that consumes a WindowedValue.
This class can be efficiently used to pass values between the sdk and worker harnesses.
-
class
apache_beam.runners.common.
SimpleInvoker
(output_processor, signature)[source]¶ Bases:
apache_beam.runners.common.DoFnInvoker
An invoker that processes elements ignoring windowing information.
apache_beam.runners.pipeline_context module¶
Utility class for serializing pipelines via the runner API.
For internal use only; no backwards-compatibility guarantees.
apache_beam.runners.runner module¶
PipelineRunner, an abstract base runner object.
-
class
apache_beam.runners.runner.
PipelineRunner
[source]¶ Bases:
object
A runner of a pipeline object.
The base runner provides a run() method for visiting every node in the pipeline’s DAG and executing the transforms computing the PValue in the node.
A custom runner will typically provide implementations for some of the transform methods (ParDo, GroupByKey, Create, etc.). It may also provide a new implementation for clear_pvalue(), which is used to wipe out materialized values in order to reduce footprint.
-
apply
(transform, input)[source]¶ Runner callback for a pipeline.apply call.
Parameters: - transform – the transform to apply.
- input – transform’s input (typically a PCollection).
A concrete implementation of the Runner class may want to do custom pipeline construction for a given transform. To override the behavior for a transform class Xyz, implement an apply_Xyz method with this same signature.
-
run_transform
(transform_node)[source]¶ Runner callback for a pipeline.run call.
Parameters: transform_node – transform node for the transform to run. A concrete implementation of the Runner class must implement run_Abc for some class Abc in the method resolution order for every non-composite transform Xyz in the pipeline.
-
-
class
apache_beam.runners.runner.
PipelineState
[source]¶ Bases:
object
State of the Pipeline, as returned by PipelineResult.state.
This is meant to be the union of all the states any runner can put a pipeline in. Currently, it represents the values of the dataflow API JobState enum.
-
CANCELLED
= 'CANCELLED'¶
-
DONE
= 'DONE'¶
-
DRAINED
= 'DRAINED'¶
-
DRAINING
= 'DRAINING'¶
-
FAILED
= 'FAILED'¶
-
RUNNING
= 'RUNNING'¶
-
STOPPED
= 'STOPPED'¶
-
UNKNOWN
= 'UNKNOWN'¶
-
UPDATED
= 'UPDATED'¶
-
-
class
apache_beam.runners.runner.
PipelineResult
(state)[source]¶ Bases:
object
A PipelineResult provides access to info about a pipeline.
-
aggregated_values
(aggregator_or_name)[source]¶ Return a dict of step names to values of the Aggregator.
-
cancel
()[source]¶ Cancels the pipeline execution.
Raises: IOError
– If there is a persistent problem getting job information.NotImplementedError
– If the runner does not support this operation.
Returns: The final state of the pipeline.
-
metrics
()[source]¶ Returns MetricsResult object to query metrics from the runner.
Raises: NotImplementedError
– If the runner does not support this operation.
-
state
¶ Return the current state of the pipeline execution.
-
wait_until_finish
(duration=None)[source]¶ Waits until the pipeline finishes and returns the final status.
Parameters: duration – The time to wait (in milliseconds) for job to finish. If it is set to None, it will wait indefinitely until the job is finished.
Raises: IOError
– If there is a persistent problem getting job information.NotImplementedError
– If the runner does not support this operation.
Returns: The final state of the pipeline, or None on timeout.
-
Module contents¶
Runner objects execute a Pipeline.
This package defines runners, which are used to execute a pipeline.