apache_beam.runners package

Submodules

apache_beam.runners.common module

Worker operations executor.

For internal use only; no backwards-compatibility guarantees.

class apache_beam.runners.common.DoFnContext(label, element=None, state=None)[source]

Bases: object

For internal use only; no backwards-compatibility guarantees.

element
set_element(windowed_value)[source]
timestamp
windows
class apache_beam.runners.common.DoFnInvoker(output_processor, signature)[source]

Bases: object

An abstraction that can be used to execute DoFn methods.

A DoFnInvoker describes a particular way for invoking methods of a DoFn represented by a given DoFnSignature.

static create_invoker(output_processor, signature, context, side_inputs, input_args, input_kwargs)[source]

Creates a new DoFnInvoker based on given arguments.

Parameters:
  • signature – a DoFnSignature for the DoFn being invoked.
  • context – Context to be used when invoking the DoFn (deprecated).
  • side_inputs – side inputs to be used when invoking th process method.
  • input_args – arguments to be used when invoking the process method
  • input_kwargs – kwargs to be used when invoking the process method.
invoke_finish_bundle()[source]

Invokes the DoFn.finish_bundle() method.

invoke_process(windowed_value)[source]

Invokes the DoFn.process() function.

Parameters:windowed_value – a WindowedValue object that gives the element for which process() method should be invoked along with the window the element belongs to.
invoke_start_bundle()[source]

Invokes the DoFn.start_bundle() method.

class apache_beam.runners.common.DoFnMethodWrapper(do_fn, method_name)[source]

Bases: object

For internal use only; no backwards-compatibility guarantees.

Represents a method of a DoFn object.

class apache_beam.runners.common.DoFnRunner(fn, args, kwargs, side_inputs, windowing, context=None, tagged_receivers=None, logger=None, step_name=None, logging_context=None, state=None, scoped_metrics_container=None)[source]

Bases: apache_beam.runners.common.Receiver

For internal use only; no backwards-compatibility guarantees.

A helper class for executing ParDo operations.

finish()[source]
process(windowed_value)[source]
receive(windowed_value)[source]
start()[source]
class apache_beam.runners.common.DoFnSignature(do_fn)[source]

Bases: object

Represents the signature of a given DoFn object.

Signature of a DoFn provides a view of the properties of a given DoFn. Among other things, this will give an extensible way for for (1) accessing the structure of the DoFn including methods and method parameters (2) identifying features that a given DoFn support, for example, whether a given DoFn is a Splittable DoFn ( https://s.apache.org/splittable-do-fn) (3) validating a DoFn based on the feature set offered by it.

class apache_beam.runners.common.DoFnState(counter_factory)[source]

Bases: object

For internal use only; no backwards-compatibility guarantees.

Keeps track of state that DoFns want, currently, user counters.

counter_for(aggregator)[source]

Looks up the counter for this aggregator, creating one if necessary.

class apache_beam.runners.common.LoggingContext[source]

Bases: object

For internal use only; no backwards-compatibility guarantees.

enter()[source]
exit()[source]
class apache_beam.runners.common.PerWindowInvoker(output_processor, signature, context, side_inputs, input_args, input_kwargs)[source]

Bases: apache_beam.runners.common.DoFnInvoker

An invoker that processes elements considering windowing information.

invoke_process(windowed_value)[source]
class apache_beam.runners.common.Receiver[source]

Bases: object

For internal use only; no backwards-compatibility guarantees.

An object that consumes a WindowedValue.

This class can be efficiently used to pass values between the sdk and worker harnesses.

receive(windowed_value)[source]
class apache_beam.runners.common.SimpleInvoker(output_processor, signature)[source]

Bases: apache_beam.runners.common.DoFnInvoker

An invoker that processes elements ignoring windowing information.

invoke_process(windowed_value)[source]
apache_beam.runners.common.as_receiver(maybe_receiver)[source]

For internal use only; no backwards-compatibility guarantees.

apache_beam.runners.common.get_logging_context(maybe_logger, **kwargs)[source]

apache_beam.runners.pipeline_context module

Utility class for serializing pipelines via the runner API.

For internal use only; no backwards-compatibility guarantees.

class apache_beam.runners.pipeline_context.PipelineContext(context_proto=None)[source]

Bases: object

For internal use only; no backwards-compatibility guarantees.

Used for accessing and constructing the referenced objects of a Pipeline.

static from_runner_api(proto)[source]
to_runner_api()[source]

apache_beam.runners.runner module

PipelineRunner, an abstract base runner object.

class apache_beam.runners.runner.PipelineRunner[source]

Bases: object

A runner of a pipeline object.

The base runner provides a run() method for visiting every node in the pipeline’s DAG and executing the transforms computing the PValue in the node.

A custom runner will typically provide implementations for some of the transform methods (ParDo, GroupByKey, Create, etc.). It may also provide a new implementation for clear_pvalue(), which is used to wipe out materialized values in order to reduce footprint.

apply(transform, input)[source]

Runner callback for a pipeline.apply call.

Parameters:
  • transform – the transform to apply.
  • input – transform’s input (typically a PCollection).

A concrete implementation of the Runner class may want to do custom pipeline construction for a given transform. To override the behavior for a transform class Xyz, implement an apply_Xyz method with this same signature.

apply_PTransform(transform, input)[source]
run(pipeline)[source]

Execute the entire pipeline or the sub-DAG reachable from a node.

run_transform(transform_node)[source]

Runner callback for a pipeline.run call.

Parameters:transform_node – transform node for the transform to run.

A concrete implementation of the Runner class must implement run_Abc for some class Abc in the method resolution order for every non-composite transform Xyz in the pipeline.

class apache_beam.runners.runner.PipelineState[source]

Bases: object

State of the Pipeline, as returned by PipelineResult.state.

This is meant to be the union of all the states any runner can put a pipeline in. Currently, it represents the values of the dataflow API JobState enum.

CANCELLED = 'CANCELLED'
DONE = 'DONE'
DRAINED = 'DRAINED'
DRAINING = 'DRAINING'
FAILED = 'FAILED'
RUNNING = 'RUNNING'
STOPPED = 'STOPPED'
UNKNOWN = 'UNKNOWN'
UPDATED = 'UPDATED'
class apache_beam.runners.runner.PipelineResult(state)[source]

Bases: object

A PipelineResult provides access to info about a pipeline.

aggregated_values(aggregator_or_name)[source]

Return a dict of step names to values of the Aggregator.

cancel()[source]

Cancels the pipeline execution.

Raises:
  • IOError – If there is a persistent problem getting job information.
  • NotImplementedError – If the runner does not support this operation.
Returns:

The final state of the pipeline.

metrics()[source]

Returns MetricsResult object to query metrics from the runner.

Raises:NotImplementedError – If the runner does not support this operation.
state

Return the current state of the pipeline execution.

wait_until_finish(duration=None)[source]

Waits until the pipeline finishes and returns the final status.

Parameters:

duration – The time to wait (in milliseconds) for job to finish. If it is set to None, it will wait indefinitely until the job is finished.

Raises:
  • IOError – If there is a persistent problem getting job information.
  • NotImplementedError – If the runner does not support this operation.
Returns:

The final state of the pipeline, or None on timeout.

Module contents

Runner objects execute a Pipeline.

This package defines runners, which are used to execute a pipeline.