apache_beam.runners.direct.sdf_direct_runner module

This module contains Splittable DoFn logic that is specific to DirectRunner.

class apache_beam.runners.direct.sdf_direct_runner.ProcessKeyedElementsViaKeyedWorkItemsOverride[source]

Bases: apache_beam.pipeline.PTransformOverride

A transform override for ProcessElements transform.

matches(applied_ptransform)[source]
get_replacement_transform(ptransform)[source]
class apache_beam.runners.direct.sdf_direct_runner.ProcessKeyedElementsViaKeyedWorkItems(process_keyed_elements_transform)[source]

Bases: apache_beam.transforms.ptransform.PTransform

A transform that processes Splittable DoFn input via KeyedWorkItems.

expand(pcoll)[source]
class apache_beam.runners.direct.sdf_direct_runner.ProcessElements(process_keyed_elements_transform)[source]

Bases: apache_beam.transforms.ptransform.PTransform

A primitive transform for processing keyed elements or KeyedWorkItems.

Will be evaluated by runners.direct.transform_evaluator._ProcessElementsEvaluator.

expand(pcoll)[source]
new_process_fn(sdf)[source]
class apache_beam.runners.direct.sdf_direct_runner.ProcessFn(sdf, args_for_invoker, kwargs_for_invoker)[source]

Bases: apache_beam.transforms.core.DoFn

A DoFn that executes machineary for invoking a Splittable DoFn.

Input to the ParDo step that includes a ProcessFn will be a PCollection of ElementAndRestriction objects.

This class is mainly responsible for following. (1) setup environment for properly invoking a Splittable DoFn. (2) invoke process() method of a Splittable DoFn. (3) after the process() invocation of the Splittable DoFn, determine if a re-invocation of the element is needed. If this is the case, set state and a timer for a re-invocation and hold output watermark till this re-invocation. (4) after the final invocation of a given element clear any previous state set for re-invoking the element and release the output watermark.

step_context
set_process_element_invoker(process_element_invoker)[source]
process(element, timestamp='TimestampParam', window='WindowParam', *args, **kwargs)[source]
class apache_beam.runners.direct.sdf_direct_runner.SDFProcessElementInvoker(max_num_outputs, max_duration)[source]

Bases: object

A utility that invokes SDF process() method and requests checkpoints.

This class is responsible for invoking the process() method of a Splittable DoFn and making sure that invocation terminated properly. Based on the input configuration, this class may decide to request a checkpoint for a process() execution so that runner can process current output and resume the invocation at a later time.

More specifically, when initializing a SDFProcessElementInvoker, caller may specify the number of output elements or processing time after which a checkpoint should be requested. This class is responsible for properly requesting a checkpoint based on either of these criteria. When the process() call of Splittable DoFn ends, this class performs validations to make sure that processing ended gracefully and returns a SDFProcessElementInvoker.Result that contains information which can be used by the caller to perform another process() invocation for the residual.

A process() invocation may decide to give up processing voluntarily by returning a ProcessContinuation object (see documentation of ProcessContinuation for more details). So if a ‘ProcessContinuation’ is produced this class ends the execution and performs steps to finalize the current invocation.

class Result(residual_restriction=None, process_continuation=None, future_output_watermark=None)[source]

Bases: object

Returned as a result of a invoke_process_element() invocation.

Parameters:
  • residual_restriction – a restriction for the unprocessed part of the element.
  • process_continuation – a ProcessContinuation if one was returned as the last element of the SDF process() invocation.
  • future_output_watermark – output watermark of the results that will be produced when invoking the Splittable DoFn for the current element with residual_restriction.
test_method()[source]
invoke_process_element(sdf_invoker, element, tracker, *args, **kwargs)[source]

Invokes process() method of a Splittable DoFn for a given element.

Parameters:
  • sdf_invoker – a DoFnInvoker for the Splittable DoFn.
  • element – the element to process
  • tracker – a RestrictionTracker for the element that will be passed when invoking the process() method of the Splittable DoFn.
Returns:

a SDFProcessElementInvoker.Result object.