apache_beam.runners.direct.sdf_direct_runner module

This module contains Splittable DoFn logic that is specific to DirectRunner.

class apache_beam.runners.direct.sdf_direct_runner.SplittableParDoOverride[source]

Bases: apache_beam.pipeline.PTransformOverride

A transform override for ParDo transformss of SplittableDoFns.

Replaces the ParDo transform with a SplittableParDo transform that performs SDF specific logic.

matches(applied_ptransform)[source]
get_replacement_transform_for_applied_ptransform(applied_ptransform)[source]
class apache_beam.runners.direct.sdf_direct_runner.SplittableParDo(ptransform)[source]

Bases: apache_beam.transforms.ptransform.PTransform

A transform that processes a PCollection using a Splittable DoFn.

expand(pcoll)[source]
class apache_beam.runners.direct.sdf_direct_runner.ElementAndRestriction(element, restriction, watermark_estimator_state)[source]

Bases: object

A holder for an element, restriction, and watermark estimator state.

class apache_beam.runners.direct.sdf_direct_runner.PairWithRestrictionFn(do_fn)[source]

Bases: apache_beam.transforms.core.DoFn

A transform that pairs each element with a restriction.

start_bundle()[source]
process(element, window=WindowParam, *args, **kwargs)[source]
class apache_beam.runners.direct.sdf_direct_runner.SplitRestrictionFn(do_fn)[source]

Bases: apache_beam.transforms.core.DoFn

A transform that perform initial splitting of Splittable DoFn inputs.

start_bundle()[source]
process(element_and_restriction, *args, **kwargs)[source]
class apache_beam.runners.direct.sdf_direct_runner.ExplodeWindowsFn(*unused_args, **unused_kwargs)[source]

Bases: apache_beam.transforms.core.DoFn

A transform that forces the runner to explode windows.

This is done to make sure that Splittable DoFn proceses an element for each of the windows that element belongs to.

process(element, window=WindowParam, *args, **kwargs)[source]
class apache_beam.runners.direct.sdf_direct_runner.RandomUniqueKeyFn(*unused_args, **unused_kwargs)[source]

Bases: apache_beam.transforms.core.DoFn

A transform that assigns a unique key to each element.

process(element, window=WindowParam, *args, **kwargs)[source]
class apache_beam.runners.direct.sdf_direct_runner.ProcessKeyedElements(sdf, element_coder, restriction_coder, windowing_strategy, ptransform_args, ptransform_kwargs, ptransform_side_inputs)[source]

Bases: apache_beam.transforms.ptransform.PTransform

A primitive transform that performs SplittableDoFn magic.

Input to this transform should be a PCollection of keyed ElementAndRestriction objects.

expand(pcoll)[source]
class apache_beam.runners.direct.sdf_direct_runner.ProcessKeyedElementsViaKeyedWorkItemsOverride[source]

Bases: apache_beam.pipeline.PTransformOverride

A transform override for ProcessElements transform.

matches(applied_ptransform)[source]
get_replacement_transform_for_applied_ptransform(applied_ptransform)[source]
class apache_beam.runners.direct.sdf_direct_runner.ProcessKeyedElementsViaKeyedWorkItems(process_keyed_elements_transform)[source]

Bases: apache_beam.transforms.ptransform.PTransform

A transform that processes Splittable DoFn input via KeyedWorkItems.

expand(pcoll)[source]
class apache_beam.runners.direct.sdf_direct_runner.ProcessElements(process_keyed_elements_transform)[source]

Bases: apache_beam.transforms.ptransform.PTransform

A primitive transform for processing keyed elements or KeyedWorkItems.

Will be evaluated by runners.direct.transform_evaluator._ProcessElementsEvaluator.

expand(pcoll)[source]
new_process_fn(sdf)[source]
class apache_beam.runners.direct.sdf_direct_runner.ProcessFn(sdf, args_for_invoker, kwargs_for_invoker)[source]

Bases: apache_beam.transforms.core.DoFn

A DoFn that executes machineary for invoking a Splittable DoFn.

Input to the ParDo step that includes a ProcessFn will be a PCollection of ElementAndRestriction objects.

This class is mainly responsible for following. (1) setup environment for properly invoking a Splittable DoFn. (2) invoke process() method of a Splittable DoFn. (3) after the process() invocation of the Splittable DoFn, determine if a re-invocation of the element is needed. If this is the case, set state and a timer for a re-invocation and hold output watermark till this re-invocation. (4) after the final invocation of a given element clear any previous state set for re-invoking the element and release the output watermark.

step_context
set_process_element_invoker(process_element_invoker)[source]
process(element, timestamp=TimestampParam, window=WindowParam, *args, **kwargs)[source]
class apache_beam.runners.direct.sdf_direct_runner.SDFProcessElementInvoker(max_num_outputs, max_duration)[source]

Bases: object

A utility that invokes SDF process() method and requests checkpoints.

This class is responsible for invoking the process() method of a Splittable DoFn and making sure that invocation terminated properly. Based on the input configuration, this class may decide to request a checkpoint for a process() execution so that runner can process current output and resume the invocation at a later time.

More specifically, when initializing a SDFProcessElementInvoker, caller may specify the number of output elements or processing time after which a checkpoint should be requested. This class is responsible for properly requesting a checkpoint based on either of these criteria. When the process() call of Splittable DoFn ends, this class performs validations to make sure that processing ended gracefully and returns a SDFProcessElementInvoker.Result that contains information which can be used by the caller to perform another process() invocation for the residual.

A process() invocation may decide to give up processing voluntarily by returning a ProcessContinuation object (see documentation of ProcessContinuation for more details). So if a ‘ProcessContinuation’ is produced this class ends the execution and performs steps to finalize the current invocation.

class Result(residual_restriction=None, process_continuation=None, future_output_watermark=None)[source]

Bases: object

Returned as a result of a invoke_process_element() invocation.

Parameters:
  • residual_restriction – a restriction for the unprocessed part of the element.
  • process_continuation – a ProcessContinuation if one was returned as the last element of the SDF process() invocation.
  • future_output_watermark – output watermark of the results that will be produced when invoking the Splittable DoFn for the current element with residual_restriction.
test_method()[source]
invoke_process_element(sdf_invoker, output_processor, element, restriction, watermark_estimator_state, *args, **kwargs)[source]

Invokes process() method of a Splittable DoFn for a given element.

Parameters:
  • sdf_invoker – a DoFnInvoker for the Splittable DoFn.
  • element – the element to process
Returns:

a SDFProcessElementInvoker.Result object.