apache_beam.runners.sdf_utils module

Common utility class to help SDK harness to execute an SDF.

class apache_beam.runners.sdf_utils.SplitResultPrimary(primary_value)

Bases: tuple

Create new instance of SplitResultPrimary(primary_value,)

primary_value

Alias for field number 0

class apache_beam.runners.sdf_utils.SplitResultResidual(residual_value, current_watermark, deferred_timestamp)

Bases: tuple

Create new instance of SplitResultResidual(residual_value, current_watermark, deferred_timestamp)

current_watermark

Alias for field number 1

deferred_timestamp

Alias for field number 2

residual_value

Alias for field number 0

class apache_beam.runners.sdf_utils.ThreadsafeRestrictionTracker(restriction_tracker: RestrictionTracker)[source]

Bases: object

A thread-safe wrapper which wraps a RestrictionTracker.

This wrapper guarantees synchronization of modifying restrictions across multi-thread.

current_restriction()[source]
try_claim(position)[source]
defer_remainder(deferred_time=None)[source]

Performs self-checkpoint on current processing restriction with an expected resuming time.

Self-checkpoint could happen during processing elements. When executing an DoFn.process(), you may want to stop processing an element and resuming later if current element has been processed quit a long time or you also want to have some outputs from other elements. defer_remainder() can be called on per element if needed.

Parameters:deferred_time – A relative Duration that indicates the ideal time gap between now and resuming, or an absolute Timestamp for resuming execution time. If the time_delay is None, the deferred work will be executed as soon as possible.
check_done()[source]
current_progress() → RestrictionProgress[source]
try_split(fraction_of_remainder)[source]
deferred_status() → Optional[Tuple[Any, apache_beam.utils.timestamp.Duration]][source]

Returns deferred work which is produced by defer_remainder().

When there is a self-checkpoint performed, the system needs to fulfill the DelayedBundleApplication with deferred_work for a ProcessBundleResponse. The system calls this API to get deferred_residual with watermark together to help the runner to schedule a future work.

Returns: (deferred_residual, time_delay) if having any residual, else None.

is_bounded()[source]
class apache_beam.runners.sdf_utils.RestrictionTrackerView(threadsafe_restriction_tracker: apache_beam.runners.sdf_utils.ThreadsafeRestrictionTracker)[source]

Bases: object

A DoFn view of thread-safe RestrictionTracker.

The RestrictionTrackerView wraps a ThreadsafeRestrictionTracker and only exposes APIs that will be called by a DoFn.process(). During execution time, the RestrictionTrackerView will be fed into the DoFn.process as a restriction_tracker.

current_restriction()[source]
try_claim(position)[source]
defer_remainder(deferred_time=None)[source]
is_bounded()[source]
class apache_beam.runners.sdf_utils.ThreadsafeWatermarkEstimator(watermark_estimator: WatermarkEstimator)[source]

Bases: object

A threadsafe wrapper which wraps a WatermarkEstimator with locking mechanism to guarantee multi-thread safety.

get_estimator_state()[source]
current_watermark() → apache_beam.utils.timestamp.Timestamp[source]
observe_timestamp(timestamp: apache_beam.utils.timestamp.Timestamp) → None[source]
class apache_beam.runners.sdf_utils.NoOpWatermarkEstimatorProvider[source]

Bases: apache_beam.transforms.core.WatermarkEstimatorProvider

A WatermarkEstimatorProvider which creates NoOpWatermarkEstimator for the framework.

initial_estimator_state(element, restriction)[source]
create_watermark_estimator(estimator_state)[source]