apache_beam.transforms.periodicsequence module

class apache_beam.transforms.periodicsequence.ImpulseSeqGenRestrictionProvider[source]

Bases: RestrictionProvider

initial_restriction(element)[source]
create_tracker(restriction)[source]
restriction_size(element, restriction)[source]
truncate(unused_element, unused_restriction)[source]
class apache_beam.transforms.periodicsequence.ImpulseSeqGenDoFn(data: Sequence[Any] | None = None)[source]

Bases: DoFn

ImpulseSeqGenDoFn fn receives tuple elements with three parts:

  • first_timestamp = The timestamp of the first element to be generated (inclusive).

  • last_timestamp = The timestamp marking the end of the generation period (exclusive). No elements will be generated at or after this time.

  • fire_interval = how often to fire an element.

For each input element received, ImpulseSeqGenDoFn fn will start generating output elements in following pattern:

  • if element timestamp is less than current runtime then output element.

  • if element timestamp is greater than current runtime, wait until next element timestamp.

ImpulseSeqGenDoFn can’t guarantee that each element is output at exact time. ImpulseSeqGenDoFn guarantees that elements would not be output prior to given runtime timestamp.

The output mode of the DoFn is based on the input data:

  • None: If data is None (by default), the output element will be the timestamp.

  • Non-Timestamped Data: If data is a sequence of arbitrary values (e.g., [v1, v2, …]), the DoFn will assign a timestamp to each emitted element.

  • Pre-Timestamped Data: If data is a sequence of tuples, where each tuple is (apache_beam.utils.timestamp.Timestamp, value), the DoFn will use the provided timestamp for the emitted element.

See the parameter description of PeriodicImpulse for more information.

process(element, restriction_tracker=RestrictionParam(ImpulseSeqGenRestrictionProvider), watermark_estimator=WatermarkEstimatorProvider)[source]
Parameters:
  • element – (start_timestamp, end_timestamp, interval)

  • restriction_tracker

Returns:

yields elements at processing real-time intervals with value of target output timestamp for the element.

class apache_beam.transforms.periodicsequence.PeriodicSequence[source]

Bases: PTransform

PeriodicSequence transform receives tuple elements with three parts:

  • first_timestamp = first timestamp to output element for.

  • last_timestamp = last timestamp/time to output element for.

  • fire_interval = how often to fire an element.

For each input element received, PeriodicSequence transform will start generating output elements in following pattern:

  • if element timestamp is less than current runtime then output element.

  • if element timestamp is greater than current runtime, wait until next element timestamp.

PeriodicSequence can’t guarantee that each element is output at exact time. PeriodicSequence guarantees that elements would not be output prior to given runtime timestamp. The PCollection generated by PeriodicSequence is unbounded.

expand(pcoll)[source]
class apache_beam.transforms.periodicsequence.RebaseMode(value)[source]

Bases: Enum

Controls how the start and stop timestamps are rebased to execution time.

REBASE_NONE

Timestamps are not changed.

REBASE_ALL

Both start and stop timestamps are rebased, preserving the original duration.

REBASE_START

Only the start timestamp is rebased; the stop timestamp is unchanged.

REBASE_NONE = 0
REBASE_ALL = 1
REBASE_START = 2
class apache_beam.transforms.periodicsequence.PeriodicImpulse(start_timestamp: int | float | Timestamp = Timestamp(1754512363.586116), stop_timestamp: int | float | Timestamp = Timestamp(9223372036854.775000), fire_interval: float = 360.0, apply_windowing: bool = False, data: Sequence[Any] | None = None, rebase: RebaseMode = RebaseMode.REBASE_NONE)[source]

Bases: PTransform

PeriodicImpulse transform generates an infinite sequence of elements with given runtime interval.

PeriodicImpulse transform behaves same as {@link PeriodicSequence} transform, but can be used as first transform in pipeline. The PCollection generated by PeriodicImpulse is unbounded.

Parameters:
  • start_timestamp – Timestamp for first element.

  • stop_timestamp – Timestamp at or after which no elements will be output.

  • fire_interval – Interval in seconds at which to output elements.

  • apply_windowing – Whether each element should be assigned to individual window. If false, all elements will reside in global window.

  • data

    A sequence of elements to emit. The behavior depends on the content:

    • None (default): The transform emits the event timestamps as the element values, starting from start_timestamp and incrementing by fire_interval up to the stop_timestamp (exclusive)

    • Sequence of raw values (e.g., `[‘a’, ‘b’]`): The transform emits each value in the sequence, assigning it an event timestamp that is calculated in the same manner as the default scenario. The sequence is repeated if the impulse duration requires more elements than are in the sequence (a warning will be given in this case).

    • Sequence of pre-timestamped tuples (e.g., `[(t1, v1), (t2, v2)]`): The transform emits each value with its explicitly provided event time. The format must be (apache_beam.utils.timestamp.Timestamp, value). The provided timestamps are used directly, overriding the calculated ones. Note that the elements in the sequence is NOT required to be ordered by event time; an element with a timestamp earlier than a preceding one will be treated as a potential late event. Important: In this mode, the number of elements in data must be sufficient to cover the duration defined by start_timestamp, stop_timestamp, and fire_interval; otherwise, a ValueError is raised.

  • rebase – Controls how the start and stop timestamps are rebased to execution time. See RebaseMode for more details. Defaults to REBASE_NONE.

expand(pbegin)[source]