apache_beam.transforms.window module

Windowing concepts.

A WindowInto transform logically divides up or groups the elements of a PCollection into finite windows according to a windowing function (derived from WindowFn).

The output of WindowInto contains the same elements as input, but they have been logically assigned to windows. The next GroupByKey(s) transforms, including one within a composite transform, will group by the combination of keys and windows.

Windowing a PCollection allows chunks of it to be processed individually, before the entire PCollection is available. This is especially important for PCollection(s) with unbounded size, since the full PCollection is never available at once, since more data is continually arriving. For PCollection(s) with a bounded size (aka. conventional batch mode), by default, all data is implicitly in a single window (see GlobalWindows), unless WindowInto is applied.

For example, a simple form of windowing divides up the data into fixed-width time intervals, using FixedWindows.

Seconds are used as the time unit for the built-in windowing primitives here. Integer or floating point seconds can be passed to these primitives.

Internally, seconds, with microsecond granularity, are stored as timeutil.Timestamp and timeutil.Duration objects. This is done to avoid precision errors that would occur with floating point representations.

Custom windowing function classes can be created, by subclassing from WindowFn.

class apache_beam.transforms.window.TimestampCombiner[source]

Bases: object

Determines how output timestamps of grouping operations are assigned.

OUTPUT_AT_EOW = 1
OUTPUT_AT_EARLIEST = 3
OUTPUT_AT_LATEST = 2
OUTPUT_AT_EARLIEST_TRANSFORMED = 'OUTPUT_AT_EARLIEST_TRANSFORMED'
static get_impl(timestamp_combiner: <google.protobuf.internal.enum_type_wrapper.EnumTypeWrapper object at 0x7fbf52f85a00>, window_fn: apache_beam.transforms.window.WindowFn) → apache_beam.transforms.timeutil.TimestampCombinerImpl[source]
class apache_beam.transforms.window.WindowFn[source]

Bases: apache_beam.utils.urns.RunnerApiFn

An abstract windowing function defining a basic assign and merge.

class AssignContext(timestamp: Union[int, float, Timestamp], element: Optional[Any] = None, window: Optional[BoundedWindow] = None)[source]

Bases: object

Context passed to WindowFn.assign().

assign(assign_context: AssignContext) → Iterable[BoundedWindow][source]

Associates windows to an element.

Parameters:assign_context – Instance of AssignContext.
Returns:An iterable of BoundedWindow.
class MergeContext(windows: Iterable[BoundedWindow])[source]

Bases: object

Context passed to WindowFn.merge() to perform merging, if any.

merge(to_be_merged: Iterable[BoundedWindow], merge_result: apache_beam.transforms.window.BoundedWindow) → None[source]
merge(merge_context: apache_beam.transforms.window.WindowFn.MergeContext) → None[source]

Returns a window that is the result of merging a set of windows.

is_merging() → bool[source]

Returns whether this WindowFn merges windows.

get_window_coder() → apache_beam.coders.coders.Coder[source]
get_transformed_output_time(window: apache_beam.transforms.window.BoundedWindow, input_timestamp: apache_beam.utils.timestamp.Timestamp) → apache_beam.utils.timestamp.Timestamp[source]

Given input time and output window, returns output time for window.

If TimestampCombiner.OUTPUT_AT_EARLIEST_TRANSFORMED is used in the Windowing, the output timestamp for the given window will be the earliest of the timestamps returned by get_transformed_output_time() for elements of the window.

Parameters:
  • window – Output window of element.
  • input_timestamp – Input timestamp of element as a timeutil.Timestamp object.
Returns:

Transformed timestamp.

to_runner_api_parameter(context)
class apache_beam.transforms.window.BoundedWindow(end: Union[int, float, Timestamp])[source]

Bases: object

A window for timestamps in range (-infinity, end).

end

End of window.

start
end
max_timestamp() → apache_beam.utils.timestamp.Timestamp[source]
class apache_beam.transforms.window.IntervalWindow[source]

Bases: apache_beam.utils.windowed_value._IntervalWindowBase, apache_beam.transforms.window.BoundedWindow

A window for timestamps in range [start, end).

start

Start of window as seconds since Unix epoch.

end

End of window as seconds since Unix epoch.

intersects(other: apache_beam.transforms.window.IntervalWindow) → bool[source]
union(other: apache_beam.transforms.window.IntervalWindow) → apache_beam.transforms.window.IntervalWindow[source]
class apache_beam.transforms.window.TimestampedValue(value: V, timestamp: Union[int, float, Timestamp])[source]

Bases: typing.Generic

A timestamped value having a value and a timestamp.

value

The underlying value.

timestamp

Timestamp associated with the value as seconds since Unix epoch.

class apache_beam.transforms.window.GlobalWindow[source]

Bases: apache_beam.transforms.window.BoundedWindow

The default window into which all data is placed (via GlobalWindows).

start
class apache_beam.transforms.window.NonMergingWindowFn[source]

Bases: apache_beam.transforms.window.WindowFn

is_merging() → bool[source]
merge(merge_context: apache_beam.transforms.window.WindowFn.MergeContext) → None[source]
class apache_beam.transforms.window.GlobalWindows[source]

Bases: apache_beam.transforms.window.NonMergingWindowFn

A windowing function that assigns everything to one global window.

classmethod windowed_batch(batch: Any, timestamp: apache_beam.utils.timestamp.Timestamp = Timestamp(-9223372036854.775000), pane_info: apache_beam.utils.windowed_value.PaneInfo = PaneInfo(first: True, last: True, timing: UNKNOWN, index: 0, nonspeculative_index: 0)) → apache_beam.utils.windowed_value.WindowedBatch[source]
classmethod windowed_value(value: Any, timestamp: apache_beam.utils.timestamp.Timestamp = Timestamp(-9223372036854.775000), pane_info: apache_beam.utils.windowed_value.PaneInfo = PaneInfo(first: True, last: True, timing: UNKNOWN, index: 0, nonspeculative_index: 0)) → apache_beam.utils.windowed_value.WindowedValue[source]
classmethod windowed_value_at_end_of_window(value)[source]
assign(assign_context: apache_beam.transforms.window.WindowFn.AssignContext) → List[apache_beam.transforms.window.GlobalWindow][source]
get_window_coder() → apache_beam.coders.coders.GlobalWindowCoder[source]
to_runner_api_parameter(context)[source]
static from_runner_api_parameter(unused_fn_parameter, unused_context) → apache_beam.transforms.window.GlobalWindows[source]
class apache_beam.transforms.window.FixedWindows(size: Union[int, float, Duration], offset: Union[int, float, Timestamp] = 0)[source]

Bases: apache_beam.transforms.window.NonMergingWindowFn

A windowing function that assigns each element to one time interval.

The attributes size and offset determine in what time interval a timestamp will be slotted. The time intervals have the following formula: [N * size + offset, (N + 1) * size + offset)

size

Size of the window as seconds.

offset

Offset of this window as seconds. Windows start at t=N * size + offset where t=0 is the UNIX epoch. The offset must be a value in range [0, size). If it is not it will be normalized to this range.

Initialize a FixedWindows function for a given size and offset.

Parameters:
  • size (int) – Size of the window in seconds.
  • offset (int) – Offset of this window as seconds. Windows start at t=N * size + offset where t=0 is the UNIX epoch. The offset must be a value in range [0, size). If it is not it will be normalized to this range.
assign(context: apache_beam.transforms.window.WindowFn.AssignContext) → List[apache_beam.transforms.window.IntervalWindow][source]
get_window_coder() → apache_beam.coders.coders.IntervalWindowCoder[source]
to_runner_api_parameter(context)[source]
static from_runner_api_parameter(fn_parameter, unused_context) → apache_beam.transforms.window.FixedWindows[source]
class apache_beam.transforms.window.SlidingWindows(size: Union[int, float, Duration], period: Union[int, float, Duration], offset: Union[int, float, Timestamp] = 0)[source]

Bases: apache_beam.transforms.window.NonMergingWindowFn

A windowing function that assigns each element to a set of sliding windows.

The attributes size and offset determine in what time interval a timestamp will be slotted. The time intervals have the following formula: [N * period + offset, N * period + offset + size)

size

Size of the window as seconds.

period

Period of the windows as seconds.

offset

Offset of this window as seconds since Unix epoch. Windows start at t=N * period + offset where t=0 is the epoch. The offset must be a value in range [0, period). If it is not it will be normalized to this range.

assign(context: apache_beam.transforms.window.WindowFn.AssignContext) → List[apache_beam.transforms.window.IntervalWindow][source]
get_window_coder() → apache_beam.coders.coders.IntervalWindowCoder[source]
to_runner_api_parameter(context)[source]
static from_runner_api_parameter(fn_parameter, unused_context) → apache_beam.transforms.window.SlidingWindows[source]
class apache_beam.transforms.window.Sessions(gap_size: Union[int, float, Duration])[source]

Bases: apache_beam.transforms.window.WindowFn

A windowing function that groups elements into sessions.

A session is defined as a series of consecutive events separated by a specified gap size.

gap_size

Size of the gap between windows as floating-point seconds.

assign(context: apache_beam.transforms.window.WindowFn.AssignContext) → List[apache_beam.transforms.window.IntervalWindow][source]
get_window_coder() → apache_beam.coders.coders.IntervalWindowCoder[source]
merge(merge_context: apache_beam.transforms.window.WindowFn.MergeContext) → None[source]
to_runner_api_parameter(context)[source]
static from_runner_api_parameter(fn_parameter, unused_context) → apache_beam.transforms.window.Sessions[source]