Beam SQL: Windowing and triggering
You can use Beam’s windowing semantics in two ways:
- you can configure windowing on your input
PCollections before passing them to a Beam SQL transform
- you can use windowing extensions in your SQL query, which will override
the windowing of your input PCollections
Triggering can only be used by setting it on your input PCollections; there
are no SQL extensions for specifying triggering.
This section covers the use of SQL extensions to directly apply windowing.
Beam SQL supports windowing functions specified in the
GROUP BY clause. A
TIMESTAMP field is required in this case; it is used as the event timestamp for rows.
Supported windowing functions:
TUMBLE, or fixed windows. Example of how to define a fixed window with a duration of 1 hour:
SELECT f_int, COUNT(*) FROM PCOLLECTION GROUP BY f_int, TUMBLE(f_timestamp, INTERVAL '1' HOUR)
HOP, or sliding windows. Example of how to define a sliding window that starts every 30 minutes and has a duration of 1 hour:
SELECT f_int, COUNT(*) FROM PCOLLECTION GROUP BY f_int, HOP(f_timestamp, INTERVAL '30' MINUTE, INTERVAL '1' HOUR)
SESSION, or session windows. Example of how to define a session window with a gap duration of 5 minutes:
SELECT f_int, COUNT(*) FROM PCOLLECTION GROUP BY f_int, SESSION(f_timestamp, INTERVAL '5' MINUTE)
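To make the difference between the three functions concrete, the following is a minimal plain-Python sketch of how each one maps event timestamps to windows. It does not use Beam and is not Beam's implementation. The helper names tumble, hop and sessions are hypothetical. The sketch assumes that windows are half-open intervals [start, end) aligned to the Unix epoch, and it ignores the grouping key (f_int in the queries above), so the session logic would apply separately to each key.

```python
from datetime import datetime, timedelta

# Assumption: window boundaries are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1)


def tumble(ts, size):
    """Return the single fixed window [start, end) that contains ts."""
    start = ts - (ts - EPOCH) % size
    return (start, start + size)


def hop(ts, period, size):
    """Return every sliding window [start, end) that contains ts.

    A new window starts every `period`, and each window lasts `size`.
    """
    windows = []
    # Latest window start that is not after ts.
    start = ts - (ts - EPOCH) % period
    # Walk backwards while the window still covers ts.
    while start > ts - size:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def sessions(timestamps, gap):
    """Merge timestamps into session windows.

    Each timestamp opens a window [ts, ts + gap); overlapping windows
    are merged, so a session ends `gap` after its last event.
    """
    merged = []
    for ts in sorted(timestamps):
        if merged and ts < merged[-1][1]:
            # ts falls inside the open session, so extend its end.
            merged[-1] = (merged[-1][0], ts + gap)
        else:
            # No open session covers ts, so start a new one.
            merged.append((ts, ts + gap))
    return merged
```

With these helpers, an event at 10:20 falls into one TUMBLE window of 1 hour (10:00 to 11:00) but into two HOP windows with a 30 minute period and 1 hour duration (09:30 to 10:30 and 10:00 to 11:00). Events at 10:00, 10:03 and 10:10 with a 5 minute gap form two sessions: 10:00 to 10:08 and 10:10 to 10:15.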
Note: when no windowing function is specified in the query, the windowing strategy of the input
PCollections is unchanged by the SQL query. If a windowing function is specified in the query, the windowing function of the
PCollection is updated accordingly, but the trigger stays unchanged.