This transform is used to perform aggregations over groups of elements.
It receives a CombineFn, which defines functions to create an intermediate
aggregator, add elements to it, and transform the aggregator into the expected
output.
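
As a minimal sketch of those functions, the class below is plain Python rather than a real Beam transform. It borrows the method names of the Beam Python SDK's CombineFn (create_accumulator, add_input, merge_accumulators, extract_output); the MeanFn class and the combine_locally helper are hypothetical names introduced only for illustration. In an actual pipeline the class would subclass apache_beam.CombineFn and the runner, not a helper, would call these methods.

```python
class MeanFn:
    """Plain-Python stand-in that mirrors the method names of a Beam CombineFn."""

    def create_accumulator(self):
        # Intermediate aggregator: (running sum, element count).
        return (0.0, 0)

    def add_input(self, accumulator, element):
        # Fold one element into the aggregator.
        total, count = accumulator
        return (total + element, count + 1)

    def merge_accumulators(self, accumulators):
        # Combine several partial aggregators into one.
        total, count = 0.0, 0
        for partial_total, partial_count in accumulators:
            total += partial_total
            count += partial_count
        return (total, count)

    def extract_output(self, accumulator):
        # Turn the aggregator into the expected output (the mean).
        total, count = accumulator
        return total / count if count else float("nan")


def combine_locally(combine_fn, elements):
    """Hypothetical helper: drive the create/add steps by hand."""
    accumulator = combine_fn.create_accumulator()
    for element in elements:
        accumulator = combine_fn.add_input(accumulator, element)
    return accumulator


fn = MeanFn()
mean = fn.extract_output(combine_locally(fn, [1, 2, 3, 4]))
```

The accumulator holds a sum and a count rather than a running mean because two (sum, count) pairs can be merged exactly, whereas two means cannot be merged without knowing how many elements each one covers.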
Combines are a valuable transform because they allow for optimizations that
can reduce the amount of data exchanged between workers (a.k.a. "shuffled").
They do this by performing partial aggregations both before and after a
GroupByKey. The partial aggregations before the GroupByKey reduce the
original data to a single aggregator per key per worker, so only those
aggregators need to be shuffled.
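
The effect of partial aggregation on shuffled data can be sketched in plain Python. This is a simulation under stated assumptions, not how a Beam runner is implemented: the two "workers", the input pairs, and the pre_combine function are all hypothetical, and a per-key sum stands in for an arbitrary aggregator.

```python
from collections import defaultdict

# Hypothetical input: (key, value) pairs as they happen to sit on two workers.
worker_inputs = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]


def pre_combine(pairs):
    """Partial aggregation before the shuffle: one aggregator per key per worker."""
    aggregators = defaultdict(int)
    for key, value in pairs:
        aggregators[key] += value
    return dict(aggregators)


# Step 1: each worker reduces its own data locally.
partials = [pre_combine(pairs) for pairs in worker_inputs]

# Step 2: group by key. Only the partial aggregators cross worker boundaries.
grouped = defaultdict(list)
for partial in partials:
    for key, aggregator in partial.items():
        grouped[key].append(aggregator)

# Step 3: after the grouping, merge the partial aggregators for each key.
totals = {key: sum(aggregators) for key, aggregators in grouped.items()}

# Number of records that would be shuffled without and with the pre-combine step.
records_without_precombine = sum(len(pairs) for pairs in worker_inputs)
records_with_precombine = sum(len(partial) for partial in partials)
```

In this toy input the saving is small (four records shuffled instead of six), but the shuffled volume is bounded by keys times workers rather than by the number of input elements, which is where the benefit comes from on large inputs with few distinct keys.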