GroupBy
Takes a collection of elements and produces a collection grouped by properties of those elements.
Unlike GroupByKey, the key is dynamically created from the elements themselves.
Grouping Examples
In the following example, we create a pipeline with a PCollection of fruits.
We use GroupBy to group all fruits by the first letter of their name.
We can group by a composite key consisting of multiple properties if desired.
The resulting key is a named tuple with the two requested attributes, and the values are grouped accordingly.
If the property to group by is an attribute of the element, a string naming that attribute may be passed to GroupBy in place of a callable expression.
It is also possible to mix and match attributes and expressions in a single GroupBy.
Aggregation
Grouping is often used in conjunction with aggregation, and the aggregate_field method of the GroupBy transform can be used to accomplish this easily.
This method takes three parameters: the field (or expression) to aggregate, the CombineFn (or associative callable) with which to aggregate, and finally a field name in which to store the result.
For example, to compute the amount of each fruit to buy, one could group by fruit and sum the quantities.
As with the parameters of GroupBy, one can also aggregate multiple fields, and aggregate by expressions as well as by field names.
One can, of course, aggregate the same field multiple times as well. Calling GroupBy with no key at all yields a global grouping, as the grouping key is empty.
Related transforms
- CombinePerKey for combining with a single CombineFn.
- GroupByKey for grouping with a known key.
- CoGroupByKey for multiple input collections.
Last updated on 2025/01/19