GroupByKey
Takes a keyed collection of elements and produces a collection where each element consists of a key and all values associated with that key.
See more information in the Beam Programming Guide.
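As a rough mental model, the grouping can be sketched in plain Python without Beam. The helper name `group_by_key` below is hypothetical and is not part of the Beam API. The sketch ignores everything a real runner adds (distribution across workers, windowing, triggers) and only illustrates the shape of the input and output.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Plain-Python analogy for GroupByKey (hypothetical helper, not a Beam API)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        # Every value that shares a key lands in the same list.
        grouped[key].append(value)
    # Emit one (key, values) element per distinct key.
    return [(key, values) for key, values in grouped.items()]


pairs = [('spring', 'a'), ('summer', 'b'), ('spring', 'c')]
print(group_by_key(pairs))
# [('spring', ['a', 'c']), ('summer', ['b'])]
```

Unlike this sketch, Beam guarantees neither the order of the grouped elements nor the order of the values within a group, so code that depends on a particular order should sort explicitly.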
Examples
In the following example, we create a pipeline with a PCollection of produce keyed by season. We use GroupByKey to group all the produce for each season.
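import apache_beam as beam

with beam.Pipeline() as pipeline:
  produce_counts = (
      pipeline
      # Each element is a (season, produce) pair.
      | 'Create produce counts' >> beam.Create([
          ('spring', '🍓'),
          ('spring', '🥕'),
          ('spring', '🍆'),
          ('spring', '🍅'),
          ('summer', '🥕'),
          ('summer', '🍅'),
          ('summer', '🌽'),
          ('fall', '🥕'),
          ('fall', '🍅'),
          ('winter', '🍆'),
      ])
      # Collect all produce that shares the same season key.
      | 'Group counts per produce' >> beam.GroupByKey()
      | beam.MapTuple(lambda k, vs: (k, sorted(vs)))  # sort and format
      | beam.Map(print))

Output (the order of the lines may vary):
('spring', ['🍅', '🍆', '🍓', '🥕'])
('summer', ['🌽', '🍅', '🥕'])
('fall', ['🍅', '🥕'])
('winter', ['🍆'])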
Related transforms
- GroupBy for grouping by arbitrary properties of the elements.
- CombinePerKey for combining all values associated with a key to a single result.
- CoGroupByKey for multiple input collections.
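To illustrate how combining per key differs from grouping, here is a plain-Python sketch. It is not Beam code, and `combine_per_key` is a hypothetical helper. For simplicity it takes a two-argument function, which is not the form Beam's CombinePerKey expects. Instead of collecting every value for a key into a list, it folds each value into a single running result per key.

```python
def combine_per_key(pairs, combine_fn):
    """Plain-Python analogy for CombinePerKey (hypothetical helper, not a Beam API)."""
    results = {}
    for key, value in pairs:
        if key in results:
            # Fold the new value into the running result for this key.
            results[key] = combine_fn(results[key], value)
        else:
            # The first value seen for a key starts its running result.
            results[key] = value
    # Sort by key so the printed result is deterministic.
    return sorted(results.items())


# Illustrative data: (season, number of items harvested).
harvest = [('spring', 3), ('summer', 2), ('spring', 4), ('summer', 5)]
totals = combine_per_key(harvest, lambda a, b: a + b)
print(totals)
# [('spring', 7), ('summer', 7)]
```

Because only one running value is kept per key, this style avoids holding every value for a key in memory at once, which is the usual reason to prefer combining over grouping when a single result per key is all that is needed.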
Last updated on 2023/05/31