Takes a keyed collection of elements and produces a collection where each element consists of a key and all values associated with that key.
See more information in the Beam Programming Guide.
In the following example, we create a pipeline with a PCollection of produce keyed by season. We use GroupByKey to group all the produce for each season.

    import apache_beam as beam

    with beam.Pipeline() as pipeline:
      produce_counts = (
          pipeline
          | 'Create produce counts' >> beam.Create([
              ('spring', '🍓'),
              ('spring', '🥕'),
              ('spring', '🍆'),
              ('spring', '🍅'),
              ('summer', '🥕'),
              ('summer', '🍅'),
              ('summer', '🌽'),
              ('fall', '🥕'),
              ('fall', '🍅'),
              ('winter', '🍆'),
          ])
          | 'Group counts per produce' >> beam.GroupByKey()
          | beam.Map(print))

Output:

    ('spring', ['🍓', '🥕', '🍆', '🍅'])
    ('summer', ['🥕', '🍅', '🌽'])
    ('fall', ['🥕', '🍅'])
    ('winter', ['🍆'])

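To make the grouping semantics concrete, here is a minimal plain-Python sketch of what the transform computes for the example above. It is an in-memory illustration only, not how Beam implements GroupByKey, and the helper name `group_by_key` is hypothetical. One difference worth noting: this sketch preserves input order, whereas Beam does not guarantee the order of the values within a group.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Hypothetical helper: collect (key, value) pairs into {key: [values]}."""
    grouped = defaultdict(list)
    for key, value in pairs:
        # Every value that shares a key ends up in the same list.
        grouped[key].append(value)
    return dict(grouped)


produce = [
    ('spring', '🍓'),
    ('spring', '🥕'),
    ('spring', '🍆'),
    ('spring', '🍅'),
    ('summer', '🥕'),
    ('summer', '🍅'),
    ('summer', '🌽'),
    ('fall', '🥕'),
    ('fall', '🍅'),
    ('winter', '🍆'),
]

grouped = group_by_key(produce)
for season, items in grouped.items():
    print((season, items))
```

Each key appears exactly once in the result, paired with every value that was associated with it in the input.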
Related transforms:
- GroupBy for grouping by arbitrary properties of the elements.
- CombinePerKey for combining all values associated with a key to a single result.
- CoGroupByKey for multiple input collections.