GroupByKey

Takes a keyed collection of elements and produces a collection where each element consists of a key and all values associated with that key.

See more information in the Beam Programming Guide.
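
The grouping described above can be sketched in plain Python. This is only an illustration of the semantics on an in-memory list, not how Beam implements the transform: Beam performs the grouping in parallel over a PCollection. The helper name `group_by_key` and the sample data are assumptions made for this sketch and are not part of the Beam API.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect (key, value) pairs into (key, [values]) pairs.

    Hypothetical helper for illustration only; not part of the Beam API.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        # Every value that shares a key ends up in the same list.
        grouped[key].append(value)
    return list(grouped.items())


# Sample keyed data (assumed for this sketch).
pairs = [
    ('spring', 'strawberry'),
    ('spring', 'carrot'),
    ('summer', 'corn'),
    ('fall', 'carrot'),
    ('summer', 'tomato'),
]

result = group_by_key(pairs)
for key, values in result:
    print(key, values)
```

Each input key appears exactly once in the result, paired with every value that was attached to it.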

Examples

In the following example, we create a pipeline with a PCollection of produce keyed by season.

We use GroupByKey to group all the produce for each season.

import apache_beam as beam

with beam.Pipeline() as pipeline:
  grouped_produce = (
      pipeline
      | 'Create produce' >> beam.Create([
          ('spring', '🍓'),
          ('spring', '🥕'),
          ('spring', '🍆'),
          ('spring', '🍅'),
          ('summer', '🥕'),
          ('summer', '🍅'),
          ('summer', '🌽'),
          ('fall', '🥕'),
          ('fall', '🍅'),
          ('winter', '🍆'),
      ])
      | 'Group produce per season' >> beam.GroupByKey()
      | beam.MapTuple(lambda k, vs: (k, sorted(vs)))  # sort and format
      | beam.Map(print))

Output (the order of the groups may vary):

('spring', ['🍅', '🍆', '🍓', '🥕'])
('summer', ['🌽', '🍅', '🥕'])
('fall', ['🍅', '🥕'])
('winter', ['🍆'])
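
The `sorted(vs)` step in the example orders each group's values with Python's default string comparison, which compares Unicode code points. The following minimal sketch, independent of Beam, shows the resulting order for the spring produce; the variable names are chosen only for illustration.

```python
# Spring produce in the order it was passed to beam.Create.
values = ['🍓', '🥕', '🍆', '🍅']

# sorted() compares strings by Unicode code point.
ordered = sorted(values)

# Print each item with its code point to make the ordering visible.
for item in ordered:
    print(item, f'U+{ord(item):X}')
```

Sorting the values makes each group's list deterministic, since the order of the values that GroupByKey emits for a key is not guaranteed.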