Top

Transforms for finding the largest (or smallest) set of elements in a collection, or the largest (or smallest) set of values associated with each key in a collection of key-value pairs.

Examples

In the following examples, we create a pipeline with a PCollection and then get its largest or smallest elements in several different ways.

Example 1: Largest elements from a PCollection

We use Top.Largest() to get the largest elements from the entire PCollection.

import apache_beam as beam

with beam.Pipeline() as pipeline:
  largest_elements = (
      pipeline
      | 'Create numbers' >> beam.Create([3, 4, 1, 2])
      | 'Largest N values' >> beam.combiners.Top.Largest(2)
      | beam.Map(print))

Output:

[4, 3]
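
For intuition, the result above matches what Python's standard heapq.nlargest returns for the same input. The sketch below is a plain-Python analogy that runs without Beam, not a description of how Beam computes the result; the helper name top_largest is hypothetical. Note that the Beam transform emits a single element (a list), which is why the pipeline prints one line.

```python
import heapq

def top_largest(elements, n):
    """Return the n largest elements in descending order (local analogy only)."""
    return heapq.nlargest(n, elements)

result = top_largest([3, 4, 1, 2], 2)
print(result)  # [4, 3]
```

If the input holds fewer than n elements, this sketch simply returns all of them, sorted from largest to smallest.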

Example 2: Largest elements for each key

We use Top.LargestPerKey() to get the largest elements for each unique key in a PCollection of key-value pairs.

import apache_beam as beam

with beam.Pipeline() as pipeline:
  largest_elements_per_key = (
      pipeline
      | 'Create produce' >> beam.Create([
          ('🥕', 3),
          ('🥕', 2),
          ('🍆', 1),
          ('🍅', 4),
          ('🍅', 5),
          ('🍅', 3),
      ])
      | 'Largest N values per key' >> beam.combiners.Top.LargestPerKey(2)
      | beam.Map(print))

Output:

('🥕', [3, 2])
('🍆', [1])
('🍅', [5, 4])
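
The per-key behavior can be pictured as "group the values by key, then take the top n within each group". The following is a minimal plain-Python sketch of that idea, assuming the whole input fits in memory; largest_per_key is a hypothetical helper, not part of Beam.

```python
import heapq
from collections import defaultdict

def largest_per_key(pairs, n):
    """Group values by key, then keep the n largest values of each group."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Each key maps to its values in descending order, at most n of them.
    return {key: heapq.nlargest(n, values) for key, values in grouped.items()}

produce = [('🥕', 3), ('🥕', 2), ('🍆', 1), ('🍅', 4), ('🍅', 5), ('🍅', 3)]
per_key = largest_per_key(produce, 2)
for item in per_key.items():
    print(item)
```

A key with fewer than n values, such as '🍆' here, keeps all of its values, which matches the single-element list in the output above.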

Example 3: Smallest elements from a PCollection

We use Top.Smallest() to get the smallest elements from the entire PCollection.

import apache_beam as beam

with beam.Pipeline() as pipeline:
  smallest_elements = (
      pipeline
      | 'Create numbers' >> beam.Create([3, 4, 1, 2])
      | 'Smallest N values' >> beam.combiners.Top.Smallest(2)
      | beam.Map(print))

Output:

[1, 2]
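
One common way to find the n smallest values without sorting the whole input is a bounded heap that never holds more than n items. The sketch below illustrates that general technique in plain Python; it is not taken from Beam's source, the name smallest_n is hypothetical, and the negation trick assumes numeric elements.

```python
import heapq

def smallest_n(stream, n):
    """Return the n smallest numbers from an iterable, in ascending order."""
    if n <= 0:
        return []
    heap = []  # stores negated values, so heap[0] is the largest value kept
    for x in stream:
        if len(heap) < n:
            heapq.heappush(heap, -x)
        elif -x > heap[0]:
            # x is smaller than the largest kept value, so it replaces it.
            heapq.heapreplace(heap, -x)
    return sorted(-v for v in heap)

smallest = smallest_n([3, 4, 1, 2], 2)
print(smallest)  # [1, 2]
```

Because at most n values are held at any time, memory use depends on n rather than on the size of the input.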

Example 4: Smallest elements for each key

We use Top.SmallestPerKey() to get the smallest elements for each unique key in a PCollection of key-value pairs.

import apache_beam as beam

with beam.Pipeline() as pipeline:
  smallest_elements_per_key = (
      pipeline
      | 'Create produce' >> beam.Create([
          ('🥕', 3),
          ('🥕', 2),
          ('🍆', 1),
          ('🍅', 4),
          ('🍅', 5),
          ('🍅', 3),
      ])
      | 'Smallest N values per key' >> beam.combiners.Top.SmallestPerKey(2)
      | beam.Map(print))

Output:

('🥕', [2, 3])
('🍆', [1])
('🍅', [3, 4])

Example 5: Custom elements from a PCollection

We use Top.Of() to get elements with customized rules from the entire PCollection.

You can change how the elements are compared with the key argument, a function that is applied to each element to produce the value used for comparison. By default you get the largest elements; set reverse=True to get the smallest instead.

import apache_beam as beam

with beam.Pipeline() as pipeline:
  shortest_elements = (
      pipeline
      | 'Create produce names' >> beam.Create([
          '🍓 Strawberry',
          '🥕 Carrot',
          '🍏 Green apple',
          '🍆 Eggplant',
          '🌽 Corn',
      ])
      | 'Shortest names' >> beam.combiners.Top.Of(
          2,             # number of elements
          key=len,       # optional, defaults to the element itself
          reverse=True,  # optional, defaults to False (largest/descending)
      )
      | beam.Map(print)
  )

Output:

['🌽 Corn', '🥕 Carrot']
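
To see how key and reverse interact, it can help to reproduce the selection locally. This is a plain-Python sketch using the standard heapq module; top_of is a hypothetical stand-in that mirrors the argument names used above, and the emoji are dropped from the names for brevity.

```python
import heapq

def top_of(elements, n, key=None, reverse=False):
    """Pick n elements ranked by key: largest first, or smallest first if reverse."""
    pick = heapq.nsmallest if reverse else heapq.nlargest
    return pick(n, elements, key=key)

names = ['Strawberry', 'Carrot', 'Green apple', 'Eggplant', 'Corn']

shortest = top_of(names, 2, key=len, reverse=True)
longest = top_of(names, 2, key=len)

print(shortest)  # ['Corn', 'Carrot']
print(longest)   # ['Green apple', 'Strawberry']
```

The key function only affects how elements are ranked; the results still contain the original elements, not their lengths.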

Example 6: Custom elements for each key

We use Top.PerKey() to get elements with customized rules for each unique key in a PCollection of key-value pairs.

You can change how the values are compared with the key argument, a function that is applied to each value to produce the value used for comparison. By default you get the largest values; set reverse=True to get the smallest instead.

import apache_beam as beam

with beam.Pipeline() as pipeline:
  shortest_elements_per_key = (
      pipeline
      | 'Create produce names' >> beam.Create([
          ('spring', '🥕 Carrot'),
          ('spring', '🍓 Strawberry'),
          ('summer', '🥕 Carrot'),
          ('summer', '🌽 Corn'),
          ('summer', '🍏 Green apple'),
          ('fall', '🥕 Carrot'),
          ('fall', '🍏 Green apple'),
          ('winter', '🍆 Eggplant'),
      ])
      | 'Shortest names per key' >> beam.combiners.Top.PerKey(
          2,             # number of elements
          key=len,       # optional, defaults to the value itself
          reverse=True,  # optional, defaults to False (largest/descending)
      )
      | beam.Map(print)
  )

Output:

('spring', ['🥕 Carrot', '🍓 Strawberry'])
('summer', ['🌽 Corn', '🥕 Carrot'])
('fall', ['🥕 Carrot', '🍏 Green apple'])
('winter', ['🍆 Eggplant'])