Counts the number of elements within each aggregation.
In the following example, we create a pipeline with two PCollections of produce. Then, we apply Count to get the total number of elements in different ways.
Example 1: Counting all elements in a PCollection
We use Count.Globally() to count all elements in a PCollection, even if there are duplicate elements.
Example 2: Counting elements for each key
We use Count.PerKey() to count the elements for each unique key in a PCollection of key-values.
```python
import apache_beam as beam

with beam.Pipeline() as pipeline:
  total_elements_per_keys = (
      pipeline
      | 'Create plants' >> beam.Create([
          ('spring', '🍓'),
          ('spring', '🥕'),
          ('summer', '🥕'),
          ('fall', '🥕'),
          ('spring', '🍆'),
          ('winter', '🍆'),
          ('spring', '🍅'),
          ('summer', '🍅'),
          ('fall', '🍅'),
          ('summer', '🌽'),
      ])
      # Emits one (key, count) pair per unique key, e.g. ('spring', 4).
      | 'Count elements per key' >> beam.combiners.Count.PerKey()
      | beam.Map(print))
```
Example 3: Counting all unique elements
We use Count.PerElement() to count how many times each unique element appears in a PCollection.
Last updated on 2021/02/05