Count
Counts the number of elements within each aggregation.
Examples
In the following example, we create a pipeline with two PCollections of produce.
Then, we apply Count to get the total number of elements in different ways.
Example 1: Counting all elements in a PCollection
We use Count.Globally() to count all elements in a PCollection, even if there are duplicate elements.
Output:
Example 2: Counting elements for each key
We use Count.PerKey() to count the elements for each unique key in a PCollection of key-values.
```python
import apache_beam as beam

with beam.Pipeline() as pipeline:
    total_elements_per_keys = (
        pipeline
        # Each element is a (season, produce) key-value pair.
        | 'Create plants' >> beam.Create([
            ('spring', '🍓'),
            ('spring', '🥕'),
            ('summer', '🥕'),
            ('fall', '🥕'),
            ('spring', '🍆'),
            ('winter', '🍆'),
            ('spring', '🍅'),
            ('summer', '🍅'),
            ('fall', '🍅'),
            ('summer', '🌽'),
        ])
        # Emits one (key, count) pair per unique key.
        | 'Count elements per key' >> beam.combiners.Count.PerKey()
        | beam.Map(print))
```
Output (the order of the lines may vary):
('spring', 4)
('summer', 3)
('fall', 2)
('winter', 1)
Example 3: Counting all unique elements
We use Count.PerElement() to count how many times each unique element occurs in a PCollection.
Output:
Related transforms
N/A
Last updated on 2023/03/20