Count

Counts the number of elements within each aggregation.

Examples

Each of the following examples creates a pipeline with a PCollection of produce, then applies Count to count its elements in a different way.

Example 1: Counting all elements in a PCollection

We use Count.Globally() to count all elements in a PCollection, including duplicates.

import apache_beam as beam

with beam.Pipeline() as pipeline:
  total_elements = (
      pipeline
      | 'Create plants' >> beam.Create(
          ['πŸ“', 'πŸ₯•', 'πŸ₯•', 'πŸ₯•', 'πŸ†', 'πŸ†', 'πŸ…', 'πŸ…', 'πŸ…', '🌽'])
      | 'Count all elements' >> beam.combiners.Count.Globally()
      | beam.Map(print))

Output:

10
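Count.Globally() behaves like a combining transform: a runner can count separate parts of the input in parallel and then add the partial counts together. The following plain-Python sketch illustrates that idea without using Beam. CountingCombiner and count_globally are hypothetical names; the four methods only mirror the shape of a Beam CombineFn (create_accumulator, add_input, merge_accumulators, extract_output), and splitting the input into two "bundles" is an arbitrary assumption standing in for however a runner might partition the data.

```python
# Plain-Python sketch of a global count computed from partial counts.
# CountingCombiner is a hypothetical class shaped like a Beam CombineFn;
# it does not import or use Apache Beam.


class CountingCombiner:
    def create_accumulator(self):
        # Each partial count starts at zero.
        return 0

    def add_input(self, accumulator, element):
        # Every element adds one, whatever its value, so duplicates count too.
        return accumulator + 1

    def merge_accumulators(self, accumulators):
        # Partial counts from different bundles are combined by summing.
        return sum(accumulators)

    def extract_output(self, accumulator):
        # The merged accumulator already is the final count.
        return accumulator


def count_globally(bundles):
    """Count all elements across a list of bundles (each a list of elements)."""
    combiner = CountingCombiner()
    partials = []
    for bundle in bundles:
        accumulator = combiner.create_accumulator()
        for element in bundle:
            accumulator = combiner.add_input(accumulator, element)
        partials.append(accumulator)
    return combiner.extract_output(combiner.merge_accumulators(partials))


produce = ['🍓', '🥕', '🥕', '🥕', '🍆', '🍆', '🍅', '🍅', '🍅', '🌽']
# Hypothetical split into two bundles: 4 elements and 6 elements.
print(count_globally([produce[:4], produce[4:]]))  # 10
```

Because each element contributes exactly one, the total is the same no matter how the input is split into bundles.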

Example 2: Counting elements for each key

We use Count.PerKey() to count the elements for each unique key in a PCollection of key-values.

import apache_beam as beam

with beam.Pipeline() as pipeline:
  total_elements_per_keys = (
      pipeline
      | 'Create plants' >> beam.Create([
          ('spring', 'πŸ“'),
          ('spring', 'πŸ₯•'),
          ('summer', 'πŸ₯•'),
          ('fall', 'πŸ₯•'),
          ('spring', 'πŸ†'),
          ('winter', 'πŸ†'),
          ('spring', 'πŸ…'),
          ('summer', 'πŸ…'),
          ('fall', 'πŸ…'),
          ('summer', '🌽'),
      ])
      | 'Count elements per key' >> beam.combiners.Count.PerKey()
      | beam.Map(print))

Output:

('spring', 4)
('summer', 3)
('fall', 2)
('winter', 1)
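Count.PerKey() looks only at the keys; the value in each pair does not affect the result. As a rough model in plain Python (not Beam code), the same counts can be computed with collections.Counter. count_per_key is a hypothetical helper used only for illustration.

```python
from collections import Counter


def count_per_key(pairs):
    """Return (key, count) pairs: how many input pairs share each key."""
    # Only the key of each (key, value) pair is counted; values are ignored.
    counts = Counter(key for key, _ in pairs)
    return list(counts.items())


plants = [
    ('spring', '🍓'), ('spring', '🥕'), ('summer', '🥕'), ('fall', '🥕'),
    ('spring', '🍆'), ('winter', '🍆'), ('spring', '🍅'), ('summer', '🍅'),
    ('fall', '🍅'), ('summer', '🌽'),
]
for key_count in count_per_key(plants):
    print(key_count)
```

This sketch prints the keys in the order they first appear. In a real pipeline a PCollection is unordered, so the (key, count) pairs may be printed in any order.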

Example 3: Counting all unique elements

We use Count.PerElement() to count how many times each unique element appears in a PCollection.

import apache_beam as beam

with beam.Pipeline() as pipeline:
  total_unique_elements = (
      pipeline
      | 'Create produce' >> beam.Create(
          ['πŸ“', 'πŸ₯•', 'πŸ₯•', 'πŸ₯•', 'πŸ†', 'πŸ†', 'πŸ…', 'πŸ…', 'πŸ…', '🌽'])
      | 'Count unique elements' >> beam.combiners.Count.PerElement()
      | beam.Map(print))

Output:

('πŸ“', 1)
('πŸ₯•', 3)
('πŸ†', 2)
('πŸ…', 3)
('🌽', 1)
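One way to think about Count.PerElement() is as a per-key count in which each element becomes its own key: pair every element with a placeholder value, then count how many pairs share each key. The plain-Python sketch below models that idea and does not use Beam; count_per_element is a hypothetical helper, and using None as the placeholder value is an assumption of the sketch.

```python
from collections import Counter


def count_per_element(elements):
    """Return (element, count) pairs, one per unique element."""
    # Turn each element into a key by pairing it with a placeholder value.
    keyed = [(element, None) for element in elements]
    # Count how many pairs share each key; the placeholder values are ignored.
    counts = Counter(key for key, _ in keyed)
    return list(counts.items())


produce = ['🍓', '🥕', '🥕', '🥕', '🍆', '🍆', '🍅', '🍅', '🍅', '🌽']
for element_count in count_per_element(produce):
    print(element_count)
```

The counts for all unique elements add up to 10, the same total that Count.Globally() returns for this input.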

