Produces a collection containing distinct elements of the input collection.
In the following example, we create a pipeline with a
PCollection of produce that contains duplicates.
We then use Distinct to get rid of the duplicate elements, which outputs a
PCollection of all the unique elements.
import apache_beam as beam

with beam.Pipeline() as pipeline:
  unique_elements = (
      pipeline
      | 'Create produce' >> beam.Create([
          '🥕',
          '🥕',
          '🍆',
          '🍅',
          '🍅',
          '🍅',
      ])
      | 'Deduplicate elements' >> beam.Distinct()
      | beam.Map(print))
Output:
🥕
🍆
🍅
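To make the semantics concrete without running a pipeline, here is a minimal plain-Python sketch of what deduplication does to the same produce list. It is only an in-memory illustration, not how Beam implements Distinct, and the helper name `distinct_elements` is hypothetical. Beam does not guarantee the order of elements in a PCollection, so the check compares sets rather than lists.

```python
from typing import Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def distinct_elements(elements: Iterable[T]) -> List[T]:
    """Return one copy of each value in `elements` (hypothetical helper)."""
    # Dictionary keys are unique, so repeated values collapse to one entry.
    return list(dict.fromkeys(elements))


produce = ['🥕', '🥕', '🍆', '🍅', '🍅', '🍅']
unique_produce = distinct_elements(produce)

# Compare as sets: a PCollection carries no ordering guarantee, so any
# check on deduplicated output should ignore order as well.
assert set(unique_produce) == {'🥕', '🍆', '🍅'}
assert len(unique_produce) == 3
```

As in the pipeline above, elements must be comparable for equality (here, hashable) for duplicates to be recognized.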
- Count counts the number of elements within each aggregation.