Distinct

Produces a collection containing distinct elements of the input collection.

Examples

In the following example, we create a pipeline with a single PCollection of produce that contains duplicate elements.

We then use Distinct to remove the duplicates, which outputs a PCollection containing each unique element exactly once.

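import apache_beam as beam

with beam.Pipeline() as pipeline:
  unique_elements = (
      pipeline
      # Build the input PCollection; several elements appear more than once.
      | 'Create produce' >> beam.Create([
          '🥕',
          '🥕',
          '🍆',
          '🍅',
          '🍅',
          '🍅',
      ])
      # Keep a single copy of each distinct element.
      | 'Deduplicate elements' >> beam.Distinct()
      # Print each remaining element.
      | beam.Map(print))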

Output (the order of elements in a PCollection is not guaranteed):

🥕
🍆
🍅
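
To build intuition for the result above, deduplication can be modeled in plain Python as three steps: use each element as a key, group by that key, and keep only the keys. The snippet below is a minimal sketch of that idea and does not use Beam. The function name `distinct_sketch` is hypothetical, and the code is not presented as Beam's actual implementation of Distinct.

```python
from collections import defaultdict


def distinct_sketch(elements):
    """Plain-Python model of deduplication (hypothetical helper, not a Beam API)."""
    # Step 1: pair every element with a placeholder value, using the element as the key.
    keyed = [(element, None) for element in elements]

    # Step 2: group the pairs by key, so that duplicates collapse into a single group.
    groups = defaultdict(list)
    for key, value in keyed:
        groups[key].append(value)

    # Step 3: emit each key once and discard the grouped placeholder values.
    return list(groups)


produce = ['🥕', '🥕', '🍆', '🍅', '🍅', '🍅']
unique_produce = distinct_sketch(produce)
print(unique_produce)
```

Because the sketch uses elements as dictionary keys, they must be hashable. It also happens to preserve first-seen order, which a PCollection does not promise.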