Distinct
Produces a collection containing distinct elements of the input collection.
Examples
In the following example, we create a pipeline with a PCollection of produce. We use Distinct to remove duplicate elements, which outputs a PCollection of all the unique elements.
import apache_beam as beam

with beam.Pipeline() as pipeline:
  unique_elements = (
      pipeline
      | 'Create produce' >> beam.Create([
          '🥕',
          '🥕',
          '🍆',
          '🍅',
          '🍅',
          '🍅',
      ])
      | 'Deduplicate elements' >> beam.Distinct()
      | beam.Map(print))
Output:
🥕
🍆
🍅
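To make the deduplication semantics concrete, here is a minimal plain-Python sketch of what the result contains. It is only a mental model, not how Beam implements Distinct: the helper name `distinct` is hypothetical, it assumes the elements are hashable, and it keeps first-seen order, whereas a PCollection is unordered and the pipeline above may print the elements in any order.

```python
def distinct(elements):
    """Return the unique elements of an iterable, keeping first-seen order.

    Hypothetical helper for illustration only; it is not part of Beam.
    Assumes every element is hashable.
    """
    # Dictionary keys are unique and preserve insertion order (Python 3.7+),
    # so building a dict from the elements drops later duplicates.
    return list(dict.fromkeys(elements))


produce = ['🥕', '🥕', '🍆', '🍅', '🍅', '🍅']
unique_produce = distinct(produce)
print(unique_produce)
```

The sketch runs on a single in-memory list, while Distinct operates on a distributed PCollection; the set of elements in the result is the same, but only the sketch fixes an order.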
Related transforms
- Count counts the number of elements within each aggregation.