Sample
Transforms for taking samples of the elements in a collection, or samples of the values associated with each key in a collection of key-value pairs.
Examples
In the following example, we create a pipeline with a PCollection. Then, we get a random sample of elements in different ways.
Example 1: Sample elements from a PCollection
We use Sample.FixedSizeGlobally() to get a fixed-size random sample of elements from the entire PCollection.
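Output: a single list containing the sampled elements. Because the sample is random, the selected elements vary between runs.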
Example 2: Sample elements for each key
We use Sample.FixedSizePerKey() to get fixed-size random samples for each unique key in a PCollection of key-values.
import apache_beam as beam

with beam.Pipeline() as pipeline:
  samples_per_key = (
      pipeline
      | 'Create produce' >> beam.Create([
          ('spring', '🍓'),
          ('spring', '🥕'),
          ('spring', '🍆'),
          ('spring', '🍅'),
          ('summer', '🥕'),
          ('summer', '🍅'),
          ('summer', '🌽'),
          ('fall', '🥕'),
          ('fall', '🍅'),
          ('winter', '🍆'),
      ])
      # Keep at most 3 randomly chosen values for each key.
      | 'Samples per key' >> beam.combiners.Sample.FixedSizePerKey(3)
      | beam.Map(print))
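Output: one (key, sample) pair per key, where each sample is a list of at most 3 values for that key. The sampled values vary between runs; 'winter' has a single value, so its sample always contains just that value.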
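For intuition about what per-key sampling computes, here is a plain-Python sketch of the same idea. It is not how Beam implements the transform; the function name `fixed_size_sample_per_key` and the `seed` parameter are hypothetical and exist only to make the illustration reproducible.

```python
import random
from collections import defaultdict


def fixed_size_sample_per_key(pairs, n, seed=None):
  """Hypothetical helper: samples up to n values per key, without replacement."""
  rng = random.Random(seed)

  # Group the values by key, as the per-key transform does conceptually.
  grouped = defaultdict(list)
  for key, value in pairs:
    grouped[key].append(value)

  # Draw at most n values for each key; short groups are returned whole.
  return {
      key: rng.sample(values, min(n, len(values)))
      for key, values in grouped.items()
  }


if __name__ == '__main__':
  pairs = [('spring', 'a'), ('spring', 'b'), ('spring', 'c'),
           ('spring', 'd'), ('winter', 'e')]
  print(fixed_size_sample_per_key(pairs, 3))
```

Unlike this in-memory version, the Beam transform works on distributed data, so this sketch should be read only as a description of the result's shape.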
Related transforms
- Top finds the largest or smallest element.
Last updated on 2021/02/05