Transforms for taking samples of the elements in a collection, or samples of the values associated with each key in a collection of key-value pairs.
In the following examples, we create a pipeline with a PCollection of produce.
Then, we get a random sample of elements in different ways.
Example 1: Sample elements from a PCollection
We use Sample.FixedSizeGlobally() to get a fixed-size random sample of elements from the entire PCollection.
Example 2: Sample elements for each key
We use Sample.FixedSizePerKey() to get fixed-size random samples for each unique key in a
PCollection of key-value pairs.
import apache_beam as beam

with beam.Pipeline() as pipeline:
  samples_per_key = (
      pipeline
      | 'Create produce' >> beam.Create([
          ('spring', '🍓'),
          ('spring', '🥕'),
          ('spring', '🍆'),
          ('spring', '🍅'),
          ('summer', '🥕'),
          ('summer', '🍅'),
          ('summer', '🌽'),
          ('fall', '🥕'),
          ('fall', '🍅'),
          ('winter', '🍆'),
      ])
      | 'Samples per key' >> beam.combiners.Sample.FixedSizePerKey(3)
      | beam.Map(print))
Related transforms
- Top finds the largest or smallest elements in a collection.
Last updated on 2021/02/05