Batches the input into batches of a desired size.
In the following example, we create a pipeline with a
PCollection of produce by season.
We use GroupIntoBatches to get fixed-sized batches for every key, which outputs a list of elements for every key.
```python
import apache_beam as beam

with beam.Pipeline() as pipeline:
  batches_with_keys = (
      pipeline
      | 'Create produce' >> beam.Create([
          ('spring', '🍓'),
          ('spring', '🥕'),
          ('spring', '🍆'),
          ('spring', '🍅'),
          ('summer', '🥕'),
          ('summer', '🍅'),
          ('summer', '🌽'),
          ('fall', '🥕'),
          ('fall', '🍅'),
          ('winter', '🍆'),
      ])
      | 'Group into batches' >> beam.GroupIntoBatches(3)
      | beam.Map(print))
```
For unkeyed data and dynamic batch sizes, one may want to use BatchElements.
Last updated on 2021/02/05