Batches the input into batches of a desired size.
In the following example, we create a pipeline with a `PCollection` of produce keyed by season. We use `GroupIntoBatches` to get fixed-size batches for every key, which outputs a list of elements for every key.
```python
import apache_beam as beam

with beam.Pipeline() as pipeline:
  batches_with_keys = (
      pipeline
      | 'Create produce' >> beam.Create([
          ('spring', '🍓'),
          ('spring', '🥕'),
          ('spring', '🍆'),
          ('spring', '🍅'),
          ('summer', '🥕'),
          ('summer', '🍅'),
          ('summer', '🌽'),
          ('fall', '🥕'),
          ('fall', '🍅'),
          ('winter', '🍆'),
      ])
      | 'Group into batches' >> beam.GroupIntoBatches(3)
      | beam.Map(print))
```
Output:

```
('spring', ['🍓', '🥕', '🍆'])
('summer', ['🥕', '🍅', '🌽'])
('spring', ['🍅'])
('fall', ['🥕', '🍅'])
('winter', ['🍆'])
```
For unkeyed data and dynamic batch sizes, you may want to use `BatchElements` instead.