GroupIntoBatches
Batches the input into batches of a desired size.
Examples
In the following example, we create a pipeline with a PCollection of produce by season.
We use GroupIntoBatches to get fixed-size batches for every key. The output is a list of elements for each key.
import apache_beam as beam

with beam.Pipeline() as pipeline:
    batches_with_keys = (
        pipeline
        | 'Create produce' >> beam.Create([
            ('spring', '🍓'),
            ('spring', '🥕'),
            ('spring', '🍆'),
            ('spring', '🍅'),
            ('summer', '🥕'),
            ('summer', '🍅'),
            ('summer', '🌽'),
            ('fall', '🥕'),
            ('fall', '🍅'),
            ('winter', '🍆'),
        ])
        | 'Group into batches' >> beam.GroupIntoBatches(3)
        | beam.Map(print))
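Output (the order of the lines, and which elements share a batch, can vary by runner):
('spring', ['🍓', '🥕', '🍆'])
('summer', ['🥕', '🍅', '🌽'])
('spring', ['🍅'])
('fall', ['🥕', '🍅'])
('winter', ['🍆'])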
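The per-key batching behavior can be sketched in plain Python, without a Beam runner. This is a minimal illustration, not Beam's implementation: the helper name group_into_batches is hypothetical, and the sketch processes elements strictly in input order, which a distributed runner does not guarantee.

```python
from collections import defaultdict


def group_into_batches(pairs, batch_size):
    """Illustrative per-key batching (hypothetical helper, not part of Beam)."""
    buffers = defaultdict(list)  # elements waiting for a full batch, per key
    batches = []
    for key, value in pairs:
        buffers[key].append(value)
        # Emit a batch as soon as a key has collected batch_size elements.
        if len(buffers[key]) == batch_size:
            batches.append((key, buffers.pop(key)))
    # Flush the leftover, smaller-than-batch_size groups at the end of input.
    for key, values in buffers.items():
        batches.append((key, values))
    return batches


produce = [
    ('spring', '🍓'), ('spring', '🥕'), ('spring', '🍆'), ('spring', '🍅'),
    ('summer', '🥕'), ('summer', '🍅'), ('summer', '🌽'),
    ('fall', '🥕'), ('fall', '🍅'),
    ('winter', '🍆'),
]

for batch in group_into_batches(produce, 3):
    print(batch)
```

With a batch size of 3, the four spring elements split into one full batch and one leftover batch holding a single element, while fall and winter never fill a batch and are emitted as smaller ones.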
Related transforms
For unkeyed data and dynamic batch sizes, one may want to use BatchElements.
Last updated on 2023/05/31