Combines all elements in a collection.
See more information in the Beam Programming Guide.
In the following examples, we create a pipeline with a PCollection of produce. Then, we apply CombineGlobally in multiple ways to combine all the elements in the PCollection.
CombineGlobally accepts a function that takes an iterable of elements as input and combines them to return a single element.
Example 1: Combining with a function
We define a function get_common_items, which takes an iterable of sets as input and calculates the intersection (common items) of those sets.
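A minimal sketch of what this example could look like. The produce names are illustrative sample data, not taken from the original page, and the `run` helper is a hypothetical wrapper. The Beam import sits inside `run` only so that the combining function can be exercised on its own.

```python
def get_common_items(sets):
    """Return the items shared by every set in `sets`."""
    # Materialize the iterable so it can be checked for emptiness and unpacked.
    sets = list(sets)
    # With no input sets there is nothing in common, so return an empty set.
    if not sets:
        return set()
    # set.intersection() takes the sets as separate arguments.
    return set.intersection(*sets)


def run():
    # Hypothetical wrapper; requires the apache-beam package.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            # Illustrative sample data (an assumption, not from the source).
            | "Create produce" >> beam.Create([
                {"strawberry", "carrot", "tomato"},
                {"grape", "carrot", "tomato"},
                {"watermelon", "carrot", "tomato"},
            ])
            | "Get common items" >> beam.CombineGlobally(get_common_items)
            | "Print result" >> beam.Map(print)
        )
```

Calling `run()` should emit a single set holding the items present in every input set. Because a runner may apply the function to partial results before combining them again, the function should be commutative and associative, which set intersection is.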
Example 2: Combining with a lambda function
We can also use lambda functions to simplify Example 1.
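As a rough illustration, the same intersection logic fits in a single lambda expression. The name `common_items` and the sample sets are assumptions made for this sketch. The expression also assumes the input arrives as a list, so that an empty list is falsy.

```python
# Bound to a name only so it can be called outside a pipeline.
# `sets or [set()]` assumes `sets` is a list: an empty list falls back to
# [set()], which makes the intersection an empty set.
common_items = lambda sets: set.intersection(*(sets or [set()]))  # noqa: E731

# sorted() gives a stable order for display.
print(sorted(common_items([{"carrot", "tomato", "kiwi"}, {"carrot", "tomato"}])))
# → ['carrot', 'tomato']
```

In a pipeline, the lambda would typically be passed inline, as in `beam.CombineGlobally(lambda sets: set.intersection(*(sets or [set()])))`.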
Example 3: Combining with multiple arguments
You can pass functions with multiple arguments to CombineGlobally.
The extra arguments are passed as additional positional arguments or keyword arguments to the function.
In this example, the lambda function takes the iterable of sets and exclude as arguments.
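A hedged sketch of a combining function with an extra argument, written here as a named function rather than a lambda so it can be called directly. The name `common_items_except` and the sample values are hypothetical.

```python
def common_items_except(sets, exclude):
    """Intersect all sets, then drop every item listed in `exclude`."""
    sets = list(sets)
    # An empty input has no common items.
    common = set.intersection(*sets) if sets else set()
    # Remove the excluded items from the result.
    return common - exclude


result = common_items_except(
    [{"carrot", "tomato", "kiwi"}, {"carrot", "tomato", "corn"}],
    exclude={"carrot"},
)
print(result)  # → {'tomato'}
```

Inside a pipeline, the extra argument would be supplied after the function, for example `beam.CombineGlobally(common_items_except, exclude={"carrot"})`.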
Example 4: Combining with a CombineFn
The most general and flexible way to combine elements is with a class that inherits from CombineFn.
CombineFn.create_accumulator(): Creates an empty accumulator. For example, an empty accumulator for a sum would be 0, while an empty accumulator for a product (multiplication) would be 1.
CombineFn.add_input(): Called once per element. Takes an accumulator and an input element, combines them, and returns the updated accumulator.
CombineFn.merge_accumulators(): Multiple accumulators can be processed in parallel, so this function merges them into a single accumulator.
CombineFn.extract_output(): Lets you perform additional calculations before extracting a result.
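The four methods above can be illustrated with a plain-Python stand-in that needs no Beam installation. `AverageFn` and the `combine_in_shards` driver are hypothetical names introduced for this sketch. The driver only imitates how a runner might build one accumulator per shard and then merge them, and it is not how Beam is implemented.

```python
class AverageFn:
    """Computes a mean with the four CombineFn-style methods."""

    def create_accumulator(self):
        # Empty accumulator: a running (sum, count) pair.
        return (0.0, 0)

    def add_input(self, accumulator, element):
        # Fold one element into the accumulator and return the updated pair.
        total, count = accumulator
        return (total + element, count + 1)

    def merge_accumulators(self, accumulators):
        # Combine partial (sum, count) pairs built in parallel.
        accumulators = list(accumulators)
        total = sum(acc[0] for acc in accumulators)
        count = sum(acc[1] for acc in accumulators)
        return (total, count)

    def extract_output(self, accumulator):
        # Final calculation: turn (sum, count) into a mean.
        total, count = accumulator
        return total / count if count else float("nan")


def combine_in_shards(combine_fn, shards):
    """Hypothetical driver: one accumulator per shard, then a single merge."""
    accumulators = []
    for shard in shards:
        accumulator = combine_fn.create_accumulator()
        for element in shard:
            accumulator = combine_fn.add_input(accumulator, element)
        accumulators.append(accumulator)
    merged = combine_fn.merge_accumulators(accumulators)
    return combine_fn.extract_output(merged)


print(combine_in_shards(AverageFn(), [[1, 2], [3], []]))  # → 2.0
```

In a real pipeline the class would inherit from `beam.CombineFn` and be applied with `beam.CombineGlobally(AverageFn())`. The result should not depend on how elements are split into shards, which is why the accumulator keeps the sum and the count rather than a running mean.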
You can use the following combiner transforms:
Last updated on 2023/11/27