CombineGlobally
Pydoc |
Combines all elements in a collection.
See more information in the Beam Programming Guide.
Examples
In the following examples, we create a pipeline with a PCollection
of produce.
Then, we apply CombineGlobally
in multiple ways to combine all the elements in the PCollection
.
CombineGlobally
accepts a function that takes an iterable
of elements as an input, and combines them to return a single element.
Example 1: Combining with a function
We define a function get_common_items
which takes an iterable
of sets as an input, and calculates the intersection (common items) of those sets.
Example 2: Combining with a lambda function
We can also use lambda functions to simplify Example 1.
Example 3: Combining with multiple arguments
You can pass functions with multiple arguments to CombineGlobally
.
They are passed as additional positional arguments or keyword arguments to the function.
In this example, the lambda function takes sets
and exclude
as arguments.
Example 4: Combining with a CombineFn
The more general way to combine elements, and the most flexible, is with a class that inherits from CombineFn
.
CombineFn.create_accumulator()
: This creates an empty accumulator. For example, an empty accumulator for a sum would be0
, while an empty accumulator for a product (multiplication) would be1
.CombineFn.add_input()
: Called once per element. Takes an accumulator and an input element, combines them and returns the updated accumulator.CombineFn.merge_accumulators()
: Multiple accumulators could be processed in parallel, so this function helps merging them into a single accumulator.CombineFn.extract_output()
: It allows to do additional calculations before extracting a result.
Related transforms
You can use the following combiner transforms:
Pydoc |
Last updated on 2025/01/28
Have you found everything you were looking for?
Was it all useful and clear? Is there anything that you would like to change? Let us know!