Applies a simple 1-to-1 mapping function over each element in the collection.
In the following examples, we create a pipeline with a PCollection of produce with their icon, name, and duration. Then, we apply Map in multiple ways to transform every element in the PCollection.
Map accepts a function that returns a single element for every input element in the PCollection.
Example 1: Map with a predefined function
We use the function str.strip, which takes a single str element and outputs a str. It strips whitespace from both ends of the input element, including newlines and tabs.
Example 2: Map with a function
We define a function strip_header_and_newline which strips any '#', ' ', and '\n' characters from each element.
Example 3: Map with a lambda function
We can also use lambda functions to simplify Example 2.
Example 4: Map with multiple arguments
You can pass functions with multiple arguments to Map. They are passed as additional positional arguments or keyword arguments to the function.
In this example, strip takes text and chars as arguments.
Example 5: MapTuple for key-value pairs
If your PCollection consists of (key, value) pairs, you can use MapTuple to unpack them into different function arguments.
Example 6: Map with side inputs as singletons
If the PCollection has a single value, such as the average from another computation, passing the PCollection as a singleton accesses that value.
In this example, we pass a PCollection containing the value '# \n' as a singleton. We then use that value as the characters for the str.strip method.
Example 7: Map with side inputs as iterators
If the PCollection has multiple values, pass the PCollection as an iterator. This accesses elements lazily as they are needed, so it is possible to iterate over large PCollections that won't fit into memory.
Note: You can pass the PCollection as a list with beam.pvalue.AsList(pcollection), but this requires that all the elements fit into memory.
Example 8: Map with side inputs as dictionaries
If a PCollection is small enough to fit into memory, then that PCollection can be passed as a dictionary. Each element must be a (key, value) pair.
Note that all the elements of the PCollection must fit into memory for this. If the PCollection won't fit into memory, use beam.pvalue.AsIter(pcollection) instead.
Related transforms
- FlatMap behaves the same as Map, but for each input it may produce zero or more outputs.
- Filter is useful if the function is just deciding whether to output an element or not.
- ParDo is the most general elementwise mapping operation, and includes other abilities such as multiple output collections and side-inputs.
Last updated on 2024/02/21