Python transform catalog overview

Element-wise

Transform | Description
Enrichment | Performs data enrichment with a remote service.
Filter | Given a predicate, filters out all elements that don't satisfy the predicate.
FlatMap | Applies a function that returns a collection to every element in the input and outputs all resulting elements.
Keys | Extracts the key from each element in a collection of key-value pairs.
KvSwap | Swaps the key and value of each element in a collection of key-value pairs.
Map | Applies a function to every element in the input and outputs the result.
MLTransform | Applies data processing transforms to the dataset.
ParDo | The most general mechanism for applying a user-defined DoFn to every element in the input collection.
Partition | Routes each input element to a specific output collection based on a partition function.
Regex | Filters input string elements based on a regex. May also transform them based on the matching groups.
Reify | Transforms for converting between the explicit and implicit forms of various Beam values.
RunInference | Uses machine learning (ML) models to do local and remote inference.
ToString | Transforms every element in an input collection into a string.
Values | Extracts the value from each element in a collection of key-value pairs.
WithTimestamps | Applies a function to determine a timestamp for each element in the output collection and updates the implicit timestamp associated with each input. Note that it is only safe to adjust timestamps forwards.
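The element-wise transforms differ mainly in how many outputs each input produces: Map emits exactly one, FlatMap emits zero or more, Filter emits zero or one, and Partition routes each input to one of several outputs. The sketch below is a plain-Python model of that behavior, assuming in-memory lists stand in for PCollections. The helper names (`map_`, `flat_map`, `filter_`, `keys`, `values`, `kv_swap`, `partition`) are hypothetical and are not the Beam API; in a pipeline you would apply `beam.Map`, `beam.FlatMap`, `beam.Filter`, `beam.Keys`, `beam.Values`, `beam.KvSwap`, and `beam.Partition` to a PCollection instead.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

# Plain-Python model of Beam element-wise transforms. Lists stand in for
# PCollections; these helpers are hypothetical, not part of the Beam API.
T = TypeVar("T")
U = TypeVar("U")
K = TypeVar("K")
V = TypeVar("V")


def map_(elements: Iterable[T], fn: Callable[[T], U]) -> List[U]:
    """Map: exactly one output per input element."""
    return [fn(e) for e in elements]


def flat_map(elements: Iterable[T], fn: Callable[[T], Iterable[U]]) -> List[U]:
    """FlatMap: fn returns a collection and every item in it is emitted."""
    return [out for e in elements for out in fn(e)]


def filter_(elements: Iterable[T], predicate: Callable[[T], bool]) -> List[T]:
    """Filter: zero or one output per input, depending on the predicate."""
    return [e for e in elements if predicate(e)]


def keys(pairs: Iterable[Tuple[K, V]]) -> List[K]:
    """Keys: keep only the key of each (key, value) pair."""
    return [k for k, _ in pairs]


def values(pairs: Iterable[Tuple[K, V]]) -> List[V]:
    """Values: keep only the value of each (key, value) pair."""
    return [v for _, v in pairs]


def kv_swap(pairs: Iterable[Tuple[K, V]]) -> List[Tuple[V, K]]:
    """KvSwap: (key, value) becomes (value, key)."""
    return [(v, k) for k, v in pairs]


def partition(
    elements: Iterable[T], n: int, fn: Callable[[T, int], int]
) -> List[List[T]]:
    """Partition: fn(element, n) returns the index of the output to route to."""
    outputs: List[List[T]] = [[] for _ in range(n)]
    for e in elements:
        index = fn(e, n)
        if not 0 <= index < n:
            raise ValueError(f"partition index {index} out of range for {n} outputs")
        outputs[index].append(e)
    return outputs


words = flat_map(["the quick", "brown fox"], str.split)
print(words)                                 # ['the', 'quick', 'brown', 'fox']
print(map_(words, len))                      # [3, 5, 5, 3]
print(filter_(words, lambda w: len(w) > 3))  # ['quick', 'brown']
print(partition([1, 2, 3, 4, 5], 2, lambda x, n: x % n))  # [[2, 4], [1, 3, 5]]
```

Passing the same function to `map_` instead, `map_(["the quick", "brown fox"], str.split)`, yields two lists rather than four words, which is the essential difference between Map and FlatMap. As in Beam's `Partition`, the partition function receives both the element and the number of partitions and must return an index in `range(n)`.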

Aggregation

Transform | Description
ApproximateQuantiles | Given a distribution, finds the approximate N-tiles.
ApproximateUnique | Given a PCollection, returns the estimated number of unique elements.
BatchElements | Groups input elements into batches (lists), choosing the batch size dynamically to amortize per-batch processing costs.
CoGroupByKey | Takes several keyed collections of elements and produces a collection where each element consists of a key and all values associated with that key.
CombineGlobally | Combines all elements in a collection into a single result.
CombinePerKey | Combines all values associated with each key into a single result per key.
CombineValues | Combines the iterable of values in each element of a collection of key-iterable pairs.
Count | Counts the number of elements within each aggregation.
Distinct | Produces a collection containing distinct elements from the input collection.
GroupByKey | Takes a keyed collection of elements and produces a collection where each element consists of a key and all values associated with that key.
GroupBy | Takes a collection of elements and produces a collection grouped by properties of those elements. Unlike GroupByKey, the key is dynamically created from the elements themselves.
GroupIntoBatches | Batches the values for each key of a keyed collection into batches of the desired size.
Latest | Gets the element with the latest timestamp.
Max | Gets the element with the maximum value within each aggregation.
Mean | Computes the average within each aggregation.
Min | Gets the element with the minimum value within each aggregation.
Sample | Randomly selects some number of elements from each aggregation.
Sum | Sums all the elements within each aggregation.
ToList | Aggregates all elements into a single list.
Top | Computes the largest element(s) in each aggregation.
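The keyed aggregations can be modeled the same way. This is a minimal sketch, assuming an in-memory list of `(key, value)` tuples in place of a keyed PCollection; the helper names (`group_by_key`, `group_by`, `combine_per_key`, `combine_globally`, `count_per_element`) are hypothetical and only mirror the semantics of `beam.GroupByKey`, `beam.GroupBy`, `beam.CombinePerKey`, `beam.CombineGlobally`, and `beam.combiners.Count.PerElement`. It ignores windowing and triggers, and it preserves insertion order, which Beam does not guarantee for grouped values.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

# Plain-Python model of Beam aggregations. A list of (key, value) tuples stands
# in for a keyed PCollection; these helpers are hypothetical, not the Beam API.
K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


def group_by_key(pairs: Iterable[Tuple[K, V]]) -> Dict[K, List[V]]:
    """GroupByKey: one entry per key holding every value for that key."""
    groups: Dict[K, List[V]] = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return dict(groups)


def group_by(elements: Iterable[V], key_fn: Callable[[V], K]) -> Dict[K, List[V]]:
    """GroupBy: like GroupByKey, but the key is computed from each element."""
    return group_by_key((key_fn(e), e) for e in elements)


def combine_per_key(
    pairs: Iterable[Tuple[K, V]], combine_fn: Callable[[List[V]], R]
) -> Dict[K, R]:
    """CombinePerKey: group by key, then reduce each group to one value."""
    return {k: combine_fn(vs) for k, vs in group_by_key(pairs).items()}


def combine_globally(elements: Iterable[V], combine_fn: Callable[[List[V]], R]) -> R:
    """CombineGlobally: reduce the whole collection to a single value."""
    return combine_fn(list(elements))


def count_per_element(elements: Iterable[K]) -> Dict[K, int]:
    """Count.PerElement: number of occurrences of each distinct element."""
    return dict(Counter(elements))


sales = [("apple", 3), ("pear", 1), ("apple", 2)]
print(group_by_key(sales))          # {'apple': [3, 2], 'pear': [1]}
print(combine_per_key(sales, sum))  # {'apple': 5, 'pear': 1}
print(group_by(["ant", "bee", "apple"], lambda w: w[0]))  # {'a': ['ant', 'apple'], 'b': ['bee']}
print(combine_globally([3, 1, 2], max))  # 3
```

The model applies `combine_fn` once to the full list of values, but Beam expects the function passed to a combine transform to be associative and commutative so that a runner can pre-combine partial results before the shuffle. Functions such as `sum` and `max` meet that requirement; an arbitrary function over the whole list may not.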

Other

Transform | Description
Create | Creates a collection from an in-memory list.
Flatten | Given multiple input collections, produces a single output collection containing all elements from all of the input collections.
Reshuffle | Given an input collection, redistributes the elements between workers. This is most useful for adjusting parallelism or preventing coupled failures.
WindowInto | Logically divides up or groups the elements of a collection into finite windows according to a function.
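Flatten and fixed-size windowing can also be modeled in plain Python. This is a minimal sketch, assuming `(value, timestamp)` tuples with timestamps in seconds and non-overlapping fixed windows with zero offset; the helper names (`flatten`, `assign_fixed_window`, `window_into_fixed`) are hypothetical. In a Beam pipeline the comparable steps would be `beam.Flatten()` applied to a tuple of PCollections and `beam.WindowInto(window.FixedWindows(60))` applied to a timestamped PCollection, with `from apache_beam import window`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple, TypeVar

# Plain-Python model of Flatten and fixed windowing. Timestamps are assumed to
# be numbers of seconds; these helpers are hypothetical, not the Beam API.
T = TypeVar("T")
Window = Tuple[float, float]


def flatten(*collections: Iterable[T]) -> List[T]:
    """Flatten: merge several collections into one containing every element."""
    return [e for collection in collections for e in collection]


def assign_fixed_window(timestamp: float, size: float) -> Window:
    """Return the half-open [start, end) fixed window (zero offset) holding timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def window_into_fixed(
    timestamped: Iterable[Tuple[T, float]], size: float
) -> Dict[Window, List[T]]:
    """Group (value, timestamp) pairs by the fixed window each timestamp falls in.

    Beam's WindowInto only assigns windows; the grouping here stands in for a
    downstream GroupByKey or combine that respects those windows.
    """
    windows: Dict[Window, List[T]] = defaultdict(list)
    for value, timestamp in timestamped:
        windows[assign_fixed_window(timestamp, size)].append(value)
    return dict(windows)


print(flatten([1, 2], [3], []))  # [1, 2, 3]
events = [("a", 5), ("b", 61), ("c", 59)]
print(window_into_fixed(events, 60))  # {(0, 60): ['a', 'c'], (60, 120): ['b']}
```

Because each window is half-open, an element stamped exactly at 60 falls into the window `(60, 120)`, not `(0, 60)`.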