apache_beam.dataframe.transforms module
class apache_beam.dataframe.transforms.DataframeTransform(func, proxy)
Bases: apache_beam.transforms.ptransform.PTransform
A PTransform for applying a function that takes and returns dataframes to one or more PCollections.
For example, if pcoll is a PCollection of dataframes, one could write:
pcoll | DataframeTransform(lambda df: df.groupby('key').sum(), proxy=...)
To pass multiple PCollections, pass either a tuple of PCollections, which will be passed to the callable as positional arguments, or a dictionary of PCollections, which will be passed to the callable as keyword arguments.
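The tuple-versus-dictionary convention can be illustrated with a minimal plain-Python sketch. This is not Beam's implementation: apply_to_inputs and join_sizes are hypothetical names introduced here, and ordinary lists stand in for the PCollections of dataframes.

```python
# Plain-Python sketch of the dispatch convention only.
# apply_to_inputs is a hypothetical helper, not part of apache_beam.
def apply_to_inputs(func, inputs):
    """Call func with inputs unpacked according to their container type."""
    if isinstance(inputs, tuple):
        # A tuple of inputs becomes positional arguments.
        return func(*inputs)
    if isinstance(inputs, dict):
        # A dictionary of inputs becomes keyword arguments.
        return func(**inputs)
    # A single input is passed as the sole argument.
    return func(inputs)


def join_sizes(left, right):
    # Stand-in for a function of two dataframes; it only reports lengths.
    return (len(left), len(right))


# Tuple: order in the tuple determines which parameter each input binds to.
positional = apply_to_inputs(join_sizes, ([1, 2, 3], [4, 5]))

# Dictionary: keys must match the callable's parameter names; order is irrelevant.
keyword = apply_to_inputs(join_sizes, {"right": [4, 5], "left": [1, 2, 3]})

# Single input: passed through unchanged.
single = apply_to_inputs(len, [1, 2, 3])
```

With a dictionary, the keys have to match the parameter names of the callable, so the keyword form is less sensitive to ordering mistakes than the tuple form when there are several inputs.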