apache_beam.dataframe.transforms module

class apache_beam.dataframe.transforms.DataframeTransform(func, proxy)

Bases: apache_beam.transforms.ptransform.PTransform

A PTransform for applying a function that takes and returns dataframes to one or more PCollections.

For example, if pcoll is a PCollection of dataframes, one could write:

pcoll | DataframeTransform(lambda df: df.groupby('key').sum(), proxy=...)
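To make the callable in this example concrete, the following minimal sketch runs the same grouping on a single in-memory pandas DataFrame, outside any pipeline. The column names `key` and `value`, the sample rows, and the function name `sum_by_key` are assumptions made for illustration. No DataframeTransform is constructed here, so no proxy is needed.

```python
import pandas as pd


def sum_by_key(df):
    # Same body as the lambda in the example: group rows by 'key', then sum each group.
    return df.groupby('key').sum()


# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({'key': ['a', 'b', 'a'], 'value': [1, 2, 3]})

result = sum_by_key(df)
print(result)
```

The result is indexed by `key`, with one row per distinct key holding the summed `value`.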

To pass multiple PCollections, pass a tuple of PCollections, which will be passed to the callable as positional arguments, or a dictionary of PCollections, in which case they will be passed as keyword arguments.
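The difference between the two calling conventions can be sketched in plain pandas. This is not Beam's implementation: `call_with_inputs` is a hypothetical helper that only mimics how a tuple maps to positional arguments and a dictionary maps to keyword arguments, and the frames and column names are made up for illustration.

```python
import pandas as pd


def call_with_inputs(func, inputs):
    """Hypothetical helper mimicking the dispatch described above."""
    if isinstance(inputs, dict):
        # Dictionary keys become keyword argument names.
        return func(**inputs)
    if isinstance(inputs, tuple):
        # Tuple items become positional arguments, in order.
        return func(*inputs)
    # A single input is passed through as the only argument.
    return func(inputs)


# Illustrative stand-ins for the contents of two PCollections.
left = pd.DataFrame({'key': ['a', 'b'], 'x': [1, 2]})
right = pd.DataFrame({'key': ['a', 'b'], 'y': [10, 20]})

# Tuple input: the callable receives the frames positionally.
by_position = call_with_inputs(
    lambda l, r: l.merge(r, on='key'), (left, right))

# Dict input: the callable's parameter names must match the keys.
by_keyword = call_with_inputs(
    lambda orders, users: orders.merge(users, on='key'),
    {'orders': left, 'users': right})

print(by_position)
```

With a dictionary, the parameter names of the callable have to match the dictionary keys; with a tuple, only the order matters.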