apache_beam.dataframe.convert module

apache_beam.dataframe.convert.to_dataframe(pcoll, proxy)[source]

Converts a PCollection to a deferred dataframe-like object, which can be manipulated with pandas methods like filter and groupby.

For example, one might write:

pcoll = ...
df = to_dataframe(pcoll, proxy=...)
result = df.groupby('col').sum()
pcoll_result = to_pcollection(result)

A proxy object must be given if the schema for the PCollection is not known.
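As a fuller illustration of the round trip, the sketch below builds a schema'd PCollection from beam.Row elements and sums one column per key. It rests on assumptions that are not part of this page: a recent Beam release in which proxy may be omitted for schema'd input, in which to_pcollection accepts include_indexes, and in which output rows are named tuples. The sample data and the names SALES, EXPECTED_TOTALS and run_groupby_sum are hypothetical.

```python
# Hypothetical sample data: (fruit, price) pairs.
SALES = [("apple", 1.5), ("pear", 2.0), ("apple", 0.5)]
# Per-fruit totals the pipeline is expected to produce.
EXPECTED_TOTALS = [("apple", 2.0), ("pear", 2.0)]


def run_groupby_sum():
    # Imported here so that defining the function does not require Beam.
    import apache_beam as beam
    from apache_beam.dataframe.convert import to_dataframe, to_pcollection
    from apache_beam.testing.util import assert_that, equal_to

    with beam.Pipeline() as p:
        # beam.Row attaches a schema (fruit: str, price: float) to each element.
        rows = (
            p
            | beam.Create(SALES)
            | beam.Map(lambda t: beam.Row(fruit=t[0], price=t[1])))

        # Assumption: proxy can be omitted because the schema is known.
        df = to_dataframe(rows)
        totals = df.groupby("fruit").sum()

        # Assumption: include_indexes=True keeps the group key as a field
        # of each output row.
        totals_pc = to_pcollection(totals, include_indexes=True)

        # Output rows are assumed to be named tuples, which compare equal
        # to plain tuples.
        assert_that(totals_pc, equal_to(EXPECTED_TOTALS))
```

Calling run_groupby_sum() builds and runs the pipeline on the default runner; the assert_that check fails the pipeline if the totals differ from EXPECTED_TOTALS.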

apache_beam.dataframe.convert.to_pcollection(*dataframes, **kwargs)[source]

Converts one or more deferred dataframe-like objects back to a PCollection.

This method creates and applies the actual Beam operations that compute the given deferred dataframes, returning a PCollection of their results.

If more than one (related) result is desired, it can be more efficient to pass them all to this function in a single call.
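A minimal sketch of converting two related results in one call follows. It assumes a recent Beam release in which to_pcollection returns a tuple of PCollections when given several deferred dataframes, in which proxy may be omitted for schema'd input, and in which include_indexes is supported. The data and the names SALES, EXPECTED_TOTALS, EXPECTED_MEANS and run_two_results are hypothetical.

```python
# Hypothetical sample data: (fruit, price) pairs.
SALES = [("apple", 1.5), ("pear", 2.0), ("apple", 0.5)]
EXPECTED_TOTALS = [("apple", 2.0), ("pear", 2.0)]
EXPECTED_MEANS = [("apple", 1.0), ("pear", 2.0)]


def run_two_results():
    # Imported here so that defining the function does not require Beam.
    import apache_beam as beam
    from apache_beam.dataframe.convert import to_dataframe, to_pcollection
    from apache_beam.testing.util import assert_that, equal_to

    with beam.Pipeline() as p:
        rows = (
            p
            | beam.Create(SALES)
            | beam.Map(lambda t: beam.Row(fruit=t[0], price=t[1])))
        # Assumption: proxy can be omitted because the schema is known.
        df = to_dataframe(rows)

        # Two deferred results derived from the same deferred dataframe.
        totals = df.groupby("fruit").sum()
        means = df.groupby("fruit").mean()

        # Passing both at once yields one PCollection per argument, in order.
        totals_pc, means_pc = to_pcollection(
            totals, means, include_indexes=True)

        # Distinct labels are needed when assert_that is applied twice.
        assert_that(totals_pc, equal_to(EXPECTED_TOTALS), label="CheckTotals")
        assert_that(means_pc, equal_to(EXPECTED_MEANS), label="CheckMeans")
```

Calling run_two_results() runs the pipeline on the default runner and checks both outputs.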