apache_beam.dataframe.convert module
apache_beam.dataframe.convert.to_dataframe(pcoll, proxy)
Converts a PCollection to a deferred dataframe-like object, which can be manipulated with pandas methods like filter and groupby.
For example, one might write:
    pcoll = ...
    df = to_dataframe(pcoll, proxy=...)
    result = df.groupby('col').sum()
    pcoll_result = to_pcollection(result)
A proxy object must be given if the schema for the PCollection is not known.
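A proxy is typically an empty pandas object that carries the expected column names and dtypes but no rows. The following is a minimal sketch of two ways to build one with plain pandas. The column names and sample values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample of the data the PCollection is expected to carry.
sample = pd.DataFrame({"col": [1, 2, 1], "value": [0.5, 1.5, 2.5]})

# Slicing off every row keeps the column names and dtypes but no data.
proxy = sample[:0]

# The same proxy built directly from dtypes, without any sample rows.
explicit_proxy = pd.DataFrame({
    "col": pd.Series(dtype="int64"),
    "value": pd.Series(dtype="float64"),
})
```

Either object could then be passed as the proxy argument of to_dataframe.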
apache_beam.dataframe.convert.to_pcollection(*dataframes, **kwargs)
Converts one or more deferred dataframe-like objects back to a PCollection.
This method creates and applies the actual Beam operations that compute the given deferred dataframes, returning a PCollection of their results.
If more than one (related) result is desired, it can be more efficient to pass them all at the same time to this method.
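As a hedged sketch of converting several related results in one call, the function below builds two deferred results that share an input and passes both to to_pcollection together. The name run_example, the step labels, and the sample values are hypothetical. The element type of the returned PCollections can differ between Beam releases, so the sketch does not inspect their contents.

```python
def run_example():
    # Imported inside the function so the module can be loaded without
    # Beam installed; the sketch only needs Beam when it is called.
    import apache_beam as beam
    import pandas as pd
    from apache_beam.dataframe.convert import to_dataframe, to_pcollection

    # Hypothetical input data, one pandas Series per PCollection.
    a = pd.Series([1, 2, 3])
    b = pd.Series([100, 200, 300])

    with beam.Pipeline() as p:
        pc_a = p | "A" >> beam.Create([a])
        pc_b = p | "B" >> beam.Create([b])

        # An empty slice of each Series serves as its proxy.
        df_a = to_dataframe(pc_a, proxy=a[:0])
        df_b = to_dataframe(pc_b, proxy=b[:0])

        # Two deferred results that both depend on df_a.
        df_2a = 2 * df_a
        df_ab = df_a * df_b

        # One call converts both related results back to PCollections.
        pc_2a, pc_ab = to_pcollection(df_2a, df_ab)
    return True
```

Calling to_pcollection once per result would also work; passing related results together gives Beam the chance to plan their computation jointly.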