apache_beam.dataframe.schemas module
Utilities for relating schema-aware PCollections and DataFrame transforms.
The utilities here enforce the type mapping defined in
apache_beam.typehints.pandas_type_compatibility.
- 
class apache_beam.dataframe.schemas.BatchRowsAsDataFrame(*args, proxy=None, **kwargs)
- Bases: apache_beam.transforms.ptransform.PTransform
- A transform that batches schema-aware PCollection elements into DataFrames.
- Batching parameters are inherited from BatchElements.
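The batching idea can be illustrated without Beam. The following is a minimal plain-Python sketch, not Beam's implementation: it groups schema'd rows (modelled here as NamedTuple instances) into column-oriented batches, which is roughly the shape a DataFrame gives the same data. The `Animal` type and the `batch_rows_as_columns` helper are hypothetical names introduced for illustration, and the fixed `batch_size` is a simplifying assumption; the batching parameters of BatchElements are not modelled here.

```python
from typing import Any, Dict, Iterable, Iterator, List, NamedTuple


class Animal(NamedTuple):
    """Hypothetical schema'd element type used only for this sketch."""
    name: str
    speed: float


def _to_columns(batch: List[Any]) -> Dict[str, List[Any]]:
    # Pivot a list of rows into one list per field (column-oriented layout).
    fields = batch[0]._fields
    return {field: [getattr(row, field) for row in batch] for field in fields}


def batch_rows_as_columns(
    rows: Iterable[Any], batch_size: int
) -> Iterator[Dict[str, List[Any]]]:
    """Yield column-oriented batches holding at most batch_size rows each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Any] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield _to_columns(batch)
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield _to_columns(batch)
```

Each yielded dictionary maps a field name to the list of values for that field, so three rows with a batch size of two produce one batch of two rows followed by one batch of one row.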
- 
apache_beam.dataframe.schemas.generate_proxy(element_type)
- Generate a proxy pandas object for the given PCollection element_type.
- Currently only supports generating a DataFrame proxy from a schema-aware PCollection or a Series proxy from a primitively typed PCollection.
- 
apache_beam.dataframe.schemas.element_type_from_dataframe(proxy, include_indexes=False)
- Generate an element_type for an element-wise PCollection from a proxy pandas object.
- Currently only supports converting a proxy DataFrame into the element_type of a schema-aware PCollection.
- 
class apache_beam.dataframe.schemas.UnbatchPandas(proxy, include_indexes=False)
- Bases: apache_beam.transforms.ptransform.PTransform
- A transform that explodes a PCollection of DataFrame or Series. A DataFrame is converted to a schema-aware PCollection, while a Series is converted to its underlying type.
- Parameters: include_indexes – (optional, default: False) When unbatching a DataFrame with include_indexes=True, attempt to include index columns in the output schema. Raises an error if any index level is unnamed (name=None), or if any name is not unique among all column and index names.
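The naming rule for include_indexes can be sketched in plain Python. This is a hedged illustration of the two stated conditions only (no unnamed index levels, no name repeated across columns and index levels), not the check Beam actually runs. The helper name `check_index_names`, the use of `ValueError`, and the ordering of index names before column names in the result are all assumptions made for this sketch.

```python
from typing import List, Optional, Sequence


def check_index_names(
    column_names: Sequence[str], index_names: Sequence[Optional[str]]
) -> List[str]:
    """Return the combined field names, or raise if the naming rule is violated.

    Hypothetical helper: mirrors the documented conditions, not Beam's code.
    """
    # Condition 1: every index level must carry a name.
    if any(name is None for name in index_names):
        raise ValueError("all index levels must be named (found name=None)")

    # Assumed ordering for the sketch: index names first, then column names.
    all_names = list(index_names) + list(column_names)

    # Condition 2: names must be unique across columns and index levels.
    duplicates = sorted({name for name in all_names if all_names.count(name) > 1})
    if duplicates:
        raise ValueError(f"names are not unique: {duplicates}")

    return all_names
```

For example, a DataFrame with a column `speed` and an index named `name` passes the check, while an unnamed index, or an index that shares the name `speed` with a column, is rejected.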