apache_beam.typehints.pandas_type_compatibility module
Utilities for converting between Beam schemas and pandas DataFrames.
Imposes a mapping between native Python typings (specifically those compatible
with apache_beam.typehints.schemas) and common pandas dtypes:
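  pandas dtype                    Python typing
  np.int{8,16,32,64}      <-----> np.int{8,16,32,64}*
  pd.Int{8,16,32,64}Dtype <-----> Optional[np.int{8,16,32,64}]*
  np.float{32,64}         <-----> Optional[np.float{32,64}]
                             \--- np.float{32,64}
  Not supported           <------ Optional[bytes]
  np.bool                 <-----> np.bool
  np.dtype('S')           <-----> bytes
  pd.BooleanDtype()       <-----> Optional[bool]
  pd.StringDtype()        <-----> Optional[str]
                             \--- str
  np.object               <-----> Any

  * int, float, and bool are treated the same as np.int64, np.float64, and np.bool.

A <-----> arrow means the conversion works in both directions, and <------ means
it applies only when converting from the Python typing. A \--- branch means the
non-Optional typing also converts to the dtype on the row above, while converting
that dtype back to Python yields the Optional typing.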
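The table pairs Optional integers and booleans with pandas extension dtypes, but lets Optional floats stay on plain NumPy floats. One likely reason is how each dtype represents a missing value: NumPy integer and bool arrays have no missing-value marker, whereas float arrays can use NaN. The snippet below illustrates this with plain pandas only; it does not call any Beam API.

```python
import numpy as np
import pandas as pd

# Without an explicit dtype, an integer column with a gap is upcast to
# float64 and the gap becomes NaN.
plain_ints = pd.Series([1, None])

# The nullable extension dtype keeps integer values and marks the gap as pd.NA.
nullable_ints = pd.Series([1, None], dtype=pd.Int64Dtype())

# NumPy floats already have NaN, so an Optional float needs no extension dtype.
optional_floats = pd.Series([1.5, None], dtype=np.float64)

# Booleans and strings use extension dtypes that also accept missing values.
nullable_bools = pd.Series([True, None], dtype=pd.BooleanDtype())
nullable_strs = pd.Series(["a", None], dtype=pd.StringDtype())

for name, series in [
    ("plain_ints", plain_ints),
    ("nullable_ints", nullable_ints),
    ("optional_floats", optional_floats),
    ("nullable_bools", nullable_bools),
    ("nullable_strs", nullable_strs),
]:
    # Show each column's dtype and which entries pandas treats as missing.
    print(name, series.dtype, series.isna().tolist())
```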
Note that when converting to pandas dtypes, any types not specified here are
shunted to np.object.
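The Python-to-pandas direction of the table, including the np.object fallback, can be sketched as a small lookup function. to_pandas_dtype and its helpers are hypothetical names used only for illustration; they are not Beam's API or its actual implementation. Raising TypeError for Optional[bytes] is an assumption standing in for "Not supported". The sketch writes np.bool as np.bool_ and np.object as the builtin object, because the np.object alias is not available in recent NumPy releases.

```python
import types
import typing
from typing import Any, Optional

import numpy as np
import pandas as pd

# Hypothetical sketch of the Python typing -> pandas dtype direction.
# Not Beam's implementation.

# Footnote rule: builtins behave like their NumPy counterparts.
_ALIASES = {int: np.int64, float: np.float64, bool: np.bool_}

_FLOATS = (np.float32, np.float64)

# Optional[np.intN] needs a nullable extension dtype.
_NULLABLE_INTS = {
    np.int8: pd.Int8Dtype(),
    np.int16: pd.Int16Dtype(),
    np.int32: pd.Int32Dtype(),
    np.int64: pd.Int64Dtype(),
}

# typing.Union, plus types.UnionType (the `X | None` form) where it exists.
_UNION_ORIGINS = (typing.Union, getattr(types, "UnionType", typing.Union))


def _unwrap_optional(typehint):
    """Returns (inner, True) for Optional[inner], else (typehint, False)."""
    args = typing.get_args(typehint)
    if (typing.get_origin(typehint) in _UNION_ORIGINS
            and len(args) == 2 and type(None) in args):
        inner = args[0] if args[1] is type(None) else args[1]
        return inner, True
    return typehint, False


def to_pandas_dtype(typehint):
    inner, optional = _unwrap_optional(typehint)
    inner = _ALIASES.get(inner, inner)
    if inner in _NULLABLE_INTS:
        return _NULLABLE_INTS[inner] if optional else np.dtype(inner)
    if inner in _FLOATS:
        # NaN already marks a missing float, so Optional changes nothing.
        return np.dtype(inner)
    if inner is np.bool_:
        return pd.BooleanDtype() if optional else np.dtype(np.bool_)
    if inner is str:
        # Both str and Optional[str] map to the nullable string dtype.
        return pd.StringDtype()
    if inner is bytes:
        if optional:
            raise TypeError("Optional[bytes] is not supported")
        return np.dtype("S")
    # Everything else (Any, Sequence, Mapping, nested NamedTuple, ...)
    # is shunted to object, the stand-in for np.object.
    return np.dtype(object)


print(to_pandas_dtype(Optional[int]))        # nullable-integer rule
print(to_pandas_dtype(typing.List[str]))     # object fallback
```

Keeping the Optional unwrapping separate from the lookup mirrors how the table reads: each row is decided first by the base type and then by whether it is wrapped in Optional.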
Similarly when converting from pandas to Python types, types that aren’t
otherwise specified here are shunted to Any. Notably, this includes
np.datetime64.
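The reverse direction, including the Any fallback that catches np.datetime64, can be sketched the same way. from_pandas_dtype is a hypothetical helper for illustration only, not Beam's API; it inspects NumPy dtypes by their kind code and pandas extension dtypes by class.

```python
from typing import Any, Optional

import numpy as np
import pandas as pd

# Hypothetical sketch of the pandas dtype -> Python typing direction.
# Not Beam's implementation.

_NULLABLE_INTS = (
    (pd.Int8Dtype, np.int8),
    (pd.Int16Dtype, np.int16),
    (pd.Int32Dtype, np.int32),
    (pd.Int64Dtype, np.int64),
)


def from_pandas_dtype(dtype):
    if isinstance(dtype, np.dtype):
        if dtype.kind == "i":  # signed NumPy integers, np.int8 ... np.int64
            return dtype.type
        if dtype.kind == "f" and dtype.itemsize in (4, 8):
            return Optional[dtype.type]  # a float column may contain NaN
        if dtype.kind == "b":
            return np.bool_
        if dtype.kind == "S":
            return bytes
        return Any  # e.g. object or datetime64
    for ext_type, np_type in _NULLABLE_INTS:
        if isinstance(dtype, ext_type):
            return Optional[np_type]
    if isinstance(dtype, pd.BooleanDtype):
        return Optional[bool]
    if isinstance(dtype, pd.StringDtype):
        return Optional[str]
    return Any  # any other extension dtype


# A datetime64 column is not covered by the table, so it falls back to Any.
ts = pd.Series(pd.to_datetime(["2024-01-01", "2024-06-30"]))
print(from_pandas_dtype(ts.dtype))
```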
Pandas does not support hierarchical data natively. Currently, all structured
types (Sequence, Mapping, and nested NamedTuple types) are shunted to np.object
like all other unknown types. In the future these types may be given special
consideration.
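The pandas side of this fallback is shown below by building a DataFrame whose columns hold structured values. It uses only pandas; Point is a hypothetical NamedTuple standing in for a nested schema type. pandas stores each list, dict, and tuple as an opaque Python object, so no value is flattened into extra columns and every column gets the generic object dtype.

```python
from typing import NamedTuple

import pandas as pd


class Point(NamedTuple):
    # Hypothetical stand-in for a nested NamedTuple field in a Beam schema.
    x: int
    y: int


df = pd.DataFrame({
    "tags": [["a", "b"], []],              # like Sequence[str]
    "attrs": [{"k": 1}, {}],               # like Mapping[str, int]
    "origin": [Point(0, 0), Point(1, 2)],  # like a nested NamedTuple
})

# Every column holds whole Python objects under the object dtype.
print(df.dtypes)
print(df.loc[0, "tags"])
```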
Note that the utilities in this module are for internal use only; we make no backward-compatibility guarantees except for the type mapping itself.