apache_beam.typehints.pandas_type_compatibility module

Utilities for converting between Beam schemas and pandas DataFrames.

Imposes a mapping between native Python typings (specifically those compatible with apache_beam.typehints.schemas) and common pandas dtypes:

pandas dtype                    Python typing
np.int{8,16,32,64}      <-----> np.int{8,16,32,64}*
pd.Int{8,16,32,64}Dtype <-----> Optional[np.int{8,16,32,64}]*
np.float{32,64}         <-----> Optional[np.float{32,64}]
                           \--- np.float{32,64}
Not supported           <------ Optional[bytes]
np.bool                 <-----> np.bool
np.dtype('S')           <-----> bytes
pd.BooleanDtype()       <-----> Optional[bool]
pd.StringDtype()        <-----> Optional[str]
                           \--- str
np.object               <-----> Any

* int, float, bool are treated the same as np.int64, np.float64, np.bool
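
As a reading aid, the Python-typing-to-pandas direction of the table can be sketched as a small lookup. This is an illustration only, not this module's implementation: the function name `dtype_alias_for` and both tables are hypothetical, dtypes are written as pandas string aliases rather than dtype objects, only the builtin types from the footnote are covered, and raising `TypeError` for `Optional[bytes]` is an assumption about how "Not supported" might surface.

```python
from typing import Any, Optional, Union, get_args, get_origin

# Hypothetical tables, keyed by builtin type. Values are pandas string
# aliases ("Int64" is the nullable integer dtype, "boolean" the nullable
# boolean dtype, "string" the string extension dtype).
_REQUIRED = {int: "int64", float: "float64", bool: "bool",
             bytes: "S", str: "string"}
_OPTIONAL = {int: "Int64", float: "float64", bool: "boolean", str: "string"}


def dtype_alias_for(typehint) -> str:
    """Return a dtype alias for a typing, following the table above."""
    if get_origin(typehint) is Union:
        args = get_args(typehint)
        non_none = [a for a in args if a is not type(None)]
        # Only Optional[T] (exactly T and None) is given special treatment.
        if len(args) == 2 and len(non_none) == 1:
            inner = non_none[0]
            if inner is bytes:
                # Assumption: the unsupported case is reported as an error.
                raise TypeError("Optional[bytes] has no dtype in this mapping")
            return _OPTIONAL.get(inner, "object")
        return "object"
    # Anything not listed (Any, Sequence[...], ...) falls back to object.
    return _REQUIRED.get(typehint, "object")
```

Under these assumptions, `dtype_alias_for(Optional[int])` gives `"Int64"`, while both `float` and `Optional[float]` give `"float64"`, which mirrors the one-way arrow in the table.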

Note that when converting to pandas dtypes, any types not specified here are shunted to np.object.

Similarly, when converting from pandas to Python types, any dtypes not specified here are shunted to Any. Notably, this includes np.datetime64.
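
The pandas-to-Python direction, including the fallback to Any, can be sketched the same way. Again this is a hypothetical illustration rather than the module's code: `typing_for` and its table are invented names, the keys are informal dtype labels chosen for this example, and only the 64-bit variants are listed.

```python
from typing import Any, Optional

# Hypothetical table keyed by an informal dtype label. np.float64 maps to
# Optional[float] because a float column can hold NaN for missing values.
_DTYPE_TO_TYPING = {
    "int64": int,
    "Int64": Optional[int],
    "float64": Optional[float],
    "bool": bool,
    "S": bytes,
    "boolean": Optional[bool],
    "string": Optional[str],
}


def typing_for(dtype_label: str):
    """Return the Python typing for a dtype label, defaulting to Any."""
    return _DTYPE_TO_TYPING.get(dtype_label, Any)
```

With this table, `typing_for("datetime64[ns]")` and `typing_for("object")` both return `Any`, matching the fallback rule described above.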

Pandas does not support hierarchical data natively. Currently, all structured types (Sequence, Mapping, nested NamedTuple types) are shunted to np.object, like all other unknown types. In the future these types may be given special consideration.

Note that the utilities in this package are for internal use only. We make no backward compatibility guarantees, except for the type mapping itself.