apache_beam.typehints.schemas module¶
Support for mapping python types to proto Schemas and back again.
Imposes a mapping between common Python types and Beam portable schemas (https://s.apache.org/beam-schemas):
Python Schema
np.int8 <-----> BYTE
np.int16 <-----> INT16
np.int32 <-----> INT32
np.int64 <-----> INT64
int ------> INT64
np.float32 <-----> FLOAT
np.float64 <-----> DOUBLE
float ------> DOUBLE
bool <-----> BOOLEAN
str <-----> STRING
bytes <-----> BYTES
ByteString ------> BYTES
Timestamp <-----> LogicalType(urn="beam:logical_type:micros_instant:v1")
Mapping <-----> MapType
Sequence <-----> ArrayType
NamedTuple <-----> RowType
beam.Row ------> RowType
Note that some of these mappings are provided as conveniences,
but they are lossy and will not survive a roundtrip from python to Beam schemas
and back. For example, the Python type int
will map to INT64
in
Beam schemas but converting that back to a Python type will yield
np.int64
.
nullable=True
on a Beam FieldType
is represented in Python by
wrapping the type in Optional
.
This module is intended for internal use only. Nothing defined here provides any backwards-compatibility guarantee.
-
apache_beam.typehints.schemas.
typing_to_runner_api
(type_: type, schema_registry: apache_beam.typehints.schemas.SchemaTypeRegistry = <apache_beam.typehints.schemas.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.FieldType[source]¶
-
apache_beam.typehints.schemas.
typing_from_runner_api
(fieldtype_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, schema_registry: apache_beam.typehints.schemas.SchemaTypeRegistry = <apache_beam.typehints.schemas.SchemaTypeRegistry object>) → type[source]¶
-
class
apache_beam.typehints.schemas.
SchemaTranslation
(schema_registry: apache_beam.typehints.schemas.SchemaTypeRegistry = <apache_beam.typehints.schemas.SchemaTypeRegistry object>)[source]¶ Bases:
object
-
option_from_runner_api
(option_proto: org.apache.beam.model.pipeline.v1.schema_pb2.Option) → Tuple[str, Any][source]¶
-
-
apache_beam.typehints.schemas.
named_tuple_from_schema
(schema, schema_registry: apache_beam.typehints.schemas.SchemaTypeRegistry = <apache_beam.typehints.schemas.SchemaTypeRegistry object>) → type[source]¶
-
apache_beam.typehints.schemas.
named_tuple_to_schema
(named_tuple, schema_registry: apache_beam.typehints.schemas.SchemaTypeRegistry = <apache_beam.typehints.schemas.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.Schema[source]¶
-
apache_beam.typehints.schemas.
schema_from_element_type
(element_type: type) → org.apache.beam.model.pipeline.v1.schema_pb2.Schema[source]¶ Get a schema for the given PCollection element_type.
Returns schema as a list of (name, python_type) tuples
-
apache_beam.typehints.schemas.
named_fields_from_element_type
(element_type: type) → List[Tuple[str, type]][source]¶
-
class
apache_beam.typehints.schemas.
LogicalType
[source]¶ Bases:
typing.Generic
-
classmethod
language_type
()[source]¶ Return the language type this LogicalType encodes.
The returned type should match LanguageT
-
classmethod
representation_type
()[source]¶ Return the type of the representation this LogicalType uses to encode the language type.
The returned type should match RepresentationT
-
classmethod
argument_type
()[source]¶ Return the type of the argument used for variations of this LogicalType.
The returned type should match ArgT
-
classmethod
register_logical_type
(logical_type_cls)[source]¶ Register an implementation of LogicalType.
-
classmethod