apache_beam.typehints.schemas module

Support for mapping python types to proto Schemas and back again.

Imposes a mapping between common Python types and Beam portable schemas (https://s.apache.org/beam-schemas):

Python              Schema
np.int8     <-----> BYTE
np.int16    <-----> INT16
np.int32    <-----> INT32
np.int64    <-----> INT64
int         ------> INT64
np.float32  <-----> FLOAT
np.float64  <-----> DOUBLE
float       ------> DOUBLE
bool        <-----> BOOLEAN
str         <-----> STRING
bytes       <-----> BYTES
ByteString  ------> BYTES
Timestamp   <-----> LogicalType(urn="beam:logical_type:micros_instant:v1")
Mapping     <-----> MapType
Sequence    <-----> ArrayType
NamedTuple  <-----> RowType
beam.Row    ------> RowType

Note that some of these mappings are provided as conveniences, but they are lossy and will not survive a roundtrip from python to Beam schemas and back. For example, the Python type int will map to INT64 in Beam schemas but converting that back to a Python type will yield np.int64.

nullable=True on a Beam FieldType is represented in Python by wrapping the type in Optional.

This module is intended for internal use only. Nothing defined here provides any backwards-compatibility guarantee.

class apache_beam.typehints.schemas.SchemaTypeRegistry[source]

Bases: object

generate_new_id()[source]
add(typing, schema)[source]
get_typing_by_id(unique_id)[source]
get_schema_by_id(unique_id)[source]
apache_beam.typehints.schemas.named_fields_to_schema(names_and_types)[source]
apache_beam.typehints.schemas.named_fields_from_schema(schema)[source]
apache_beam.typehints.schemas.typing_to_runner_api(type_: type, schema_registry: apache_beam.typehints.schemas.SchemaTypeRegistry = <apache_beam.typehints.schemas.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.FieldType[source]
apache_beam.typehints.schemas.typing_from_runner_api(fieldtype_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, schema_registry: apache_beam.typehints.schemas.SchemaTypeRegistry = <apache_beam.typehints.schemas.SchemaTypeRegistry object>) → type[source]
class apache_beam.typehints.schemas.SchemaTranslation(schema_registry: apache_beam.typehints.schemas.SchemaTypeRegistry = <apache_beam.typehints.schemas.SchemaTypeRegistry object>)[source]

Bases: object

typing_to_runner_api(type_: type) → org.apache.beam.model.pipeline.v1.schema_pb2.FieldType[source]
option_from_runner_api(option_proto: org.apache.beam.model.pipeline.v1.schema_pb2.Option) → Tuple[str, Any][source]
option_to_runner_api(option: Tuple[str, Any]) → org.apache.beam.model.pipeline.v1.schema_pb2.Option[source]
typing_from_runner_api(fieldtype_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldType) → type[source]
apache_beam.typehints.schemas.named_tuple_from_schema(schema, schema_registry: apache_beam.typehints.schemas.SchemaTypeRegistry = <apache_beam.typehints.schemas.SchemaTypeRegistry object>) → type[source]
apache_beam.typehints.schemas.named_tuple_to_schema(named_tuple, schema_registry: apache_beam.typehints.schemas.SchemaTypeRegistry = <apache_beam.typehints.schemas.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.Schema[source]
apache_beam.typehints.schemas.schema_from_element_type(element_type: type) → org.apache.beam.model.pipeline.v1.schema_pb2.Schema[source]

Get a schema for the given PCollection element_type.

Returns schema as a list of (name, python_type) tuples

apache_beam.typehints.schemas.named_fields_from_element_type(element_type: type) → List[Tuple[str, type]][source]
class apache_beam.typehints.schemas.LogicalTypeRegistry[source]

Bases: object

add(urn, logical_type)[source]
get_logical_type_by_urn(urn)[source]
get_urn_by_logial_type(logical_type)[source]
get_logical_type_by_language_type(representation_type)[source]
class apache_beam.typehints.schemas.LogicalType[source]

Bases: typing.Generic

classmethod urn()[source]

Return the URN used to identify this logical type

classmethod language_type()[source]

Return the language type this LogicalType encodes.

The returned type should match LanguageT

classmethod representation_type()[source]

Return the type of the representation this LogicalType uses to encode the language type.

The returned type should match RepresentationT

classmethod argument_type()[source]

Return the type of the argument used for variations of this LogicalType.

The returned type should match ArgT

argument()[source]

Return the argument for this instance of the LogicalType.

to_representation_type()[source]

Convert an instance of LanguageT to RepresentationT.

to_language_type()[source]

Convert an instance of RepresentationT to LanguageT.

classmethod register_logical_type(logical_type_cls)[source]

Register an implementation of LogicalType.

classmethod from_typing(typ)[source]

Construct an instance of a registered LogicalType implementation given a typing.

Raises ValueError if no registered LogicalType implementation can encode the given typing.

classmethod from_runner_api(logical_type_proto)[source]

Construct an instance of a registered LogicalType implementation given a proto LogicalType.

Raises ValueError if no LogicalType registered for the given URN.

class apache_beam.typehints.schemas.NoArgumentLogicalType[source]

Bases: apache_beam.typehints.schemas.LogicalType

classmethod argument_type()[source]
argument()[source]
class apache_beam.typehints.schemas.MicrosInstantRepresentation(seconds, micros)

Bases: tuple

Create new instance of MicrosInstantRepresentation(seconds, micros)

micros

Alias for field number 1

seconds

Alias for field number 0