apache_beam.typehints.schemas module

Support for mapping Python types to proto Schemas and back again.

Imposes a mapping between common Python types and Beam portable schemas (https://s.apache.org/beam-schemas):

Python              Schema
np.int8     <-----> BYTE
np.int16    <-----> INT16
np.int32    <-----> INT32
np.int64    <-----> INT64
int         ------> INT64
np.float32  <-----> FLOAT
np.float64  <-----> DOUBLE
float       ------> DOUBLE
bool        <-----> BOOLEAN
str         <-----> STRING
bytes       <-----> BYTES
ByteString  ------> BYTES
Timestamp   <-----> LogicalType(urn="beam:logical_type:micros_instant:v1")
Timestamp   <------ LogicalType(urn="beam:logical_type:millis_instant:v1")
Mapping     <-----> MapType
Sequence    <-----> ArrayType
NamedTuple  <-----> RowType
beam.Row    ------> RowType

Note that the one-way mappings (------>) are provided as conveniences, but they are lossy and will not survive a roundtrip from Python to Beam schemas and back. For example, the Python type int maps to INT64 in Beam schemas, but converting that back to a Python type yields np.int64.
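The lossiness described above can be illustrated with a toy model of the table. This is only a sketch: plain strings stand in for the real types, and the dictionaries and the round_trip helper are hypothetical names, not part of this module.

```python
# Toy model of the mapping table; strings stand in for real types.
# Not Beam's implementation: TO_SCHEMA, FROM_SCHEMA and round_trip are
# illustrative names only.
TO_SCHEMA = {
    "np.int64": "INT64",
    "int": "INT64",        # one-way convenience mapping
    "np.float64": "DOUBLE",
    "float": "DOUBLE",     # one-way convenience mapping
    "bool": "BOOLEAN",
    "str": "STRING",
}

# Only the bidirectional (<----->) rows have a reverse entry.
FROM_SCHEMA = {
    "INT64": "np.int64",
    "DOUBLE": "np.float64",
    "BOOLEAN": "bool",
    "STRING": "str",
}


def round_trip(python_type_name: str) -> str:
    """Map a Python type name to a schema type and back again."""
    return FROM_SCHEMA[TO_SCHEMA[python_type_name]]


for name in ("int", "float", "str", "np.int64"):
    result = round_trip(name)
    status = "survives" if result == name else "lossy"
    print(f"{name} -> {TO_SCHEMA[name]} -> {result} ({status})")
```

In this model, int and float come back as np.int64 and np.float64, while str and np.int64 come back unchanged.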

nullable=True on a Beam FieldType is represented in Python by wrapping the type in Optional.
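As a rough illustration of the Optional convention, the standard typing helpers can tell whether a hint is wrapped in Optional. The split_nullable function below is a hypothetical helper written for this sketch; it is not defined in this module.

```python
from typing import Optional, Union, get_args, get_origin


def split_nullable(type_hint):
    """Return (base_type, nullable) for a type hint.

    Hypothetical helper: Optional[T] is Union[T, None], so a two-member
    Union containing NoneType is treated as a nullable T.
    """
    if get_origin(type_hint) is Union:
        args = get_args(type_hint)
        non_none = [arg for arg in args if arg is not type(None)]
        if len(args) == 2 and len(non_none) == 1:
            return non_none[0], True
    return type_hint, False


base, nullable = split_nullable(Optional[str])
print(base is str, nullable)

base, nullable = split_nullable(int)
print(base is int, nullable)
```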

This module is intended for internal use only. Nothing defined here provides any backwards-compatibility guarantee.

apache_beam.typehints.schemas.named_fields_to_schema(names_and_types: Union[Dict[str, type], Sequence[Tuple[str, type]]], schema_id: Optional[str] = None, schema_options: Optional[Sequence[Tuple[str, Any]]] = None, field_options: Optional[Dict[str, Sequence[Tuple[str, Any]]]] = None, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>)[source]
apache_beam.typehints.schemas.named_fields_from_schema(schema)[source]
apache_beam.typehints.schemas.typing_to_runner_api(type_: type, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.FieldType[source]
apache_beam.typehints.schemas.typing_from_runner_api(fieldtype_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → type[source]
apache_beam.typehints.schemas.option_to_runner_api(option: Tuple[str, Any], schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.Option[source]
apache_beam.typehints.schemas.option_from_runner_api(option_proto: org.apache.beam.model.pipeline.v1.schema_pb2.Option, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → Tuple[str, Any][source]
class apache_beam.typehints.schemas.SchemaTranslation(schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>)[source]

Bases: object

typing_to_runner_api(type_: type) → org.apache.beam.model.pipeline.v1.schema_pb2.FieldType[source]
option_from_runner_api(option_proto: org.apache.beam.model.pipeline.v1.schema_pb2.Option) → Tuple[str, Any][source]
option_to_runner_api(option: Tuple[str, Any]) → org.apache.beam.model.pipeline.v1.schema_pb2.Option[source]
typing_from_runner_api(fieldtype_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldType) → type[source]
named_tuple_from_schema(schema: org.apache.beam.model.pipeline.v1.schema_pb2.Schema) → type[source]
apache_beam.typehints.schemas.named_tuple_from_schema(schema, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → type[source]
apache_beam.typehints.schemas.named_tuple_to_schema(named_tuple, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.Schema[source]
apache_beam.typehints.schemas.schema_from_element_type(element_type: type) → org.apache.beam.model.pipeline.v1.schema_pb2.Schema[source]

Get a schema for the given PCollection element_type.

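Returns the schema as a schema_pb2.Schema proto, as the return annotation indicates. For a list of (name, python_type) tuples, see named_fields_from_element_type.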

apache_beam.typehints.schemas.named_fields_from_element_type(element_type: type) → List[Tuple[str, type]][source]
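To make the List[Tuple[str, type]] shape concrete, the sketch below builds such a list from a NamedTuple using only the standard library. Purchase and named_fields are hypothetical names chosen for this example; the sketch does not call this module and may differ from what named_fields_from_element_type actually returns.

```python
from typing import List, NamedTuple, Optional, Tuple, get_type_hints


class Purchase(NamedTuple):
    """Hypothetical element type used only for this example."""
    item: str
    quantity: int
    note: Optional[str]


def named_fields(element_type: type) -> List[Tuple[str, type]]:
    """Stand-in helper: list (name, type) pairs in field order."""
    hints = get_type_hints(element_type)
    return [(name, hints[name]) for name in element_type._fields]


fields = named_fields(Purchase)
for name, field_type in fields:
    print(name, field_type)
```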
class apache_beam.typehints.schemas.LogicalTypeRegistry[source]

Bases: object

add(urn, logical_type)[source]
get_logical_type_by_urn(urn)[source]
get_urn_by_logial_type(logical_type)[source]
get_logical_type_by_language_type(representation_type)[source]
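The lookups listed above suggest a registry indexed both by URN and by language type. The following is a minimal stand-in under that assumption, not Beam's implementation; ToyRegistry, FakeTimestamp, MicrosLike, MillisLike, the example URNs, and the last-registration-wins rule are all assumptions made for illustration.

```python
class ToyRegistry:
    """Stand-in for a URN/type registry; not Beam's implementation."""

    def __init__(self):
        self.by_urn = {}
        self.by_language_type = {}

    def add(self, urn, logical_type):
        self.by_urn[urn] = logical_type
        # Assumption: the most recent registration wins for a language type.
        self.by_language_type[logical_type.language_type()] = logical_type

    def get_logical_type_by_urn(self, urn):
        return self.by_urn.get(urn)

    def get_logical_type_by_language_type(self, language_type):
        return self.by_language_type.get(language_type)


class FakeTimestamp:
    """Hypothetical language type shared by both logical types below."""


class MicrosLike:
    @classmethod
    def language_type(cls):
        return FakeTimestamp


class MillisLike:
    @classmethod
    def language_type(cls):
        return FakeTimestamp


registry = ToyRegistry()
registry.add("example:micros:v1", MicrosLike)
registry.add("example:millis:v1", MillisLike)

# Both URNs stay resolvable; the language type points at the later entry.
print(registry.get_logical_type_by_urn("example:micros:v1").__name__)
print(registry.get_logical_type_by_language_type(FakeTimestamp).__name__)
```

Under this model, two logical types can share one language type: decoding by URN remains unambiguous, while encoding uses whichever type was registered last.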
class apache_beam.typehints.schemas.LogicalType[source]

Bases: typing.Generic

classmethod urn()[source]

Return the URN used to identify this logical type.

classmethod language_type()[source]

Return the language type this LogicalType encodes.

The returned type should match LanguageT

classmethod representation_type()[source]

Return the type of the representation this LogicalType uses to encode the language type.

The returned type should match RepresentationT

classmethod argument_type()[source]

Return the type of the argument used for variations of this LogicalType.

The returned type should match ArgT

argument()[source]

Return the argument for this instance of the LogicalType.

to_representation_type(value)[source]

Convert an instance of LanguageT to RepresentationT.

to_language_type(value)[source]

Convert an instance of RepresentationT to LanguageT.

classmethod register_logical_type(logical_type_cls)[source]

Register an implementation of LogicalType.

classmethod from_typing(typ)[source]

Construct an instance of a registered LogicalType implementation given a typing.

Raises ValueError if no registered LogicalType implementation can encode the given typing.

classmethod from_runner_api(logical_type_proto)[source]

Construct an instance of a registered LogicalType implementation given a proto LogicalType.

Raises ValueError if no LogicalType is registered for the given URN.
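The relationship between urn(), language_type(), representation_type() and the two conversion methods can be sketched with a standalone class that follows the same shape. DaysSinceEpoch and its URN are hypothetical, and the class deliberately does not subclass LogicalType so that the sketch runs without Beam installed.

```python
from datetime import date

EPOCH_ORDINAL = date(1970, 1, 1).toordinal()


class DaysSinceEpoch:
    """Hypothetical logical type: a datetime.date stored as an int day count."""

    @classmethod
    def urn(cls):
        return "example:logical_type:days_since_epoch:v1"

    @classmethod
    def language_type(cls):
        return date  # plays the role of LanguageT

    @classmethod
    def representation_type(cls):
        return int  # plays the role of RepresentationT

    def to_representation_type(self, value):
        # LanguageT -> RepresentationT
        return value.toordinal() - EPOCH_ORDINAL

    def to_language_type(self, value):
        # RepresentationT -> LanguageT
        return date.fromordinal(value + EPOCH_ORDINAL)


logical_type = DaysSinceEpoch()
encoded = logical_type.to_representation_type(date(2024, 2, 29))
decoded = logical_type.to_language_type(encoded)
print(encoded, decoded)
```

The two conversion methods are inverses of each other, so a value survives encoding and decoding unchanged.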

class apache_beam.typehints.schemas.NoArgumentLogicalType[source]

Bases: apache_beam.typehints.schemas.LogicalType

classmethod argument_type()[source]
argument()[source]
class apache_beam.typehints.schemas.MicrosInstantRepresentation(seconds, micros)

Bases: tuple

Create new instance of MicrosInstantRepresentation(seconds, micros)

micros

Alias for field number 1

seconds

Alias for field number 0
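The field layout above can be mirrored with a plain namedtuple. The sketch assumes the representation splits an instant into whole seconds and the remaining microseconds; split_micros is a hypothetical helper, and the namedtuple defined here is a stand-in rather than the class from this module.

```python
from collections import namedtuple

# Stand-in with the same field order as the class documented above.
MicrosInstantRepresentation = namedtuple(
    "MicrosInstantRepresentation", ["seconds", "micros"])


def split_micros(total_micros: int) -> MicrosInstantRepresentation:
    """Split microseconds since the epoch into (seconds, micros)."""
    seconds, micros = divmod(total_micros, 1_000_000)
    return MicrosInstantRepresentation(seconds, micros)


rep = split_micros(1_700_000_000_123_456)
print(rep.seconds, rep.micros)
print(rep[0] == rep.seconds, rep[1] == rep.micros)
```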

class apache_beam.typehints.schemas.MillisInstant[source]

Bases: apache_beam.typehints.schemas.NoArgumentLogicalType

Millisecond-precision instant logical type that handles values consistent with those encoded by InstantCoder in the Java SDK.

Like MicrosInstant, this class handles the apache_beam.utils.timestamp.Timestamp language type, but it provides only millisecond precision because it is intended for data encoded by the Java SDK's InstantCoder, which has the same precision.

Timestamp is handled by MicrosInstant by default. In some scenarios, such as reading from a cross-language transform whose rows contain InstantCoder-encoded timestamps, one may need to override the mapping of Timestamp to MillisInstant. To do this, re-register this class with register_logical_type().
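The precision difference can be shown with plain integer arithmetic. This sketch assumes sub-millisecond digits are dropped by floor division; the helper names are hypothetical, and the actual rounding behavior of MillisInstant is not stated here.

```python
def micros_to_millis(micros: int) -> int:
    """Reduce a microsecond count to milliseconds (assumed floor division)."""
    return micros // 1000


def millis_to_micros(millis: int) -> int:
    """Expand a millisecond count back to microseconds."""
    return millis * 1000


original = 1_700_000_000_123_456
restored = millis_to_micros(micros_to_millis(original))

# The last three digits (456 microseconds) are lost in the round trip.
print(original - restored)
```

A timestamp that is already a whole number of milliseconds passes through such a round trip unchanged.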

classmethod representation_type()[source]
classmethod urn()[source]
classmethod language_type()[source]
to_language_type(value)[source]
class apache_beam.typehints.schemas.MicrosInstant[source]

Bases: apache_beam.typehints.schemas.NoArgumentLogicalType

Microsecond-precision instant logical type that handles Timestamp.

classmethod urn()[source]
classmethod representation_type()[source]
classmethod language_type()[source]
to_representation_type(value)[source]
to_language_type(value)[source]
class apache_beam.typehints.schemas.PythonCallable[source]

Bases: apache_beam.typehints.schemas.NoArgumentLogicalType

A logical type for PythonCallableSource objects.

classmethod urn()[source]
classmethod representation_type()[source]
classmethod language_type()[source]
to_representation_type(value)[source]
to_language_type(value)[source]