apache_beam.typehints.schemas module¶
Support for mapping python types to proto Schemas and back again.
Imposes a mapping between common Python types and Beam portable schemas (https://s.apache.org/beam-schemas):
Python Schema
np.int8 <-----> BYTE
np.int16 <-----> INT16
np.int32 <-----> INT32
np.int64 <-----> INT64
int ------> INT64
np.float32 <-----> FLOAT
np.float64 <-----> DOUBLE
float ------> DOUBLE
bool <-----> BOOLEAN
str <-----> STRING
bytes <-----> BYTES
ByteString ------> BYTES
Timestamp <-----> LogicalType(urn="beam:logical_type:micros_instant:v1")
Decimal <-----> LogicalType(urn="beam:logical_type:fixed_decimal:v1")
Mapping <-----> MapType
Sequence <-----> ArrayType
NamedTuple <-----> RowType
beam.Row ------> RowType
One direction mapping of Python types from Beam portable schemas:
- bytes
- <—— LogicalType(urn=”beam:logical_type:fixed_bytes:v1”) <—— LogicalType(urn=”beam:logical_type:var_bytes:v1”)
- str
- <—— LogicalType(urn=”beam:logical_type:fixed_char:v1”) <—— LogicalType(urn=”beam:logical_type:var_char:v1”)
- Timestamp
- <—— LogicalType(urn=”beam:logical_type:millis_instant:v1”)
Note that some of these mappings are provided as conveniences,
but they are lossy and will not survive a roundtrip from python to Beam schemas
and back. For example, the Python type int
will map to INT64
in
Beam schemas but converting that back to a Python type will yield
np.int64
.
nullable=True
on a Beam FieldType
is represented in Python by
wrapping the type in Optional
.
This module is intended for internal use only. Nothing defined here provides any backwards-compatibility guarantee.
-
apache_beam.typehints.schemas.
named_fields_to_schema
(names_and_types: Union[Dict[str, type], Sequence[Tuple[str, type]]], schema_id: Optional[str] = None, schema_options: Optional[Sequence[Tuple[str, Any]]] = None, field_options: Optional[Dict[str, Sequence[Tuple[str, Any]]]] = None, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>)[source]¶
-
apache_beam.typehints.schemas.
typing_to_runner_api
(type_: type, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.FieldType[source]¶
-
apache_beam.typehints.schemas.
typing_from_runner_api
(fieldtype_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → type[source]¶
-
apache_beam.typehints.schemas.
value_to_runner_api
(type_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, value, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.FieldValue[source]¶
-
apache_beam.typehints.schemas.
value_from_runner_api
(type_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, value_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldValue, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.FieldValue[source]¶
-
apache_beam.typehints.schemas.
option_to_runner_api
(option: Tuple[str, Any], schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.Option[source]¶
-
apache_beam.typehints.schemas.
option_from_runner_api
(option_proto: org.apache.beam.model.pipeline.v1.schema_pb2.Option, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → Tuple[str, Any][source]¶
-
apache_beam.typehints.schemas.
schema_field
(name: str, field_type: Union[org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, type], description: Optional[str] = None) → org.apache.beam.model.pipeline.v1.schema_pb2.Field[source]¶
-
class
apache_beam.typehints.schemas.
SchemaTranslation
(schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>)[source]¶ Bases:
object
-
atomic_value_from_runner_api
(atomic_type: <google.protobuf.internal.enum_type_wrapper.EnumTypeWrapper object at 0x7f9177b81ca0>, atomic_value: org.apache.beam.model.pipeline.v1.schema_pb2.AtomicTypeValue)[source]¶
-
atomic_value_to_runner_api
(atomic_type: <google.protobuf.internal.enum_type_wrapper.EnumTypeWrapper object at 0x7f9177b81ca0>, value) → org.apache.beam.model.pipeline.v1.schema_pb2.AtomicTypeValue[source]¶
-
value_from_runner_api
(type_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, value_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldValue)[source]¶
-
value_to_runner_api
(typing_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, value)[source]¶
-
option_from_runner_api
(option_proto: org.apache.beam.model.pipeline.v1.schema_pb2.Option) → Tuple[str, Any][source]¶
-
option_to_runner_api
(option: Tuple[str, Any]) → org.apache.beam.model.pipeline.v1.schema_pb2.Option[source]¶
-
-
apache_beam.typehints.schemas.
named_tuple_from_schema
(schema, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → type[source]¶
-
apache_beam.typehints.schemas.
named_tuple_to_schema
(named_tuple, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.Schema[source]¶
-
apache_beam.typehints.schemas.
schema_from_element_type
(element_type: type) → org.apache.beam.model.pipeline.v1.schema_pb2.Schema[source]¶ Get a schema for the given PCollection element_type.
Returns schema as a list of (name, python_type) tuples
-
apache_beam.typehints.schemas.
named_fields_from_element_type
(element_type: type) → List[Tuple[str, type]][source]¶
-
apache_beam.typehints.schemas.
union_schema_type
(element_types)[source]¶ Returns a schema whose fields are the union of each corresponding field.
element_types must be a set of schema-aware types whose fields have the same naming and ordering.
-
class
apache_beam.typehints.schemas.
LogicalType
[source]¶ Bases:
typing.Generic
-
classmethod
language_type
()[source]¶ Return the language type this LogicalType encodes.
The returned type should match LanguageT
-
classmethod
representation_type
()[source]¶ Return the type of the representation this LogicalType uses to encode the language type.
The returned type should match RepresentationT
-
classmethod
argument_type
()[source]¶ Return the type of the argument used for variations of this LogicalType.
The returned type should match ArgT
-
classmethod
register_logical_type
(logical_type_cls)[source]¶ Register an implementation of LogicalType.
-
classmethod
-
class
apache_beam.typehints.schemas.
PassThroughLogicalType
[source]¶ Bases:
apache_beam.typehints.schemas.LogicalType
A base class for LogicalTypes that use the same type as the underlying representation type.
-
class
apache_beam.typehints.schemas.
MicrosInstantRepresentation
(seconds, micros)¶ Bases:
tuple
Create new instance of MicrosInstantRepresentation(seconds, micros)
-
micros
¶ Alias for field number 1
-
seconds
¶ Alias for field number 0
-
-
class
apache_beam.typehints.schemas.
MillisInstant
[source]¶ Bases:
apache_beam.typehints.schemas.NoArgumentLogicalType
Millisecond-precision instant logical type handles values consistent with that encoded by
InstantCoder
in the Java SDK.This class handles
apache_beam.utils.timestamp.Timestamp
language type asMicrosInstant
, but it only provides millisecond precision, because it is aimed to handle data encoded by Java sdk’s InstantCoder which has same precision level.Timestamp is handled by MicrosInstant by default. In some scenario, such as read from cross-language transform with rows containing InstantCoder encoded timestamps, one may need to override the mapping of Timetamp to MillisInstant. To do this, re-register this class with
register_logical_type()
.
-
class
apache_beam.typehints.schemas.
MicrosInstant
[source]¶ Bases:
apache_beam.typehints.schemas.NoArgumentLogicalType
Microsecond-precision instant logical type that handles
Timestamp
.
-
class
apache_beam.typehints.schemas.
PythonCallable
[source]¶ Bases:
apache_beam.typehints.schemas.NoArgumentLogicalType
A logical type for PythonCallableSource objects.
-
class
apache_beam.typehints.schemas.
FixedPrecisionDecimalArgumentRepresentation
(precision, scale)¶ Bases:
tuple
Create new instance of FixedPrecisionDecimalArgumentRepresentation(precision, scale)
-
precision
¶ Alias for field number 0
-
scale
¶ Alias for field number 1
-
-
class
apache_beam.typehints.schemas.
DecimalLogicalType
[source]¶ Bases:
apache_beam.typehints.schemas.NoArgumentLogicalType
A logical type for decimal objects handling values consistent with that encoded by
BigDecimalCoder
in the Java SDK.
-
class
apache_beam.typehints.schemas.
FixedPrecisionDecimalLogicalType
(precision=-1, scale=0)[source]¶ Bases:
apache_beam.typehints.schemas.LogicalType
A wrapper of DecimalLogicalType that contains the precision value.
-
class
apache_beam.typehints.schemas.
FixedBytes
(length: numpy.int32)[source]¶ Bases:
apache_beam.typehints.schemas.PassThroughLogicalType
A logical type for fixed-length bytes.
-
class
apache_beam.typehints.schemas.
VariableBytes
(max_length: numpy.int32 = 2147483647)[source]¶ Bases:
apache_beam.typehints.schemas.PassThroughLogicalType
A logical type for variable-length bytes with specified maximum length.
-
class
apache_beam.typehints.schemas.
FixedString
(length: numpy.int32)[source]¶ Bases:
apache_beam.typehints.schemas.PassThroughLogicalType
A logical type for fixed-length string.
-
class
apache_beam.typehints.schemas.
VariableString
(max_length: numpy.int32 = 2147483647)[source]¶ Bases:
apache_beam.typehints.schemas.PassThroughLogicalType
A logical type for variable-length string with specified maximum length.