apache_beam.typehints.schemas module¶
Support for mapping python types to proto Schemas and back again.
Imposes a mapping between common Python types and Beam portable schemas (https://s.apache.org/beam-schemas):
Python              Schema
np.int8     <-----> BYTE
np.int16    <-----> INT16
np.int32    <-----> INT32
np.int64    <-----> INT64
int         ------> INT64
np.float32  <-----> FLOAT
np.float64  <-----> DOUBLE
float       ------> DOUBLE
bool        <-----> BOOLEAN
str         <-----> STRING
bytes       <-----> BYTES
ByteString  ------> BYTES
Timestamp   <-----> LogicalType(urn="beam:logical_type:micros_instant:v1")
Decimal     <-----> LogicalType(urn="beam:logical_type:fixed_decimal:v1")
Mapping     <-----> MapType
Sequence    <-----> ArrayType
NamedTuple  <-----> RowType
beam.Row    ------> RowType
One direction mapping of Python types from Beam portable schemas:
- bytes
- <—— LogicalType(urn=”beam:logical_type:fixed_bytes:v1”) <—— LogicalType(urn=”beam:logical_type:var_bytes:v1”)
- str
- <—— LogicalType(urn=”beam:logical_type:fixed_char:v1”) <—— LogicalType(urn=”beam:logical_type:var_char:v1”)
- Timestamp
- <—— LogicalType(urn=”beam:logical_type:millis_instant:v1”)
Note that some of these mappings are provided as conveniences,
but they are lossy and will not survive a roundtrip from python to Beam schemas
and back. For example, the Python type int will map to INT64 in
Beam schemas but converting that back to a Python type will yield
np.int64.
nullable=True on a Beam FieldType is represented in Python by
wrapping the type in Optional.
This module is intended for internal use only. Nothing defined here provides any backwards-compatibility guarantee.
- 
apache_beam.typehints.schemas.named_fields_to_schema(names_and_types: Union[Dict[str, type], Sequence[Tuple[str, type]]], schema_id: Optional[str] = None, schema_options: Optional[Sequence[Tuple[str, Any]]] = None, field_options: Optional[Dict[str, Sequence[Tuple[str, Any]]]] = None, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>)[source]¶
- 
apache_beam.typehints.schemas.typing_to_runner_api(type_: type, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.FieldType[source]¶
- 
apache_beam.typehints.schemas.typing_from_runner_api(fieldtype_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → type[source]¶
- 
apache_beam.typehints.schemas.value_to_runner_api(type_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, value, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.FieldValue[source]¶
- 
apache_beam.typehints.schemas.value_from_runner_api(type_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, value_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldValue, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.FieldValue[source]¶
- 
apache_beam.typehints.schemas.option_to_runner_api(option: Tuple[str, Any], schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.Option[source]¶
- 
apache_beam.typehints.schemas.option_from_runner_api(option_proto: org.apache.beam.model.pipeline.v1.schema_pb2.Option, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → Tuple[str, Any][source]¶
- 
class apache_beam.typehints.schemas.SchemaTranslation(schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>)[source]¶
- Bases: - object- 
atomic_value_from_runner_api(atomic_type: <google.protobuf.internal.enum_type_wrapper.EnumTypeWrapper object at 0x7f317806baf0>, atomic_value: org.apache.beam.model.pipeline.v1.schema_pb2.AtomicTypeValue)[source]¶
 - 
atomic_value_to_runner_api(atomic_type: <google.protobuf.internal.enum_type_wrapper.EnumTypeWrapper object at 0x7f317806baf0>, value) → org.apache.beam.model.pipeline.v1.schema_pb2.AtomicTypeValue[source]¶
 - 
value_from_runner_api(type_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, value_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldValue)[source]¶
 - 
value_to_runner_api(typing_proto: org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, value)[source]¶
 - 
option_from_runner_api(option_proto: org.apache.beam.model.pipeline.v1.schema_pb2.Option) → Tuple[str, Any][source]¶
 - 
option_to_runner_api(option: Tuple[str, Any]) → org.apache.beam.model.pipeline.v1.schema_pb2.Option[source]¶
 
- 
- 
apache_beam.typehints.schemas.named_tuple_from_schema(schema, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → type[source]¶
- 
apache_beam.typehints.schemas.named_tuple_to_schema(named_tuple, schema_registry: apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) → org.apache.beam.model.pipeline.v1.schema_pb2.Schema[source]¶
- 
apache_beam.typehints.schemas.schema_from_element_type(element_type: type) → org.apache.beam.model.pipeline.v1.schema_pb2.Schema[source]¶
- Get a schema for the given PCollection element_type. - Returns schema as a list of (name, python_type) tuples 
- 
apache_beam.typehints.schemas.named_fields_from_element_type(element_type: type) → List[Tuple[str, type]][source]¶
- 
class apache_beam.typehints.schemas.LogicalType[source]¶
- Bases: - typing.Generic- 
classmethod language_type()[source]¶
- Return the language type this LogicalType encodes. - The returned type should match LanguageT 
 - 
classmethod representation_type()[source]¶
- Return the type of the representation this LogicalType uses to encode the language type. - The returned type should match RepresentationT 
 - 
classmethod argument_type()[source]¶
- Return the type of the argument used for variations of this LogicalType. - The returned type should match ArgT 
 - 
classmethod register_logical_type(logical_type_cls)[source]¶
- Register an implementation of LogicalType. 
 
- 
classmethod 
- 
class apache_beam.typehints.schemas.PassThroughLogicalType[source]¶
- Bases: - apache_beam.typehints.schemas.LogicalType- A base class for LogicalTypes that use the same type as the underlying representation type. 
- 
class apache_beam.typehints.schemas.MicrosInstantRepresentation(seconds, micros)¶
- Bases: - tuple- Create new instance of MicrosInstantRepresentation(seconds, micros) - 
micros¶
- Alias for field number 1 
 - 
seconds¶
- Alias for field number 0 
 
- 
- 
class apache_beam.typehints.schemas.MillisInstant[source]¶
- Bases: - apache_beam.typehints.schemas.NoArgumentLogicalType- Millisecond-precision instant logical type handles values consistent with that encoded by - InstantCoderin the Java SDK.- This class handles - apache_beam.utils.timestamp.Timestamplanguage type as- MicrosInstant, but it only provides millisecond precision, because it is aimed to handle data encoded by Java sdk’s InstantCoder which has same precision level.- Timestamp is handled by MicrosInstant by default. In some scenario, such as read from cross-language transform with rows containing InstantCoder encoded timestamps, one may need to override the mapping of Timetamp to MillisInstant. To do this, re-register this class with - register_logical_type().
- 
class apache_beam.typehints.schemas.MicrosInstant[source]¶
- Bases: - apache_beam.typehints.schemas.NoArgumentLogicalType- Microsecond-precision instant logical type that handles - Timestamp.
- 
class apache_beam.typehints.schemas.PythonCallable[source]¶
- Bases: - apache_beam.typehints.schemas.NoArgumentLogicalType- A logical type for PythonCallableSource objects. 
- 
class apache_beam.typehints.schemas.FixedPrecisionDecimalArgumentRepresentation(precision, scale)¶
- Bases: - tuple- Create new instance of FixedPrecisionDecimalArgumentRepresentation(precision, scale) - 
precision¶
- Alias for field number 0 
 - 
scale¶
- Alias for field number 1 
 
- 
- 
class apache_beam.typehints.schemas.DecimalLogicalType[source]¶
- Bases: - apache_beam.typehints.schemas.NoArgumentLogicalType- A logical type for decimal objects handling values consistent with that encoded by - BigDecimalCoderin the Java SDK.
- 
class apache_beam.typehints.schemas.FixedPrecisionDecimalLogicalType(precision=-1, scale=0)[source]¶
- Bases: - apache_beam.typehints.schemas.LogicalType- A wrapper of DecimalLogicalType that contains the precision value. 
- 
class apache_beam.typehints.schemas.FixedBytes(length: numpy.int32)[source]¶
- Bases: - apache_beam.typehints.schemas.PassThroughLogicalType- A logical type for fixed-length bytes. 
- 
class apache_beam.typehints.schemas.VariableBytes(max_length: numpy.int32 = 2147483647)[source]¶
- Bases: - apache_beam.typehints.schemas.PassThroughLogicalType- A logical type for variable-length bytes with specified maximum length. 
- 
class apache_beam.typehints.schemas.FixedString(length: numpy.int32)[source]¶
- Bases: - apache_beam.typehints.schemas.PassThroughLogicalType- A logical type for fixed-length string. 
- 
class apache_beam.typehints.schemas.VariableString(max_length: numpy.int32 = 2147483647)[source]¶
- Bases: - apache_beam.typehints.schemas.PassThroughLogicalType- A logical type for variable-length string with specified maximum length.