apache_beam.coders package

Submodules

apache_beam.coders.coder_impl module

Coder implementations.

The actual encode/decode implementations are split off from coders to allow conditional (compiled/pure) implementations, which can be used to encode many elements with minimal overhead.

This module may be optionally compiled with Cython, using the corresponding coder_impl.pxd file for type hints.

For internal use only; no backwards-compatibility guarantees.

class apache_beam.coders.coder_impl.AbstractComponentCoderImpl(coder_impls)[source]

Bases: apache_beam.coders.coder_impl.StreamCoderImpl

For internal use only; no backwards-compatibility guarantees.

CoderImpl for coders that are comprised of several component coders.

decode_from_stream(in_stream, nested)[source]
encode_to_stream(value, out, nested)[source]
estimate_size(value, nested=False)[source]

Estimates the encoded size of the given value, in bytes.

get_estimated_size_and_observables(value, nested=False)[source]

Returns estimated size of value along with any nested observables.

class apache_beam.coders.coder_impl.BytesCoderImpl[source]

Bases: apache_beam.coders.coder_impl.CoderImpl

For internal use only; no backwards-compatibility guarantees.

A coder for bytes/str objects.

decode(encoded)[source]
decode_from_stream(in_stream, nested)[source]
encode(value)[source]
encode_to_stream(value, out, nested)[source]
class apache_beam.coders.coder_impl.CallbackCoderImpl(encoder, decoder, size_estimator=None)[source]

Bases: apache_beam.coders.coder_impl.CoderImpl

For internal use only; no backwards-compatibility guarantees.

A CoderImpl that calls back to the _impl methods on the Coder itself.

This is the default implementation used if Coder._get_impl() is not overridden.

decode(encoded)[source]
decode_from_stream(stream, nested)[source]
encode(value)[source]
encode_to_stream(value, stream, nested)[source]
estimate_size(value, nested=False)[source]
get_estimated_size_and_observables(value, nested=False)[source]
class apache_beam.coders.coder_impl.CoderImpl[source]

Bases: object

For internal use only; no backwards-compatibility guarantees.

decode(encoded)[source]

Decodes an object from an unnested string.

decode_from_stream(stream, nested)[source]

Reads object from potentially-nested encoding in stream.

encode(value)[source]

Encodes an object to an unnested string.

encode_to_stream(value, stream, nested)[source]

Writes object to potentially-nested encoding in stream.

estimate_size(value, nested=False)[source]

Estimates the encoded size of the given value, in bytes.

get_estimated_size_and_observables(value, nested=False)[source]

Returns estimated size of value along with any nested observables.

The list of nested observables is returned as a list of 2-tuples of (obj, coder_impl), where obj is an instance of observable.ObservableMixin, and coder_impl is the CoderImpl that can be used to encode elements sent by obj to its observers.

Parameters:
  • value – the value whose encoded size is to be estimated.
  • nested – whether the value is nested.
Returns:

The estimated encoded size of the given value and a list of observables whose elements are 2-tuples of (obj, coder_impl) as described above.
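
For illustration, a minimal sketch (assuming apache_beam is installed) of calling this method on the CoderImpl behind a simple coder, obtained via Coder.get_impl(); a plain value yields an empty observables list:

from apache_beam.coders import coders

impl = coders.VarIntCoder().get_impl()
size, observables = impl.get_estimated_size_and_observables(300)
assert size == 2          # 300 needs two var-int bytes
assert observables == []  # no nested observables for a plain int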

class apache_beam.coders.coder_impl.DeterministicFastPrimitivesCoderImpl(coder, step_label)[source]

Bases: apache_beam.coders.coder_impl.CoderImpl

For internal use only; no backwards-compatibility guarantees.

decode(encoded)[source]
decode_from_stream(stream, nested)[source]
encode(value)[source]
encode_to_stream(value, stream, nested)[source]
estimate_size(value, nested=False)[source]
get_estimated_size_and_observables(value, nested=False)[source]
class apache_beam.coders.coder_impl.FastPrimitivesCoderImpl(fallback_coder_impl)[source]

Bases: apache_beam.coders.coder_impl.StreamCoderImpl

For internal use only; no backwards-compatibility guarantees.

decode_from_stream(stream, nested)[source]
encode_to_stream(value, stream, nested)[source]
get_estimated_size_and_observables(value, nested=False)[source]
class apache_beam.coders.coder_impl.FloatCoderImpl[source]

Bases: apache_beam.coders.coder_impl.StreamCoderImpl

For internal use only; no backwards-compatibility guarantees.

decode_from_stream(in_stream, nested)[source]
encode_to_stream(value, out, nested)[source]
estimate_size(unused_value, nested=False)[source]
class apache_beam.coders.coder_impl.IntervalWindowCoderImpl[source]

Bases: apache_beam.coders.coder_impl.StreamCoderImpl

For internal use only; no backwards-compatibility guarantees.

decode_from_stream(in_, nested)[source]
encode_to_stream(value, out, nested)[source]
estimate_size(value, nested=False)[source]
class apache_beam.coders.coder_impl.IterableCoderImpl(elem_coder)[source]

Bases: apache_beam.coders.coder_impl.SequenceCoderImpl

For internal use only; no backwards-compatibility guarantees.

A coder for homogeneous iterable objects.

class apache_beam.coders.coder_impl.LengthPrefixCoderImpl(value_coder)[source]

Bases: apache_beam.coders.coder_impl.StreamCoderImpl

For internal use only; no backwards-compatibility guarantees.

Coder which prefixes the length of the encoded object in the stream.

decode_from_stream(in_stream, nested)[source]
encode_to_stream(value, out, nested)[source]
estimate_size(value, nested=False)[source]
class apache_beam.coders.coder_impl.ProtoCoderImpl(proto_message_type)[source]

Bases: apache_beam.coders.coder_impl.SimpleCoderImpl

For internal use only; no backwards-compatibility guarantees.

decode(encoded)[source]
encode(value)[source]
class apache_beam.coders.coder_impl.SequenceCoderImpl(elem_coder)[source]

Bases: apache_beam.coders.coder_impl.StreamCoderImpl

For internal use only; no backwards-compatibility guarantees.

A coder for sequences.

If the length of the sequence is known, we encode the length as a 32-bit integer followed by the encoded elements.

If the length of the sequence is unknown, we encode the length as -1 and then encode the elements in batches: each batch is buffered up to 64K bytes and prefixed with its element count. A 0 is encoded at the end to indicate the end of the stream.

The resulting encoding would look like this:

-1
countA element(0) element(1) ... element(countA - 1)
countB element(0) element(1) ... element(countB - 1)
...
countX element(0) element(1) ... element(countX - 1)
0
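
A minimal sketch (assuming apache_beam is installed) of the known-length case, using the public IterableCoder that is backed by this implementation; the exact bytes may vary between Beam versions:

from apache_beam.coders import coders

coder = coders.IterableCoder(coders.VarIntCoder())
encoded = coder.encode([1, 2, 3])
# Expect a 32-bit big-endian count (3) followed by the var-int encodings
# of 1, 2 and 3, e.g. b'\x00\x00\x00\x03\x01\x02\x03'.
assert coder.decode(encoded) == [1, 2, 3]
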
decode_from_stream(in_stream, nested)[source]
encode_to_stream(value, out, nested)[source]
estimate_size(value, nested=False)[source]

Estimates the encoded size of the given value, in bytes.

get_estimated_size_and_observables(value, nested=False)[source]

Returns estimated size of value along with any nested observables.

class apache_beam.coders.coder_impl.SimpleCoderImpl[source]

Bases: apache_beam.coders.coder_impl.CoderImpl

For internal use only; no backwards-compatibility guarantees.

Subclass of CoderImpl implementing stream methods using encode/decode.

decode_from_stream(stream, nested)[source]

Reads object from potentially-nested encoding in stream.

encode_to_stream(value, stream, nested)[source]

Writes object to potentially-nested encoding in stream.

class apache_beam.coders.coder_impl.SingletonCoderImpl(value)[source]

Bases: apache_beam.coders.coder_impl.CoderImpl

For internal use only; no backwards-compatibility guarantees.

A coder that always encodes exactly one value.

decode(encoded)[source]
decode_from_stream(stream, nested)[source]
encode(value)[source]
encode_to_stream(value, stream, nested)[source]
estimate_size(value, nested=False)[source]
class apache_beam.coders.coder_impl.StreamCoderImpl[source]

Bases: apache_beam.coders.coder_impl.CoderImpl

For internal use only; no backwards-compatibility guarantees.

Subclass of CoderImpl implementing encode/decode using stream methods.

decode(encoded)[source]
encode(value)[source]
estimate_size(value, nested=False)[source]

Estimates the encoded size of the given value, in bytes.

class apache_beam.coders.coder_impl.TimestampCoderImpl[source]

Bases: apache_beam.coders.coder_impl.StreamCoderImpl

For internal use only; no backwards-compatibility guarantees.

decode_from_stream(in_stream, nested)[source]
encode_to_stream(value, out, nested)[source]
estimate_size(unused_value, nested=False)[source]
class apache_beam.coders.coder_impl.TupleCoderImpl(coder_impls)[source]

Bases: apache_beam.coders.coder_impl.AbstractComponentCoderImpl

A coder for tuple objects.

class apache_beam.coders.coder_impl.TupleSequenceCoderImpl(elem_coder)[source]

Bases: apache_beam.coders.coder_impl.SequenceCoderImpl

For internal use only; no backwards-compatibility guarantees.

A coder for homogeneous tuple objects.

class apache_beam.coders.coder_impl.VarIntCoderImpl[source]

Bases: apache_beam.coders.coder_impl.StreamCoderImpl

For internal use only; no backwards-compatibility guarantees.

A coder for long/int objects.

decode(encoded)[source]
decode_from_stream(in_stream, nested)[source]
encode(value)[source]
encode_to_stream(value, out, nested)[source]
estimate_size(value, nested=False)[source]
class apache_beam.coders.coder_impl.WindowedValueCoderImpl(value_coder, timestamp_coder, window_coder)[source]

Bases: apache_beam.coders.coder_impl.StreamCoderImpl

For internal use only; no backwards-compatibility guarantees.

A coder for windowed values.

decode_from_stream(in_stream, nested)[source]
encode_to_stream(value, out, nested)[source]
get_estimated_size_and_observables(value, nested=False)[source]

Returns estimated size of value along with any nested observables.

apache_beam.coders.coders module

Collection of useful coders.

Only those coders listed in __all__ are part of the public API of this module.

class apache_beam.coders.coders.Coder[source]

Bases: object

Base class for coders.

as_cloud_object()[source]

For internal use only; no backwards-compatibility guarantees.

Returns Google Cloud Dataflow API description of this coder.

decode(encoded)[source]

Decodes the given byte string into the corresponding object.

encode(value)[source]

Encodes the given object into a byte string.

estimate_size(value)[source]

Estimates the encoded size of the given value, in bytes.

Dataflow estimates the encoded size of a PCollection processed in a pipeline step by using the estimated size of a random sample of elements in that PCollection.

The default implementation encodes the given value and returns its byte size. If a coder can provide a fast estimate of the encoded size of a value (e.g., if the encoding has a fixed size), it can provide its estimate here to improve performance.

Parameters:value – the value whose encoded size is to be estimated.
Returns:The estimated encoded size of the given value.
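
As a sketch of this behaviour (assuming apache_beam is installed), the estimate for a simple coder matches the length of its actual encoding:

from apache_beam.coders import coders

coder = coders.VarIntCoder()
assert coder.estimate_size(5) == len(coder.encode(5)) == 1
assert coder.estimate_size(300) == len(coder.encode(300)) == 2
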
static from_runner_api(proto, context)[source]

For internal use only; no backwards-compatibility guarantees.

classmethod from_type_hint(unused_typehint, unused_registry)[source]
get_impl()[source]

For internal use only; no backwards-compatibility guarantees.

Returns the CoderImpl backing this Coder.

is_deterministic()[source]

Whether this coder is guaranteed to encode values deterministically.

A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.

For example, note that the default coder, the PickleCoder, is not deterministic: the ordering of pickled entries in maps may vary across executions since there is no defined order, so such a coder is in general not suitable for use as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.

Returns:Whether coder is deterministic.
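
A short sketch (assuming apache_beam is installed) of checking determinism before choosing a key coder:

from apache_beam.coders import coders

assert coders.VarIntCoder().is_deterministic()
assert coders.BytesCoder().is_deterministic()
assert not coders.PickleCoder().is_deterministic()  # pickle ordering may vary
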
is_kv_coder()[source]
key_coder()[source]
to_runner_api(context)[source]

For internal use only; no backwards-compatibility guarantees.

value_coder()[source]
class apache_beam.coders.coders.BytesCoder[source]

Bases: apache_beam.coders.coders.FastCoder

Byte string coder.

is_deterministic()[source]
class apache_beam.coders.coders.DillCoder[source]

Bases: apache_beam.coders.coders._PickleCoderBase

Coder using dill’s pickle functionality.

class apache_beam.coders.coders.FastPrimitivesCoder(fallback_coder=PickleCoder)[source]

Bases: apache_beam.coders.coders.FastCoder

Encodes simple primitives (e.g. str, int) efficiently.

For unknown types, falls back to another coder (e.g. PickleCoder).
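
A minimal usage sketch (assuming apache_beam is installed); the coder round-trips common Python primitives and containers, deferring to the fallback coder for other types:

from apache_beam.coders import coders

coder = coders.FastPrimitivesCoder()
value = ('a', 1, [1.5, None])
assert coder.decode(coder.encode(value)) == value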

as_cloud_object(is_pair_like=True)[source]
is_deterministic()[source]
is_kv_coder()[source]
key_coder()[source]
value_coder()[source]
class apache_beam.coders.coders.FloatCoder[source]

Bases: apache_beam.coders.coders.FastCoder

A coder used for floating-point values.

is_deterministic()[source]
class apache_beam.coders.coders.IterableCoder(elem_coder)[source]

Bases: apache_beam.coders.coders.FastCoder

Coder of iterables of homogeneous objects.

as_cloud_object()[source]
static from_type_hint(typehint, registry)[source]
is_deterministic()[source]
value_coder()[source]
class apache_beam.coders.coders.PickleCoder[source]

Bases: apache_beam.coders.coders._PickleCoderBase

Coder using Python’s pickle functionality.

class apache_beam.coders.coders.ProtoCoder(proto_message_type)[source]

Bases: apache_beam.coders.coders.FastCoder

A Coder for Google Protocol Buffers.

It supports both Protocol Buffers syntax versions 2 and 3. However, the runtime version of the Python protobuf library must exactly match the version of the protoc compiler that was used to generate the protobuf messages.

ProtoCoder is registered in the global CoderRegistry as the default coder for any protobuf Message object.
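
A minimal sketch (assuming apache_beam and its generated test messages are importable) using the MessageA test message documented later on this page; field1 is assumed here to be a string field:

from apache_beam.coders import coders
from apache_beam.coders import proto2_coder_test_messages_pb2 as test_pb2

message = test_pb2.MessageA()
message.field1 = 'hello'
coder = coders.ProtoCoder(type(message))
assert coder.decode(coder.encode(message)) == message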

static from_type_hint(typehint, unused_registry)[source]
is_deterministic()[source]
class apache_beam.coders.coders.SingletonCoder(value)[source]

Bases: apache_beam.coders.coders.FastCoder

A coder that always encodes exactly one value.

is_deterministic()[source]
class apache_beam.coders.coders.StrUtf8Coder[source]

Bases: apache_beam.coders.coders.Coder

A coder used for reading and writing strings as UTF-8.
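
A brief sketch (assuming apache_beam is installed):

from apache_beam.coders import coders

coder = coders.StrUtf8Coder()
assert coder.encode(u'café') == b'caf\xc3\xa9'
assert coder.decode(b'caf\xc3\xa9') == u'café'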

decode(value)[source]
encode(value)[source]
is_deterministic()[source]
class apache_beam.coders.coders.TimestampCoder[source]

Bases: apache_beam.coders.coders.FastCoder

A coder used for timeutil.Timestamp values.

is_deterministic()[source]
class apache_beam.coders.coders.TupleCoder(components)[source]

Bases: apache_beam.coders.coders.FastCoder

Coder of tuple objects.

as_cloud_object()[source]
coders()[source]
static from_type_hint(typehint, registry)[source]
is_deterministic()[source]
is_kv_coder()[source]
key_coder()[source]
value_coder()[source]
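
A brief sketch (assuming apache_beam is installed) of the key/value helpers on a two-component tuple coder:

from apache_beam.coders import coders

kv_coder = coders.TupleCoder([coders.StrUtf8Coder(), coders.VarIntCoder()])
assert kv_coder.is_kv_coder()  # exactly two components
assert isinstance(kv_coder.key_coder(), coders.StrUtf8Coder)
assert isinstance(kv_coder.value_coder(), coders.VarIntCoder)
assert kv_coder.decode(kv_coder.encode(('a', 1))) == ('a', 1)
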
class apache_beam.coders.coders.TupleSequenceCoder(elem_coder)[source]

Bases: apache_beam.coders.coders.FastCoder

Coder of homogeneous tuple objects.

static from_type_hint(typehint, registry)[source]
is_deterministic()[source]
class apache_beam.coders.coders.VarIntCoder[source]

Bases: apache_beam.coders.coders.FastCoder

Variable-length integer coder.

is_deterministic()[source]
class apache_beam.coders.coders.WindowedValueCoder(wrapped_value_coder, window_coder=None)[source]

Bases: apache_beam.coders.coders.FastCoder

Coder for windowed values.

as_cloud_object()[source]
is_deterministic()[source]
is_kv_coder()[source]
key_coder()[source]
value_coder()[source]

apache_beam.coders.coders_test_common module

Tests common to all coder implementations.

class apache_beam.coders.coders_test_common.CodersTest(methodName='runTest')[source]

Bases: unittest.case.TestCase

check_coder(coder, *values)[source]
classmethod setUpClass()[source]
classmethod tearDownClass()[source]
test_base64_pickle_coder()[source]
test_bytes_coder()[source]
test_custom_coder()[source]
test_deterministic_coder()[source]
test_dill_coder()[source]
test_fast_primitives_coder()[source]
test_float_coder()[source]
test_global_window_coder()[source]
test_interval_window_coder()[source]
test_iterable_coder()[source]
test_iterable_coder_unknown_length()[source]
test_length_prefix_coder()[source]
test_nested_observables()[source]
test_pickle_coder()[source]
test_proto_coder()[source]
test_singleton_coder()[source]
test_timestamp_coder()[source]
test_tuple_coder()[source]
test_tuple_sequence_coder()[source]
test_utf8_coder()[source]
test_varint_coder()[source]
test_windowed_value_coder()[source]
class apache_beam.coders.coders_test_common.CustomCoder[source]

Bases: apache_beam.coders.coders.Coder

decode(encoded)[source]
encode(x)[source]

apache_beam.coders.observable module

Observable base class for iterables.

For internal use only; no backwards-compatibility guarantees.

class apache_beam.coders.observable.ObservableMixin[source]

Bases: object

For internal use only; no backwards-compatibility guarantees.

An observable iterable.

Subclasses need to call self.notify_observers with any object yielded.

notify_observers(value, **kwargs)[source]
register_observer(callback)[source]
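
A hypothetical sketch of such a subclass; NotifyingIterable is illustrative and not part of Beam:

from apache_beam.coders.observable import ObservableMixin

class NotifyingIterable(ObservableMixin):
  """Illustrative iterable that notifies observers of each yielded element."""

  def __init__(self, values):
    super(NotifyingIterable, self).__init__()
    self._values = values

  def __iter__(self):
    for value in self._values:
      self.notify_observers(value)
      yield value

observed = []
iterable = NotifyingIterable([1, 2, 3])
iterable.register_observer(lambda value, **kwargs: observed.append(value))
assert list(iterable) == [1, 2, 3]
assert observed == [1, 2, 3]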

apache_beam.coders.proto2_coder_test_messages_pb2 module

class apache_beam.coders.proto2_coder_test_messages_pb2.MessageA(**kwargs)

Bases: google.protobuf.message.Message

ByteSize()
Clear()
ClearField(field_name)
DESCRIPTOR = <google.protobuf.descriptor.Descriptor object>
DiscardUnknownFields()
FIELD1_FIELD_NUMBER = 1
FIELD2_FIELD_NUMBER = 2
FindInitializationErrors()

Finds required fields which are not initialized.

Returns:A list of strings. Each string is a path to an uninitialized field from the top-level message, e.g. “foo.bar[5].baz”.
static FromString(s)
HasField(field_name)
IsInitialized(errors=None)

Checks if all required fields of a message are set.

Parameters:errors – A list which, if provided, will be populated with the field paths of all missing required fields.
Returns:True iff the specified message has all required fields set.
ListFields()
MergeFrom(msg)
MergeFromString(serialized)
static RegisterExtension(extension_handle)
SerializePartialToString()
SerializeToString()
SetInParent()

Sets the _cached_byte_size_dirty bit to true, and propagates this to our listener iff this was a state change.

WhichOneof(oneof_name)

Returns the name of the currently set field inside a oneof, or None.

field1

Magic attribute generated for “field1” proto field.

field2

Magic attribute generated for “field2” proto field.

class apache_beam.coders.proto2_coder_test_messages_pb2.MessageB(**kwargs)

Bases: google.protobuf.message.Message

ByteSize()
Clear()
ClearField(field_name)
DESCRIPTOR = <google.protobuf.descriptor.Descriptor object>
DiscardUnknownFields()
FIELD1_FIELD_NUMBER = 1
FindInitializationErrors()

Finds required fields which are not initialized.

Returns:A list of strings. Each string is a path to an uninitialized field from the top-level message, e.g. “foo.bar[5].baz”.
static FromString(s)
HasField(field_name)
IsInitialized(errors=None)

Checks if all required fields of a message are set.

Parameters:errors – A list which, if provided, will be populated with the field paths of all missing required fields.
Returns:True iff the specified message has all required fields set.
ListFields()
MergeFrom(msg)
MergeFromString(serialized)
static RegisterExtension(extension_handle)
SerializePartialToString()
SerializeToString()
SetInParent()

Sets the _cached_byte_size_dirty bit to true, and propagates this to our listener iff this was a state change.

WhichOneof(oneof_name)

Returns the name of the currently set field inside a oneof, or None.

field1

Magic attribute generated for “field1” proto field.

class apache_beam.coders.proto2_coder_test_messages_pb2.MessageC(**kwargs)

Bases: google.protobuf.message.Message

ByteSize()
Clear()
ClearExtension(extension_handle)
ClearField(field_name)
DESCRIPTOR = <google.protobuf.descriptor.Descriptor object>
DiscardUnknownFields()
Extensions
FindInitializationErrors()

Finds required fields which are not initialized.

Returns:A list of strings. Each string is a path to an uninitialized field from the top-level message, e.g. “foo.bar[5].baz”.
static FromString(s)
HasExtension(extension_handle)
HasField(field_name)
IsInitialized(errors=None)

Checks if all required fields of a message are set.

Parameters:errors – A list which, if provided, will be populated with the field paths of all missing required fields.
Returns:True iff the specified message has all required fields set.
ListFields()
MergeFrom(msg)
MergeFromString(serialized)
static RegisterExtension(extension_handle)
SerializePartialToString()
SerializeToString()
SetInParent()

Sets the _cached_byte_size_dirty bit to true, and propagates this to our listener iff this was a state change.

WhichOneof(oneof_name)

Returns the name of the currently set field inside a oneof, or None.

class apache_beam.coders.proto2_coder_test_messages_pb2.MessageWithMap(**kwargs)

Bases: google.protobuf.message.Message

ByteSize()
Clear()
ClearField(field_name)
DESCRIPTOR = <google.protobuf.descriptor.Descriptor object>
DiscardUnknownFields()
FIELD1_FIELD_NUMBER = 1
class Field1Entry(**kwargs)

Bases: google.protobuf.message.Message

ByteSize()
Clear()
ClearField(field_name)
DESCRIPTOR = <google.protobuf.descriptor.Descriptor object>
DiscardUnknownFields()
FindInitializationErrors()

Finds required fields which are not initialized.

Returns:A list of strings. Each string is a path to an uninitialized field from the top-level message, e.g. “foo.bar[5].baz”.
static FromString(s)
HasField(field_name)
IsInitialized(errors=None)

Checks if all required fields of a message are set.

Parameters:errors – A list which, if provided, will be populated with the field paths of all missing required fields.
Returns:True iff the specified message has all required fields set.
KEY_FIELD_NUMBER = 1
ListFields()
MergeFrom(msg)
MergeFromString(serialized)
static RegisterExtension(extension_handle)
SerializePartialToString()
SerializeToString()
SetInParent()

Sets the _cached_byte_size_dirty bit to true, and propagates this to our listener iff this was a state change.

VALUE_FIELD_NUMBER = 2
WhichOneof(oneof_name)

Returns the name of the currently set field inside a oneof, or None.

key

Magic attribute generated for “key” proto field.

value

Magic attribute generated for “value” proto field.

FindInitializationErrors()

Finds required fields which are not initialized.

Returns:A list of strings. Each string is a path to an uninitialized field from the top-level message, e.g. “foo.bar[5].baz”.
static FromString(s)
HasField(field_name)
IsInitialized(errors=None)

Checks if all required fields of a message are set.

Parameters:errors – A list which, if provided, will be populated with the field paths of all missing required fields.
Returns:True iff the specified message has all required fields set.
ListFields()
MergeFrom(msg)
MergeFromString(serialized)
static RegisterExtension(extension_handle)
SerializePartialToString()
SerializeToString()
SetInParent()

Sets the _cached_byte_size_dirty bit to true, and propagates this to our listener iff this was a state change.

WhichOneof(oneof_name)

Returns the name of the currently set field inside a oneof, or None.

field1

Magic attribute generated for “field1” proto field.

class apache_beam.coders.proto2_coder_test_messages_pb2.ReferencesMessageWithMap(**kwargs)

Bases: google.protobuf.message.Message

ByteSize()
Clear()
ClearField(field_name)
DESCRIPTOR = <google.protobuf.descriptor.Descriptor object>
DiscardUnknownFields()
FIELD1_FIELD_NUMBER = 1
FindInitializationErrors()

Finds required fields which are not initialized.

Returns:A list of strings. Each string is a path to an uninitialized field from the top-level message, e.g. “foo.bar[5].baz”.
static FromString(s)
HasField(field_name)
IsInitialized(errors=None)

Checks if all required fields of a message are set.

Parameters:errors – A list which, if provided, will be populated with the field paths of all missing required fields.
Returns:True iff the specified message has all required fields set.
ListFields()
MergeFrom(msg)
MergeFromString(serialized)
static RegisterExtension(extension_handle)
SerializePartialToString()
SerializeToString()
SetInParent()

Sets the _cached_byte_size_dirty bit to true, and propagates this to our listener iff this was a state change.

WhichOneof(oneof_name)

Returns the name of the currently set field inside a oneof, or None.

field1

Magic attribute generated for “field1” proto field.

apache_beam.coders.slow_stream module

A pure Python implementation of stream.pyx.

For internal use only; no backwards-compatibility guarantees.

class apache_beam.coders.slow_stream.ByteCountingOutputStream[source]

Bases: apache_beam.coders.slow_stream.OutputStream

For internal use only; no backwards-compatibility guarantees.

A pure Python implementation of stream.ByteCountingOutputStream.

get()[source]
get_count()[source]
write(byte_array, nested=False)[source]
write_byte(_)[source]
class apache_beam.coders.slow_stream.InputStream(data)[source]

Bases: object

For internal use only; no backwards-compatibility guarantees.

A pure Python implementation of stream.InputStream.

read(size)[source]
read_all(nested)[source]
read_bigendian_double()[source]
read_bigendian_int32()[source]
read_bigendian_int64()[source]
read_bigendian_uint64()[source]
read_byte()[source]
read_var_int64()[source]
size()[source]
class apache_beam.coders.slow_stream.OutputStream[source]

Bases: object

For internal use only; no backwards-compatibility guarantees.

A pure Python implementation of stream.OutputStream.

get()[source]
size()[source]
write(b, nested=False)[source]
write_bigendian_double(v)[source]
write_bigendian_int32(v)[source]
write_bigendian_int64(v)[source]
write_bigendian_uint64(v)[source]
write_byte(val)[source]
write_var_int64(v)[source]
apache_beam.coders.slow_stream.get_varint_size(v)[source]

For internal use only; no backwards-compatibility guarantees.

Returns the size of the given integer value when encoded as a VarInt.
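
A short sketch (assuming apache_beam is installed) tying this helper to the pure Python streams above:

from apache_beam.coders import slow_stream

out = slow_stream.OutputStream()
out.write_var_int64(300)
data = out.get()
assert len(data) == slow_stream.get_varint_size(300) == 2
assert slow_stream.InputStream(data).read_var_int64() == 300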

apache_beam.coders.stream module


apache_beam.coders.typecoders module

Type coders registration.

This module contains functionality to define and use coders for custom classes. Let’s say we have a class Xyz and we are processing a PCollection with elements of type Xyz. If we do not register a coder for Xyz, a default pickle-based fallback coder will be used. This can be undesirable for two reasons. First, we may want a faster or more space-efficient coder. Second, the pickle-based coder is not deterministic in the sense that objects like dictionaries or sets are not guaranteed to be encoded in the same way every time (their elements have no defined order).

Two (sometimes three) steps are needed to define and use a custom coder:
  • define the coder class
  • associate the coder with the class (a.k.a. coder registration)
  • add type hints to DoFns or transforms using the new class, or composite types built from it.

A coder class is defined by subclassing from Coder and defining the encode and decode methods. The framework uses duck typing for coders, so it is not strictly required to subclass from Coder as long as the encode and decode methods are defined.
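
A minimal sketch of such a coder for the Xyz class used in this example; the payload attribute is assumed purely for illustration:

from apache_beam import coders

class Xyz(object):
  def __init__(self, payload):
    self.payload = payload

class XyzCoder(coders.Coder):
  """Illustrative coder that serializes Xyz.payload as UTF-8 bytes."""

  def encode(self, value):
    return value.payload.encode('utf-8')

  def decode(self, encoded):
    return Xyz(encoded.decode('utf-8'))

  def is_deterministic(self):
    return True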

Registering a coder class is done with a register_coder() call:

from apache_beam import coders
...
coders.registry.register_coder(Xyz, XyzCoder)

Additionally, DoFns and PTransforms may need type hints. This is not always necessary since there is functionality to infer the return types of DoFns by analyzing the code. For instance, for the function below the return type Xyz will be inferred:

def MakeXyzs(v):
  return Xyz(v)

If Xyz is inferred, its coder will be used whenever the framework needs to serialize data (e.g., writing to the shuffler subsystem responsible for GroupByKey operations). If a type hint is needed, it can be specified by decorating the DoFns or by using the with_input_types/with_output_types methods on PTransforms. For example, the above function can be decorated:

from apache_beam.typehints import with_output_types

@with_output_types(Xyz)
def MakeXyzs(v):
  return complex_operation_returning_Xyz(v)

See apache_beam.typehints.decorators module for more details.

Module contents