apache_beam.coders.coders module

Collection of useful coders.

Only those coders listed in __all__ are part of the public API of this module.

## On usage of pickle, dill and pickler in Beam

In Beam, we generally we use pickle for pipeline elements and dill for more complex types, like user functions.

pickler is Beam’s own wrapping of dill + compression + error handling. It serves also as an API to mask the actual encoding layer (so we can change it from dill if necessary).

We created _MemoizingPickleCoder to improve performance when serializing complex user types for the execution of SDF. Specifically to address BEAM-12781, where many identical BoundedSource instances are being encoded.

class apache_beam.coders.coders.Coder[source]

Bases: object

Base class for coders.

encode(value)[source]

Encodes the given object into a byte string.

decode(encoded)[source]

Decodes the given byte string into the corresponding object.

encode_nested(value)[source]

Uses the underlying implementation to encode in nested format.

decode_nested(encoded)[source]

Uses the underlying implementation to decode in nested format.

is_deterministic()[source]

Whether this coder is guaranteed to encode values deterministically.

A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.

For example, note that the default coder, the PickleCoder, is not deterministic: the ordering of picked entries in maps may vary across executions since there is no defined order, and such a coder is not in general suitable for usage as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.

Returns:Whether coder is deterministic.
as_deterministic_coder(step_label, error_message=None)[source]

Returns a deterministic version of self, if possible.

Otherwise raises a value error.

estimate_size(value)[source]

Estimates the encoded size of the given value, in bytes.

Dataflow estimates the encoded size of a PCollection processed in a pipeline step by using the estimated size of a random sample of elements in that PCollection.

The default implementation encodes the given value and returns its byte size. If a coder can provide a fast estimate of the encoded size of a value (e.g., if the encoding has a fixed size), it can provide its estimate here to improve performance.

Parameters:value – the value whose encoded size is to be estimated.
Returns:The estimated encoded size of the given value.
get_impl()[source]

For internal use only; no backwards-compatibility guarantees.

Returns the CoderImpl backing this Coder.

to_type_hint()[source]
classmethod from_type_hint(unused_typehint, unused_registry)[source]
is_kv_coder()[source]
key_coder()[source]
value_coder()[source]
as_cloud_object(coders_context=None)[source]

For internal use only; no backwards-compatibility guarantees.

Returns Google Cloud Dataflow API description of this coder.

classmethod register_urn(urn, parameter_type, fn=None)[source]

Registers a urn with a constructor.

For example, if ‘beam:fn:foo’ had parameter type FooPayload, one could write RunnerApiFn.register_urn(‘bean:fn:foo’, FooPayload, foo_from_proto) where foo_from_proto took as arguments a FooPayload and a PipelineContext. This function can also be used as a decorator rather than passing the callable in as the final parameter.

A corresponding to_runner_api_parameter method would be expected that returns the tuple (‘beam:fn:foo’, FooPayload)

to_runner_api(context)[source]
classmethod from_runner_api(coder_proto, context)[source]

Converts from an FunctionSpec to a Fn object.

Prefer registering a urn with its parameter type and constructor.

to_runner_api_parameter(context)[source]
static register_structured_urn(urn, cls)[source]

Register a coder that’s completely defined by its urn and its component(s), if any, which are passed to construct the instance.

class apache_beam.coders.coders.StrUtf8Coder[source]

Bases: apache_beam.coders.coders.Coder

A coder used for reading and writing strings as UTF-8.

encode(value)[source]
decode(value)[source]
is_deterministic()[source]
to_type_hint()[source]
to_runner_api_parameter(unused_context)
class apache_beam.coders.coders.BytesCoder[source]

Bases: apache_beam.coders.coders.FastCoder

Byte string coder.

is_deterministic()[source]
to_type_hint()[source]
as_cloud_object(coders_context=None)[source]
to_runner_api_parameter(unused_context)
class apache_beam.coders.coders.BooleanCoder[source]

Bases: apache_beam.coders.coders.FastCoder

is_deterministic()[source]
to_type_hint()[source]
to_runner_api_parameter(unused_context)
class apache_beam.coders.coders.MapCoder(key_coder, value_coder)[source]

Bases: apache_beam.coders.coders.FastCoder

to_type_hint()[source]
is_deterministic()[source]
class apache_beam.coders.coders.NullableCoder(value_coder)[source]

Bases: apache_beam.coders.coders.FastCoder

to_type_hint()[source]
is_deterministic()[source]
class apache_beam.coders.coders.VarIntCoder[source]

Bases: apache_beam.coders.coders.FastCoder

Variable-length integer coder.

is_deterministic()[source]
to_type_hint()[source]
as_cloud_object(coders_context=None)[source]
to_runner_api_parameter(unused_context)
class apache_beam.coders.coders.FloatCoder[source]

Bases: apache_beam.coders.coders.FastCoder

A coder used for floating-point values.

is_deterministic()[source]
to_type_hint()[source]
to_runner_api_parameter(unused_context)
class apache_beam.coders.coders.TimestampCoder[source]

Bases: apache_beam.coders.coders.FastCoder

A coder used for timeutil.Timestamp values.

is_deterministic()[source]
class apache_beam.coders.coders.SingletonCoder(value)[source]

Bases: apache_beam.coders.coders.FastCoder

A coder that always encodes exactly one value.

is_deterministic()[source]
class apache_beam.coders.coders.PickleCoder[source]

Bases: apache_beam.coders.coders._PickleCoderBase

Coder using Python’s pickle functionality.

as_deterministic_coder(step_label, error_message=None)[source]
to_type_hint()[source]
class apache_beam.coders.coders.DillCoder[source]

Bases: apache_beam.coders.coders._PickleCoderBase

Coder using dill’s pickle functionality.

class apache_beam.coders.coders.FastPrimitivesCoder(fallback_coder=PickleCoder)[source]

Bases: apache_beam.coders.coders.FastCoder

Encodes simple primitives (e.g. str, int) efficiently.

For unknown types, falls back to another coder (e.g. PickleCoder).

is_deterministic()[source]
as_deterministic_coder(step_label, error_message=None)[source]
to_type_hint()[source]
as_cloud_object(coders_context=None, is_pair_like=True)[source]
is_kv_coder()[source]
key_coder()[source]
value_coder()[source]
class apache_beam.coders.coders.ProtoCoder(proto_message_type)[source]

Bases: apache_beam.coders.coders.FastCoder

A Coder for Google Protocol Buffers.

It supports both Protocol Buffers syntax versions 2 and 3. However, the runtime version of the python protobuf library must exactly match the version of the protoc compiler what was used to generate the protobuf messages.

ProtoCoder is registered in the global CoderRegistry as the default coder for any protobuf Message object.

is_deterministic()[source]
as_deterministic_coder(step_label, error_message=None)[source]
classmethod from_type_hint(typehint, unused_registry)[source]
to_type_hint()[source]
class apache_beam.coders.coders.AvroGenericCoder(schema)[source]

Bases: apache_beam.coders.coders.FastCoder

A coder used for AvroRecord values.

is_deterministic()[source]
to_type_hint()[source]
to_runner_api_parameter(context)[source]
static from_runner_api_parameter(payload, unused_components, unused_context)[source]
class apache_beam.coders.coders.TupleCoder(components)[source]

Bases: apache_beam.coders.coders.FastCoder

Coder of tuple objects.

is_deterministic()[source]
as_deterministic_coder(step_label, error_message=None)[source]
to_type_hint()[source]
classmethod from_type_hint(typehint, registry)[source]
as_cloud_object(coders_context=None)[source]
coders()[source]
is_kv_coder()[source]
key_coder()[source]
value_coder()[source]
to_runner_api_parameter(context)[source]
static from_runner_api_parameter(unused_payload, components, unused_context)[source]
class apache_beam.coders.coders.TupleSequenceCoder(elem_coder)[source]

Bases: apache_beam.coders.coders.FastCoder

Coder of homogeneous tuple objects.

value_coder()[source]
is_deterministic()[source]
as_deterministic_coder(step_label, error_message=None)[source]
classmethod from_type_hint(typehint, registry)[source]
class apache_beam.coders.coders.IterableCoder(elem_coder)[source]

Bases: apache_beam.coders.coders.ListLikeCoder

Coder of iterables of homogeneous objects.

to_type_hint()[source]
to_runner_api_parameter(unused_context)
class apache_beam.coders.coders.ListCoder(elem_coder)[source]

Bases: apache_beam.coders.coders.ListLikeCoder

Coder of Python lists.

to_type_hint()[source]
class apache_beam.coders.coders.WindowedValueCoder(wrapped_value_coder, window_coder=None)[source]

Bases: apache_beam.coders.coders.FastCoder

Coder for windowed values.

is_deterministic()[source]
as_cloud_object(coders_context=None)[source]
is_kv_coder()[source]
key_coder()[source]
value_coder()[source]
to_runner_api_parameter(unused_context)
class apache_beam.coders.coders.ParamWindowedValueCoder(payload, components)[source]

Bases: apache_beam.coders.coders.WindowedValueCoder

A coder used for parameterized windowed values.

is_deterministic()[source]
as_cloud_object(coders_context=None)[source]
static from_runner_api_parameter(payload, components, unused_context)[source]
to_runner_api_parameter(context)[source]
class apache_beam.coders.coders.ShardedKeyCoder(key_coder)[source]

Bases: apache_beam.coders.coders.FastCoder

A coder for sharded key.

is_deterministic()[source]
as_cloud_object(coders_context=None)[source]
to_type_hint()[source]
classmethod from_type_hint(typehint, registry)[source]
to_runner_api_parameter(unused_context)