apache_beam.coders.coders module¶
Collection of useful coders.
Only those coders listed in __all__ are part of the public API of this module.
## On usage of pickle, dill and pickler in Beam
In Beam, we generally we use pickle for pipeline elements and dill for more complex types, like user functions.
pickler is Beam’s own wrapping of dill + compression + error handling. It serves also as an API to mask the actual encoding layer (so we can change it from dill if necessary).
We created _MemoizingPickleCoder to improve performance when serializing complex user types for the execution of SDF. Specifically to address BEAM-12781, where many identical BoundedSource instances are being encoded.
- 
class apache_beam.coders.coders.Coder[source]¶
- Bases: - object- Base class for coders. - 
is_deterministic()[source]¶
- Whether this coder is guaranteed to encode values deterministically. - A deterministic coder is required for key coders in GroupByKey operations to produce consistent results. - For example, note that the default coder, the PickleCoder, is not deterministic: the ordering of picked entries in maps may vary across executions since there is no defined order, and such a coder is not in general suitable for usage as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently. - Returns: - Whether coder is deterministic. 
 - 
as_deterministic_coder(step_label, error_message=None)[source]¶
- Returns a deterministic version of self, if possible. - Otherwise raises a value error. 
 - 
estimate_size(value)[source]¶
- Estimates the encoded size of the given value, in bytes. - Dataflow estimates the encoded size of a PCollection processed in a pipeline step by using the estimated size of a random sample of elements in that PCollection. - The default implementation encodes the given value and returns its byte size. If a coder can provide a fast estimate of the encoded size of a value (e.g., if the encoding has a fixed size), it can provide its estimate here to improve performance. - Parameters: - value – the value whose encoded size is to be estimated. - Returns: - The estimated encoded size of the given value. 
 - 
get_impl()[source]¶
- For internal use only; no backwards-compatibility guarantees. - Returns the CoderImpl backing this Coder. 
 - 
as_cloud_object(coders_context=None)[source]¶
- For internal use only; no backwards-compatibility guarantees. - Returns Google Cloud Dataflow API description of this coder. 
 - 
classmethod register_urn(urn, parameter_type, fn=None)[source]¶
- Registers a urn with a constructor. - For example, if ‘beam:fn:foo’ had parameter type FooPayload, one could write RunnerApiFn.register_urn(‘bean:fn:foo’, FooPayload, foo_from_proto) where foo_from_proto took as arguments a FooPayload and a PipelineContext. This function can also be used as a decorator rather than passing the callable in as the final parameter. - A corresponding to_runner_api_parameter method would be expected that returns the tuple (‘beam:fn:foo’, FooPayload) 
 
- 
- 
class apache_beam.coders.coders.StrUtf8Coder[source]¶
- Bases: - apache_beam.coders.coders.Coder- A coder used for reading and writing strings as UTF-8. - 
to_runner_api_parameter(unused_context)¶
 
- 
- 
class apache_beam.coders.coders.BytesCoder[source]¶
- Bases: - apache_beam.coders.coders.FastCoder- Byte string coder. - 
to_runner_api_parameter(unused_context)¶
 
- 
- 
class apache_beam.coders.coders.BooleanCoder[source]¶
- Bases: - apache_beam.coders.coders.FastCoder- 
to_runner_api_parameter(unused_context)¶
 
- 
- 
class apache_beam.coders.coders.MapCoder(key_coder, value_coder)[source]¶
- Bases: - apache_beam.coders.coders.FastCoder
- 
class apache_beam.coders.coders.NullableCoder(value_coder)[source]¶
- Bases: - apache_beam.coders.coders.FastCoder- 
to_runner_api_parameter(unused_context)¶
 
- 
- 
class apache_beam.coders.coders.VarIntCoder[source]¶
- Bases: - apache_beam.coders.coders.FastCoder- Variable-length integer coder. - 
to_runner_api_parameter(unused_context)¶
 
- 
- 
class apache_beam.coders.coders.FloatCoder[source]¶
- Bases: - apache_beam.coders.coders.FastCoder- A coder used for floating-point values. - 
to_runner_api_parameter(unused_context)¶
 
- 
- 
class apache_beam.coders.coders.TimestampCoder[source]¶
- Bases: - apache_beam.coders.coders.FastCoder- A coder used for timeutil.Timestamp values. 
- 
class apache_beam.coders.coders.SingletonCoder(value)[source]¶
- Bases: - apache_beam.coders.coders.FastCoder- A coder that always encodes exactly one value. 
- 
class apache_beam.coders.coders.PickleCoder[source]¶
- Bases: - apache_beam.coders.coders._PickleCoderBase- Coder using Python’s pickle functionality. 
- 
class apache_beam.coders.coders.DillCoder[source]¶
- Bases: - apache_beam.coders.coders._PickleCoderBase- Coder using dill’s pickle functionality. 
- 
class apache_beam.coders.coders.FastPrimitivesCoder(fallback_coder=PickleCoder)[source]¶
- Bases: - apache_beam.coders.coders.FastCoder- Encodes simple primitives (e.g. str, int) efficiently. - For unknown types, falls back to another coder (e.g. PickleCoder). 
- 
class apache_beam.coders.coders.ProtoCoder(proto_message_type)[source]¶
- Bases: - apache_beam.coders.coders.FastCoder- A Coder for Google Protocol Buffers. - It supports both Protocol Buffers syntax versions 2 and 3. However, the runtime version of the python protobuf library must exactly match the version of the protoc compiler what was used to generate the protobuf messages. - ProtoCoder is registered in the global CoderRegistry as the default coder for any protobuf Message object. 
- 
class apache_beam.coders.coders.ProtoPlusCoder(proto_plus_message_type)[source]¶
- Bases: - apache_beam.coders.coders.FastCoder- A Coder for Google Protocol Buffers wrapped using the proto-plus library. - ProtoPlusCoder is registered in the global CoderRegistry as the default coder for any proto.Message object. 
- 
class apache_beam.coders.coders.AvroGenericCoder(schema)[source]¶
- Bases: - apache_beam.coders.coders.FastCoder- A coder used for AvroRecord values. 
- 
class apache_beam.coders.coders.TupleCoder(components)[source]¶
- Bases: - apache_beam.coders.coders.FastCoder- Coder of tuple objects. 
- 
class apache_beam.coders.coders.TupleSequenceCoder(elem_coder)[source]¶
- Bases: - apache_beam.coders.coders.FastCoder- Coder of homogeneous tuple objects. 
- 
class apache_beam.coders.coders.IterableCoder(elem_coder)[source]¶
- Bases: - apache_beam.coders.coders.ListLikeCoder- Coder of iterables of homogeneous objects. - 
to_runner_api_parameter(unused_context)¶
 
- 
- 
class apache_beam.coders.coders.ListCoder(elem_coder)[source]¶
- Bases: - apache_beam.coders.coders.ListLikeCoder- Coder of Python lists. 
- 
class apache_beam.coders.coders.WindowedValueCoder(wrapped_value_coder, window_coder=None)[source]¶
- Bases: - apache_beam.coders.coders.FastCoder- Coder for windowed values. - 
to_runner_api_parameter(unused_context)¶
 
- 
- 
class apache_beam.coders.coders.ParamWindowedValueCoder(payload, components)[source]¶
- Bases: - apache_beam.coders.coders.WindowedValueCoder- A coder used for parameterized windowed values.