apache_beam.coders package¶
Submodules¶
apache_beam.coders.coder_impl module¶
Coder implementations.
The actual encode/decode implementations are split off from coders to allow conditional (compiled/pure) implementations, which can be used to encode many elements with minimal overhead.
This module may be optionally compiled with Cython, using the corresponding coder_impl.pxd file for type hints.
For internal use only; no backwards-compatibility guarantees.
-
class
apache_beam.coders.coder_impl.
AbstractComponentCoderImpl
(coder_impls)[source]¶ Bases:
apache_beam.coders.coder_impl.StreamCoderImpl
For internal use only; no backwards-compatibility guarantees.
CoderImpl for coders that are comprised of several component coders.
-
class
apache_beam.coders.coder_impl.
BytesCoderImpl
[source]¶ Bases:
apache_beam.coders.coder_impl.CoderImpl
For internal use only; no backwards-compatibility guarantees.
A coder for bytes/str objects.
-
class
apache_beam.coders.coder_impl.
CallbackCoderImpl
(encoder, decoder, size_estimator=None)[source]¶ Bases:
apache_beam.coders.coder_impl.CoderImpl
For internal use only; no backwards-compatibility guarantees.
A CoderImpl that calls back to the _impl methods on the Coder itself.
This is the default implementation used if Coder._get_impl() is not overwritten.
-
class
apache_beam.coders.coder_impl.
CoderImpl
[source]¶ Bases:
object
For internal use only; no backwards-compatibility guarantees.
-
decode_from_stream
(stream, nested)[source]¶ Reads object from potentially-nested encoding in stream.
-
encode_to_stream
(value, stream, nested)[source]¶ Reads object from potentially-nested encoding in stream.
-
estimate_size
(value, nested=False)[source]¶ Estimates the encoded size of the given value, in bytes.
-
get_estimated_size_and_observables
(value, nested=False)[source]¶ Returns estimated size of value along with any nested observables.
The list of nested observables is returned as a list of 2-tuples of (obj, coder_impl), where obj is an instance of observable.ObservableMixin, and coder_impl is the CoderImpl that can be used to encode elements sent by obj to its observers.
Parameters: - value – the value whose encoded size is to be estimated.
- nested – whether the value is nested.
Returns: The estimated encoded size of the given value and a list of observables whose elements are 2-tuples of (obj, coder_impl) as described above.
-
-
class
apache_beam.coders.coder_impl.
DeterministicFastPrimitivesCoderImpl
(coder, step_label)[source]¶ Bases:
apache_beam.coders.coder_impl.CoderImpl
For internal use only; no backwards-compatibility guarantees.
-
class
apache_beam.coders.coder_impl.
FastPrimitivesCoderImpl
(fallback_coder_impl)[source]¶ Bases:
apache_beam.coders.coder_impl.StreamCoderImpl
For internal use only; no backwards-compatibility guarantees.
-
class
apache_beam.coders.coder_impl.
FloatCoderImpl
[source]¶ Bases:
apache_beam.coders.coder_impl.StreamCoderImpl
For internal use only; no backwards-compatibility guarantees.
-
class
apache_beam.coders.coder_impl.
IntervalWindowCoderImpl
[source]¶ Bases:
apache_beam.coders.coder_impl.StreamCoderImpl
For internal use only; no backwards-compatibility guarantees.
-
class
apache_beam.coders.coder_impl.
IterableCoderImpl
(elem_coder)[source]¶ Bases:
apache_beam.coders.coder_impl.SequenceCoderImpl
For internal use only; no backwards-compatibility guarantees.
A coder for homogeneous iterable objects.
-
class
apache_beam.coders.coder_impl.
LengthPrefixCoderImpl
(value_coder)[source]¶ Bases:
apache_beam.coders.coder_impl.StreamCoderImpl
For internal use only; no backwards-compatibility guarantees.
Coder which prefixes the length of the encoded object in the stream.
-
class
apache_beam.coders.coder_impl.
ProtoCoderImpl
(proto_message_type)[source]¶ Bases:
apache_beam.coders.coder_impl.SimpleCoderImpl
For internal use only; no backwards-compatibility guarantees.
-
class
apache_beam.coders.coder_impl.
SequenceCoderImpl
(elem_coder)[source]¶ Bases:
apache_beam.coders.coder_impl.StreamCoderImpl
For internal use only; no backwards-compatibility guarantees.
A coder for sequences.
If the length of the sequence in known we encode the length as a 32 bit
int
followed by the encoded bytes.If the length of the sequence is unknown, we encode the length as
-1
followed by the encoding of elements buffered up to 64K bytes before prefixing the count of number of elements. A0
is encoded at the end to indicate the end of stream.The resulting encoding would look like this:
-1 countA element(0) element(1) ... element(countA - 1) countB element(0) element(1) ... element(countB - 1) ... countX element(0) element(1) ... element(countX - 1) 0
-
class
apache_beam.coders.coder_impl.
SimpleCoderImpl
[source]¶ Bases:
apache_beam.coders.coder_impl.CoderImpl
For internal use only; no backwards-compatibility guarantees.
Subclass of CoderImpl implementing stream methods using encode/decode.
-
class
apache_beam.coders.coder_impl.
SingletonCoderImpl
(value)[source]¶ Bases:
apache_beam.coders.coder_impl.CoderImpl
For internal use only; no backwards-compatibility guarantees.
A coder that always encodes exactly one value.
-
class
apache_beam.coders.coder_impl.
StreamCoderImpl
[source]¶ Bases:
apache_beam.coders.coder_impl.CoderImpl
For internal use only; no backwards-compatibility guarantees.
Subclass of CoderImpl implementing encode/decode using stream methods.
-
class
apache_beam.coders.coder_impl.
TimestampCoderImpl
[source]¶ Bases:
apache_beam.coders.coder_impl.StreamCoderImpl
For internal use only; no backwards-compatibility guarantees.
-
class
apache_beam.coders.coder_impl.
TupleCoderImpl
(coder_impls)[source]¶ Bases:
apache_beam.coders.coder_impl.AbstractComponentCoderImpl
A coder for tuple objects.
-
class
apache_beam.coders.coder_impl.
TupleSequenceCoderImpl
(elem_coder)[source]¶ Bases:
apache_beam.coders.coder_impl.SequenceCoderImpl
For internal use only; no backwards-compatibility guarantees.
A coder for homogeneous tuple objects.
-
class
apache_beam.coders.coder_impl.
VarIntCoderImpl
[source]¶ Bases:
apache_beam.coders.coder_impl.StreamCoderImpl
For internal use only; no backwards-compatibility guarantees.
A coder for long/int objects.
-
class
apache_beam.coders.coder_impl.
WindowedValueCoderImpl
(value_coder, timestamp_coder, window_coder)[source]¶ Bases:
apache_beam.coders.coder_impl.StreamCoderImpl
For internal use only; no backwards-compatibility guarantees.
A coder for windowed values.
apache_beam.coders.coders module¶
Collection of useful coders.
Only those coders listed in __all__ are part of the public API of this module.
-
class
apache_beam.coders.coders.
Coder
[source]¶ Bases:
object
Base class for coders.
-
as_cloud_object
()[source]¶ For internal use only; no backwards-compatibility guarantees.
Returns Google Cloud Dataflow API description of this coder.
-
estimate_size
(value)[source]¶ Estimates the encoded size of the given value, in bytes.
Dataflow estimates the encoded size of a PCollection processed in a pipeline step by using the estimated size of a random sample of elements in that PCollection.
The default implementation encodes the given value and returns its byte size. If a coder can provide a fast estimate of the encoded size of a value (e.g., if the encoding has a fixed size), it can provide its estimate here to improve performance.
Parameters: value – the value whose encoded size is to be estimated. Returns: The estimated encoded size of the given value.
-
static
from_runner_api
(proto, context)[source]¶ For internal use only; no backwards-compatibility guarantees.
-
get_impl
()[source]¶ For internal use only; no backwards-compatibility guarantees.
Returns the CoderImpl backing this Coder.
-
is_deterministic
()[source]¶ Whether this coder is guaranteed to encode values deterministically.
A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.
For example, note that the default coder, the PickleCoder, is not deterministic: the ordering of picked entries in maps may vary across executions since there is no defined order, and such a coder is not in general suitable for usage as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.
Returns: Whether coder is deterministic.
-
-
class
apache_beam.coders.coders.
BytesCoder
[source]¶ Bases:
apache_beam.coders.coders.FastCoder
Byte string coder.
-
class
apache_beam.coders.coders.
DillCoder
[source]¶ Bases:
apache_beam.coders.coders._PickleCoderBase
Coder using dill’s pickle functionality.
-
class
apache_beam.coders.coders.
FastPrimitivesCoder
(fallback_coder=PickleCoder)[source]¶ Bases:
apache_beam.coders.coders.FastCoder
Encodes simple primitives (e.g. str, int) efficiently.
For unknown types, falls back to another coder (e.g. PickleCoder).
-
class
apache_beam.coders.coders.
FloatCoder
[source]¶ Bases:
apache_beam.coders.coders.FastCoder
A coder used for floating-point values.
-
class
apache_beam.coders.coders.
IterableCoder
(elem_coder)[source]¶ Bases:
apache_beam.coders.coders.FastCoder
Coder of iterables of homogeneous objects.
-
class
apache_beam.coders.coders.
PickleCoder
[source]¶ Bases:
apache_beam.coders.coders._PickleCoderBase
Coder using Python’s pickle functionality.
-
class
apache_beam.coders.coders.
ProtoCoder
(proto_message_type)[source]¶ Bases:
apache_beam.coders.coders.FastCoder
A Coder for Google Protocol Buffers.
It supports both Protocol Buffers syntax versions 2 and 3. However, the runtime version of the python protobuf library must exactly match the version of the protoc compiler what was used to generate the protobuf messages.
ProtoCoder is registered in the global CoderRegistry as the default coder for any protobuf Message object.
-
class
apache_beam.coders.coders.
SingletonCoder
(value)[source]¶ Bases:
apache_beam.coders.coders.FastCoder
A coder that always encodes exactly one value.
-
class
apache_beam.coders.coders.
StrUtf8Coder
[source]¶ Bases:
apache_beam.coders.coders.Coder
A coder used for reading and writing strings as UTF-8.
-
class
apache_beam.coders.coders.
TimestampCoder
[source]¶ Bases:
apache_beam.coders.coders.FastCoder
A coder used for timeutil.Timestamp values.
-
class
apache_beam.coders.coders.
TupleCoder
(components)[source]¶ Bases:
apache_beam.coders.coders.FastCoder
Coder of tuple objects.
-
class
apache_beam.coders.coders.
TupleSequenceCoder
(elem_coder)[source]¶ Bases:
apache_beam.coders.coders.FastCoder
Coder of homogeneous tuple objects.
apache_beam.coders.coders_test_common module¶
Tests common to all coder implementations.
apache_beam.coders.observable module¶
Observable base class for iterables.
For internal use only; no backwards-compatibility guarantees.
apache_beam.coders.proto2_coder_test_messages_pb2 module¶
-
class
apache_beam.coders.proto2_coder_test_messages_pb2.
MessageA
(**kwargs)¶ Bases:
google.protobuf.message.Message
-
ByteSize
()¶
-
Clear
()¶
-
ClearField
(field_name)¶
-
DESCRIPTOR
= <google.protobuf.descriptor.Descriptor object>¶
-
DiscardUnknownFields
()¶
-
FIELD1_FIELD_NUMBER
= 1¶
-
FIELD2_FIELD_NUMBER
= 2¶
-
FindInitializationErrors
()¶ Finds required fields which are not initialized.
Returns: A list of strings. Each string is a path to an uninitialized field from the top-level message, e.g. “foo.bar[5].baz”.
-
static
FromString
(s)¶
-
HasField
(field_name)¶
-
IsInitialized
(errors=None)¶ Checks if all required fields of a message are set.
Parameters: errors – A list which, if provided, will be populated with the field paths of all missing required fields. Returns: True iff the specified message has all required fields set.
-
ListFields
()¶
-
MergeFrom
(msg)¶
-
MergeFromString
(serialized)¶
-
static
RegisterExtension
(extension_handle)¶
-
SerializePartialToString
()¶
-
SerializeToString
()¶
-
SetInParent
()¶ Sets the _cached_byte_size_dirty bit to true, and propagates this to our listener iff this was a state change.
-
WhichOneof
(oneof_name)¶ Returns the name of the currently set field inside a oneof, or None.
-
field1
¶ Magic attribute generated for “field1” proto field.
-
field2
¶ Magic attribute generated for “field2” proto field.
-
-
class
apache_beam.coders.proto2_coder_test_messages_pb2.
MessageB
(**kwargs)¶ Bases:
google.protobuf.message.Message
-
ByteSize
()¶
-
Clear
()¶
-
ClearField
(field_name)¶
-
DESCRIPTOR
= <google.protobuf.descriptor.Descriptor object>¶
-
DiscardUnknownFields
()¶
-
FIELD1_FIELD_NUMBER
= 1¶
-
FindInitializationErrors
()¶ Finds required fields which are not initialized.
Returns: A list of strings. Each string is a path to an uninitialized field from the top-level message, e.g. “foo.bar[5].baz”.
-
static
FromString
(s)¶
-
HasField
(field_name)¶
-
IsInitialized
(errors=None)¶ Checks if all required fields of a message are set.
Parameters: errors – A list which, if provided, will be populated with the field paths of all missing required fields. Returns: True iff the specified message has all required fields set.
-
ListFields
()¶
-
MergeFrom
(msg)¶
-
MergeFromString
(serialized)¶
-
static
RegisterExtension
(extension_handle)¶
-
SerializePartialToString
()¶
-
SerializeToString
()¶
-
SetInParent
()¶ Sets the _cached_byte_size_dirty bit to true, and propagates this to our listener iff this was a state change.
-
WhichOneof
(oneof_name)¶ Returns the name of the currently set field inside a oneof, or None.
-
field1
¶ Magic attribute generated for “field1” proto field.
-
-
class
apache_beam.coders.proto2_coder_test_messages_pb2.
MessageC
(**kwargs)¶ Bases:
google.protobuf.message.Message
-
ByteSize
()¶
-
Clear
()¶
-
ClearExtension
(extension_handle)¶
-
ClearField
(field_name)¶
-
DESCRIPTOR
= <google.protobuf.descriptor.Descriptor object>¶
-
DiscardUnknownFields
()¶
-
Extensions
¶
-
FindInitializationErrors
()¶ Finds required fields which are not initialized.
Returns: A list of strings. Each string is a path to an uninitialized field from the top-level message, e.g. “foo.bar[5].baz”.
-
static
FromString
(s)¶
-
HasExtension
(extension_handle)¶
-
HasField
(field_name)¶
-
IsInitialized
(errors=None)¶ Checks if all required fields of a message are set.
Parameters: errors – A list which, if provided, will be populated with the field paths of all missing required fields. Returns: True iff the specified message has all required fields set.
-
ListFields
()¶
-
MergeFrom
(msg)¶
-
MergeFromString
(serialized)¶
-
static
RegisterExtension
(extension_handle)¶
-
SerializePartialToString
()¶
-
SerializeToString
()¶
-
SetInParent
()¶ Sets the _cached_byte_size_dirty bit to true, and propagates this to our listener iff this was a state change.
-
WhichOneof
(oneof_name)¶ Returns the name of the currently set field inside a oneof, or None.
-
-
class
apache_beam.coders.proto2_coder_test_messages_pb2.
MessageWithMap
(**kwargs)¶ Bases:
google.protobuf.message.Message
-
ByteSize
()¶
-
Clear
()¶
-
ClearField
(field_name)¶
-
DESCRIPTOR
= <google.protobuf.descriptor.Descriptor object>¶
-
DiscardUnknownFields
()¶
-
FIELD1_FIELD_NUMBER
= 1¶
-
class
Field1Entry
(**kwargs)¶ Bases:
google.protobuf.message.Message
-
ByteSize
()¶
-
Clear
()¶
-
ClearField
(field_name)¶
-
DESCRIPTOR
= <google.protobuf.descriptor.Descriptor object>¶
-
DiscardUnknownFields
()¶
-
FindInitializationErrors
()¶ Finds required fields which are not initialized.
Returns: A list of strings. Each string is a path to an uninitialized field from the top-level message, e.g. “foo.bar[5].baz”.
-
static
FromString
(s)¶
-
HasField
(field_name)¶
-
IsInitialized
(errors=None)¶ Checks if all required fields of a message are set.
Parameters: errors – A list which, if provided, will be populated with the field paths of all missing required fields. Returns: True iff the specified message has all required fields set.
-
KEY_FIELD_NUMBER
= 1¶
-
ListFields
()¶
-
MergeFrom
(msg)¶
-
MergeFromString
(serialized)¶
-
static
RegisterExtension
(extension_handle)¶
-
SerializePartialToString
()¶
-
SerializeToString
()¶
-
SetInParent
()¶ Sets the _cached_byte_size_dirty bit to true, and propagates this to our listener iff this was a state change.
-
VALUE_FIELD_NUMBER
= 2¶
-
WhichOneof
(oneof_name)¶ Returns the name of the currently set field inside a oneof, or None.
-
key
¶ Magic attribute generated for “key” proto field.
-
value
¶ Magic attribute generated for “value” proto field.
-
-
FindInitializationErrors
()¶ Finds required fields which are not initialized.
Returns: A list of strings. Each string is a path to an uninitialized field from the top-level message, e.g. “foo.bar[5].baz”.
-
static
FromString
(s)¶
-
HasField
(field_name)¶
-
IsInitialized
(errors=None)¶ Checks if all required fields of a message are set.
Parameters: errors – A list which, if provided, will be populated with the field paths of all missing required fields. Returns: True iff the specified message has all required fields set.
-
ListFields
()¶
-
MergeFrom
(msg)¶
-
MergeFromString
(serialized)¶
-
static
RegisterExtension
(extension_handle)¶
-
SerializePartialToString
()¶
-
SerializeToString
()¶
-
SetInParent
()¶ Sets the _cached_byte_size_dirty bit to true, and propagates this to our listener iff this was a state change.
-
WhichOneof
(oneof_name)¶ Returns the name of the currently set field inside a oneof, or None.
-
field1
¶ Magic attribute generated for “field1” proto field.
-
-
class
apache_beam.coders.proto2_coder_test_messages_pb2.
ReferencesMessageWithMap
(**kwargs)¶ Bases:
google.protobuf.message.Message
-
ByteSize
()¶
-
Clear
()¶
-
ClearField
(field_name)¶
-
DESCRIPTOR
= <google.protobuf.descriptor.Descriptor object>¶
-
DiscardUnknownFields
()¶
-
FIELD1_FIELD_NUMBER
= 1¶
-
FindInitializationErrors
()¶ Finds required fields which are not initialized.
Returns: A list of strings. Each string is a path to an uninitialized field from the top-level message, e.g. “foo.bar[5].baz”.
-
static
FromString
(s)¶
-
HasField
(field_name)¶
-
IsInitialized
(errors=None)¶ Checks if all required fields of a message are set.
Parameters: errors – A list which, if provided, will be populated with the field paths of all missing required fields. Returns: True iff the specified message has all required fields set.
-
ListFields
()¶
-
MergeFrom
(msg)¶
-
MergeFromString
(serialized)¶
-
static
RegisterExtension
(extension_handle)¶
-
SerializePartialToString
()¶
-
SerializeToString
()¶
-
SetInParent
()¶ Sets the _cached_byte_size_dirty bit to true, and propagates this to our listener iff this was a state change.
-
WhichOneof
(oneof_name)¶ Returns the name of the currently set field inside a oneof, or None.
-
field1
¶ Magic attribute generated for “field1” proto field.
-
apache_beam.coders.slow_stream module¶
A pure Python implementation of stream.pyx.
For internal use only; no backwards-compatibility guarantees.
-
class
apache_beam.coders.slow_stream.
ByteCountingOutputStream
[source]¶ Bases:
apache_beam.coders.slow_stream.OutputStream
For internal use only; no backwards-compatibility guarantees.
A pure Python implementation of stream.ByteCountingOutputStream.
-
class
apache_beam.coders.slow_stream.
InputStream
(data)[source]¶ Bases:
object
For internal use only; no backwards-compatibility guarantees.
A pure Python implementation of stream.InputStream.
apache_beam.coders.stream module¶
members: undoc-members: show-inheritance:
apache_beam.coders.typecoders module¶
Type coders registration.
This module contains functionality to define and use coders for custom classes. Let’s say we have a class Xyz and we are processing a PCollection with elements of type Xyz. If we do not register a coder for Xyz, a default pickle-based fallback coder will be used. This can be undesirable for two reasons. First, we may want a faster coder or a more space efficient one. Second, the pickle-based coder is not deterministic in the sense that objects like dictionaries or sets are not guaranteed to be encoded in the same way every time (elements are not really ordered).
- Two (sometimes three) steps are needed to define and use a custom coder:
- define the coder class
- associate the code with the class (a.k.a. coder registration)
- typehint DoFns or transforms with the new class or composite types using the class.
A coder class is defined by subclassing from CoderBase and defining the encode_to_bytes and decode_from_bytes methods. The framework uses duck-typing for coders so it is not strictly required to subclass from CoderBase as long as the encode/decode methods are defined.
Registering a coder class is made with a register_coder() call:
from apache_beam import coders
...
coders.registry.register_coder(Xyz, XyzCoder)
Additionally, DoFns and PTransforms may need type hints. This is not always necessary since there is functionality to infer the return types of DoFns by analyzing the code. For instance, for the function below the return type of ‘Xyz’ will be inferred:
def MakeXyzs(v):
return Xyz(v)
If Xyz is inferred then its coder will be used whenever the framework needs to serialize data (e.g., writing to the shuffler subsystem responsible for group by key operations). If a typehint is needed it can be specified by decorating the DoFns or using with_input_types/with_output_types methods on PTransforms. For example, the above function can be decorated:
@with_output_types(Xyz)
def MakeXyzs(v):
return complex_operation_returning_Xyz(v)
See apache_beam.typehints.decorators module for more details.