Type coders registration.
This module contains functionality to define and use coders for custom classes. Let’s say we have a class Xyz and we are processing a PCollection with elements of type Xyz. If we do not register a coder for Xyz, a default pickle-based fallback coder will be used. This can be undesirable for two reasons. First, we may want a faster coder or a more space efficient one. Second, the pickle-based coder is not deterministic in the sense that objects like dictionaries or sets are not guaranteed to be encoded in the same way every time (elements are not really ordered).
- Two (sometimes three) steps are needed to define and use a custom coder:
- define the coder class
- associate the code with the class (a.k.a. coder registration)
- typehint DoFns or transforms with the new class or composite types using the class.
A coder class is defined by subclassing from CoderBase and defining the encode_to_bytes and decode_from_bytes methods. The framework uses duck-typing for coders so it is not strictly required to subclass from CoderBase as long as the encode/decode methods are defined.
Registering a coder class is made with a register_coder() call:
from apache_beam import coders
Additionally, DoFns and PTransforms may need type hints. This is not always necessary since there is functionality to infer the return types of DoFns by analyzing the code. For instance, for the function below the return type of ‘Xyz’ will be inferred:
If Xyz is inferred then its coder will be used whenever the framework needs to serialize data (e.g., writing to the shuffler subsystem responsible for group by key operations). If a typehint is needed it can be specified by decorating the DoFns or using with_input_types/with_output_types methods on PTransforms. For example, the above function can be decorated:
See apache_beam.typehints.decorators module for more details.