Class ProtoCoder<T extends Message>

java.lang.Object
org.apache.beam.sdk.coders.Coder<T>
org.apache.beam.sdk.coders.CustomCoder<T>
org.apache.beam.sdk.extensions.protobuf.ProtoCoder<T>
Type Parameters:
T - the Protocol Buffers Message handled by this Coder.
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
DynamicProtoCoder

public class ProtoCoder<T extends Message> extends CustomCoder<T>
A Coder using Google Protocol Buffers binary format. ProtoCoder supports both Protocol Buffers syntax versions 2 and 3.

To learn more about Protocol Buffers, visit: https://developers.google.com/protocol-buffers

ProtoCoder is registered in the global CoderRegistry as the default Coder for any Message object. Custom message extensions are also supported, but these extensions must be registered for a particular ProtoCoder instance and that instance must be registered on the PCollection that needs the extensions:


 import MyProtoFile;
 import MyProtoFile.MyMessage;

 Coder<MyMessage> coder = ProtoCoder.of(MyMessage.class).withExtensionsFrom(MyProtoFile.class);
 PCollection<MyMessage> records = input.apply(...).setCoder(coder);
 

Versioning

ProtoCoder supports both versions 2 and 3 of the Protocol Buffers syntax. However, the Java runtime version of the com.google.protobuf library must exactly match the version of protoc that was used to produce the JAR files containing the compiled .proto messages.

For more information, see the Protocol Buffers documentation.

ProtoCoder and Determinism

In general, Protocol Buffers messages can be encoded deterministically within a single pipeline as long as:

  • The encoded messages (and any transitively linked messages) do not use map fields.
  • Every Java VM that encodes or decodes the messages uses the same runtime version of the Protocol Buffers library and the same compiled .proto file JAR.
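
The map-field restriction above can be illustrated without Protocol Buffers at all. The following is a minimal stdlib-only sketch, not ProtoCoder's implementation: the class MapOrderSketch and both encoder methods are hypothetical names introduced for illustration. It assumes that the underlying problem is unspecified entry order, and shows how two maps that compare as equal can still produce different bytes when an encoder writes entries in iteration order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this is not how ProtoCoder encodes messages.
class MapOrderSketch {
    // Naive encoder: writes "key=value;" pairs in whatever order the map iterates.
    static byte[] encodeInIterationOrder(Map<String, Integer> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            sb.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Canonical encoder: sorts the keys first, so equal maps yield equal bytes.
    static byte[] encodeSorted(Map<String, Integer> map) {
        return encodeInIterationOrder(new TreeMap<>(map));
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new LinkedHashMap<>();
        first.put("a", 1);
        first.put("b", 2);

        Map<String, Integer> second = new LinkedHashMap<>();
        second.put("b", 2);
        second.put("a", 1);

        // Map equality ignores insertion order.
        System.out.println(first.equals(second)); // true
        // The naive encodings differ: "a=1;b=2;" versus "b=2;a=1;".
        System.out.println(Arrays.equals(
            encodeInIterationOrder(first), encodeInIterationOrder(second))); // false
        // The sorted encodings are identical.
        System.out.println(Arrays.equals(encodeSorted(first), encodeSorted(second))); // true
    }
}
```

The naive encoder breaks the requirement that equal values have the same encoding, which is why a pipeline that needs deterministic encoding should avoid message types whose serialization order is not fixed.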

ProtoCoder and Encoding Stability

When changing Protocol Buffers messages, follow the rules in the Protocol Buffers language guides for the proto2 or proto3 syntax, depending on your message type. Following these guidelines ensures that previously encoded data can still be read by newer versions of the code.

Generally, any change to the message type, registered extensions, runtime library, or compiled proto JARs may change the encoding. Thus even if both the original and updated messages can be encoded deterministically within a single job, these deterministic encodings may not be the same across jobs.
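
To make the distinction between readability and byte-level stability concrete, here is a simplified stdlib-only sketch. It models messages as integer fields written as a varint tag (field number shifted left by 3, wire type 0) followed by a varint value, in the style of the Protocol Buffers wire format. The class FieldEvolutionSketch and its methods are hypothetical, and the model ignores every other wire type, so treat it as an illustration rather than a description of what ProtoCoder emits.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, heavily simplified wire model: varint-typed fields only.
class FieldEvolutionSketch {
    // Writes an unsigned base-128 varint: 7 payload bits per byte, high bit means "more follows".
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Reads one varint starting at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] bytes, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = bytes[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Encodes fields in ascending field-number order as (tag, value) varint pairs.
    static byte[] encode(Map<Integer, Long> fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<Integer, Long> entry : new TreeMap<>(fields).entrySet()) {
            writeVarint(out, ((long) entry.getKey()) << 3); // wire type 0 (varint)
            writeVarint(out, entry.getValue());
        }
        return out.toByteArray();
    }

    // Decodes all (field number, value) pairs present in the bytes.
    static Map<Integer, Long> decode(byte[] bytes) {
        Map<Integer, Long> fields = new TreeMap<>();
        int[] pos = {0};
        while (pos[0] < bytes.length) {
            long tag = readVarint(bytes, pos);
            long value = readVarint(bytes, pos);
            fields.put((int) (tag >>> 3), value);
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] oldBytes = encode(Map.of(1, 150L));        // written before field 2 existed
        byte[] newBytes = encode(Map.of(1, 150L, 2, 7L)); // written after field 2 was added

        // New code can still read old data; the absent field falls back to a default.
        Map<Integer, Long> readByNewCode = decode(oldBytes);
        System.out.println(readByNewCode.get(1));              // 150
        System.out.println(readByNewCode.getOrDefault(2, 0L)); // 0

        // The encodings themselves are not identical across the two versions.
        System.out.println(Arrays.equals(oldBytes, newBytes)); // false
    }
}
```

In this sketch the old bytes remain readable after the schema gains a field, yet the bytes produced by the two versions differ. That is the reason byte-for-byte equality of encodings should not be assumed across jobs.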

  • Constructor Details

    • ProtoCoder

      protected ProtoCoder(Class<T> protoMessageClass, Set<Class<?>> extensionHostClasses)
      Protected constructor for use by subclasses. Use of(Class) or of(TypeDescriptor) to obtain an instance.
  • Method Details

    • of

      public static <T extends Message> ProtoCoder<T> of(Class<T> protoMessageClass)
      Returns a ProtoCoder for the given Protocol Buffers Message.
    • of

      public static <T extends Message> ProtoCoder<T> of(TypeDescriptor<T> protoMessageType)
      Returns a ProtoCoder for the Protocol Buffers Message indicated by the given TypeDescriptor.
    • withExtensionsFrom

      public ProtoCoder<T> withExtensionsFrom(Iterable<Class<?>> moreExtensionHosts)
      Returns a ProtoCoder like this one, but with the extensions from the given classes registered.

      Each of the extension host classes must be a class automatically generated by the Protocol Buffers compiler, protoc, that contains messages.

      Does not modify this object.

    • withExtensionsFrom

      public ProtoCoder<T> withExtensionsFrom(Class<?>... moreExtensionHosts)
      See withExtensionsFrom(Iterable).

      Does not modify this object.

    • encode

      public void encode(T value, OutputStream outStream) throws IOException
      Description copied from class: Coder
      Encodes the given value of type T onto the given output stream. Because multiple elements can be encoded next to each other on the output stream, each coder should encode enough information to determine how many bytes to read when decoding. A common approach is to prefix the encoding with the element's encoded length.
      Specified by:
      encode in class Coder<T extends Message>
      Throws:
      IOException - if writing to the OutputStream fails for some reason
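
The length-prefix approach mentioned in the description above can be sketched with the standard library alone. The class LengthPrefixSketch and its two methods are hypothetical, and the fixed 4-byte prefix is an assumption chosen for simplicity; real coders often use a variable-length prefix instead, and this sketch does not claim to match ProtoCoder's actual framing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of length-prefixed framing on a shared stream.
class LengthPrefixSketch {
    // Writes a 4-byte big-endian length followed by the payload bytes.
    static void writeFramed(byte[] payload, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    // Reads the length prefix, then exactly that many payload bytes and no more.
    static byte[] readFramed(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        byte[] payload = new byte[length];
        data.readFully(payload);
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // Two elements are written back to back on the same stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeFramed("first".getBytes(StandardCharsets.UTF_8), out);
        writeFramed("second!".getBytes(StandardCharsets.UTF_8), out);

        // The prefix tells the reader where each element ends.
        InputStream in = new ByteArrayInputStream(out.toByteArray());
        System.out.println(new String(readFramed(in), StandardCharsets.UTF_8)); // first
        System.out.println(new String(readFramed(in), StandardCharsets.UTF_8)); // second!
    }
}
```

Without the prefix, a reader would have no way to tell where the first element stops and the second begins once both sit on the same stream.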
    • encode

      public void encode(T value, OutputStream outStream, Coder.Context context) throws IOException
      Description copied from class: Coder
      Encodes the given value of type T onto the given output stream in the given context.
      Overrides:
      encode in class Coder<T extends Message>
      Throws:
      IOException - if writing to the OutputStream fails for some reason
    • decode

      public T decode(InputStream inStream) throws IOException
      Description copied from class: Coder
      Decodes a value of type T from the given input stream. Returns the decoded value. Because multiple elements can be encoded next to each other on the input stream, each coder should encode enough information to determine how many bytes to read when decoding. A common approach is to prefix the encoding with the element's encoded length.
      Specified by:
      decode in class Coder<T extends Message>
      Throws:
      IOException - if reading from the InputStream fails for some reason
    • decode

      public T decode(InputStream inStream, Coder.Context context) throws IOException
      Description copied from class: Coder
      Decodes a value of type T from the given input stream in the given context. Returns the decoded value.
      Overrides:
      decode in class Coder<T extends Message>
      Throws:
      IOException - if reading from the InputStream fails for some reason
    • equals

      public boolean equals(@Nullable Object other)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • verifyDeterministic

      public void verifyDeterministic() throws Coder.NonDeterministicException
      Description copied from class: CustomCoder
      Throws Coder.NonDeterministicException if the coding is not deterministic.

      In order for a Coder to be considered deterministic, the following must be true:

      • two values that compare as equal (via Object.equals() or Comparable.compareTo(), if supported) have the same encoding.
      • the Coder always produces a canonical encoding, which is the same for an instance of an object even if produced on different computers at different times.
      Overrides:
      verifyDeterministic in class CustomCoder<T extends Message>
      Throws:
      Coder.NonDeterministicException - a CustomCoder is presumed nondeterministic.
    • getMessageType

      public Class<T> getMessageType()
      Returns the Protocol Buffers Message type this ProtoCoder supports.
    • getEncodedTypeDescriptor

      public TypeDescriptor<T> getEncodedTypeDescriptor()
      Description copied from class: Coder
      Returns the TypeDescriptor for the type encoded.
      Overrides:
      getEncodedTypeDescriptor in class Coder<T extends Message>
    • getExtensionHosts

      public Set<Class<?>> getExtensionHosts()
    • getExtensionRegistry

      public ExtensionRegistry getExtensionRegistry()
      Returns the ExtensionRegistry listing all known Protocol Buffers extension messages to T registered with this ProtoCoder.
    • getParser

      protected Parser<T> getParser()
      Get the memoized Parser, possibly initializing it lazily.
    • getCoderProvider

      public static CoderProvider getCoderProvider()
      Returns a CoderProvider which uses the ProtoCoder for proto messages.

      This method is invoked reflectively from DefaultCoder.