Class ProtoCoder<T extends Message>
- All Implemented Interfaces:
Serializable
- Direct Known Subclasses:
DynamicProtoCoder
Coder using Google Protocol Buffers binary format. ProtoCoder supports both
Protocol Buffers syntax versions 2 and 3.
To learn more about Protocol Buffers, visit: https://developers.google.com/protocol-buffers
ProtoCoder is registered in the global CoderRegistry as the default Coder for any Message object. Custom message extensions are also supported, but these
extensions must be registered for a particular ProtoCoder instance and that instance must
be registered on the PCollection that needs the extensions:
import MyProtoFile;
import MyProtoFile.MyMessage;
Coder<MyMessage> coder = ProtoCoder.of(MyMessage.class).withExtensionsFrom(MyProtoFile.class);
PCollection<MyMessage> records = input.apply(...).setCoder(coder);
Versioning
ProtoCoder supports both versions 2 and 3 of the Protocol Buffers syntax. However, the
Java runtime version of the google.com.protobuf library must match exactly the
version of protoc that was used to produce the JAR files containing the compiled
.proto messages.
For more information, see the Protocol Buffers documentation.
ProtoCoder and Determinism
In general, Protocol Buffers messages can be encoded deterministically within a single pipeline as long as:
- The encoded messages (and any transitively linked messages) do not use
mapfields. - Every Java VM that encodes or decodes the messages use the same runtime version of the
Protocol Buffers library and the same compiled
.protofile JAR.
ProtoCoder and Encoding Stability
When changing Protocol Buffers messages, follow the rules in the Protocol Buffers language
guides for
proto2 and proto3
syntaxes, depending on your message type. Following these guidelines will ensure that the old
encoded data can be read by new versions of the code.
Generally, any change to the message type, registered extensions, runtime library, or compiled proto JARs may change the encoding. Thus even if both the original and updated messages can be encoded deterministically within a single job, these deterministic encodings may not be the same across jobs.
- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.beam.sdk.coders.Coder
Coder.Context, Coder.NonDeterministicException -
Field Summary
Fields -
Constructor Summary
ConstructorsModifierConstructorDescriptionprotectedProtoCoder(Class<T> protoMessageClass, Set<Class<?>> extensionHostClasses) Private constructor. -
Method Summary
Modifier and TypeMethodDescriptiondecode(InputStream inStream) Decodes a value of typeTfrom the given input stream in the given context.decode(InputStream inStream, Coder.Context context) Decodes a value of typeTfrom the given input stream in the given context.voidencode(T value, OutputStream outStream) Encodes the given value of typeTonto the given output stream.voidencode(T value, OutputStream outStream, Coder.Context context) Encodes the given value of typeTonto the given output stream in the given context.booleanstatic CoderProviderReturns theTypeDescriptorfor the type encoded.Returns theExtensionRegistrylisting all known Protocol Buffers extension messages toTregistered with thisProtoCoder.Returns the Protocol BuffersMessagetype thisProtoCodersupports.Get the memoizedParser, possibly initializing it lazily.inthashCode()static <T extends Message>
ProtoCoder<T> Returns aProtoCoderfor the given Protocol BuffersMessage.static <T extends Message>
ProtoCoder<T> of(TypeDescriptor<T> protoMessageType) voidThrowCoder.NonDeterministicExceptionif the coding is not deterministic.withExtensionsFrom(Class<?>... moreExtensionHosts) withExtensionsFrom(Iterable<Class<?>> moreExtensionHosts) Returns aProtoCoderlike this one, but with the extensions from the given classes registered.Methods inherited from class org.apache.beam.sdk.coders.CustomCoder
getCoderArgumentsMethods inherited from class org.apache.beam.sdk.coders.Coder
consistentWithEquals, getEncodedElementByteSize, getEncodedElementByteSizeUsingCoder, isRegisterByteSizeObserverCheap, registerByteSizeObserver, structuralValue, verifyDeterministic, verifyDeterministic
-
Field Details
-
serialVersionUID
public static final long serialVersionUID- See Also:
-
-
Constructor Details
-
ProtoCoder
Private constructor.
-
-
Method Details
-
of
Returns aProtoCoderfor the given Protocol BuffersMessage. -
of
-
withExtensionsFrom
Returns aProtoCoderlike this one, but with the extensions from the given classes registered.Each of the extension host classes must be an class automatically generated by the Protocol Buffers compiler,
protoc, that contains messages.Does not modify this object.
-
withExtensionsFrom
SeewithExtensionsFrom(Iterable).Does not modify this object.
-
encode
Description copied from class:CoderEncodes the given value of typeTonto the given output stream. Multiple elements can be encoded next to each other on the output stream, each coder should encode information to know how many bytes to read when decoding. A common approach is to prefix the encoding with the element's encoded length.- Specified by:
encodein classCoder<T extends Message>- Throws:
IOException- if writing to theOutputStreamfails for some reason
-
encode
Description copied from class:CoderEncodes the given value of typeTonto the given output stream in the given context.- Overrides:
encodein classCoder<T extends Message>- Throws:
IOException- if writing to theOutputStreamfails for some reason
-
decode
Description copied from class:CoderDecodes a value of typeTfrom the given input stream in the given context. Returns the decoded value. Multiple elements can be encoded next to each other on the input stream, each coder should encode information to know how many bytes to read when decoding. A common approach is to prefix the encoding with the element's encoded length.- Specified by:
decodein classCoder<T extends Message>- Throws:
IOException- if reading from theInputStreamfails for some reason
-
decode
Description copied from class:CoderDecodes a value of typeTfrom the given input stream in the given context. Returns the decoded value.- Overrides:
decodein classCoder<T extends Message>- Throws:
IOException- if reading from theInputStreamfails for some reason
-
equals
-
hashCode
public int hashCode() -
verifyDeterministic
Description copied from class:CustomCoderThrowCoder.NonDeterministicExceptionif the coding is not deterministic.In order for a
Coderto be considered deterministic, the following must be true:- two values that compare as equal (via
Object.equals()orComparable.compareTo(), if supported) have the same encoding. - the
Coderalways produces a canonical encoding, which is the same for an instance of an object even if produced on different computers at different times.
- Overrides:
verifyDeterministicin classCustomCoder<T extends Message>- Throws:
Coder.NonDeterministicException- aCustomCoderis presumed nondeterministic.
- two values that compare as equal (via
-
getMessageType
Returns the Protocol BuffersMessagetype thisProtoCodersupports. -
getEncodedTypeDescriptor
Description copied from class:CoderReturns theTypeDescriptorfor the type encoded.- Overrides:
getEncodedTypeDescriptorin classCoder<T extends Message>
-
getExtensionHosts
-
getExtensionRegistry
Returns theExtensionRegistrylisting all known Protocol Buffers extension messages toTregistered with thisProtoCoder. -
getParser
Get the memoizedParser, possibly initializing it lazily. -
getCoderProvider
Returns aCoderProviderwhich uses theProtoCoderforproto messages.This method is invoked reflectively from
DefaultCoder.
-