Class ProtoCoder<T extends Message>
- All Implemented Interfaces:
Serializable
- Direct Known Subclasses:
DynamicProtoCoder
Coder
using Google Protocol Buffers binary format. ProtoCoder
supports both
Protocol Buffers syntax versions 2 and 3.
To learn more about Protocol Buffers, visit: https://developers.google.com/protocol-buffers
ProtoCoder
is registered in the global CoderRegistry
as the default Coder
for any Message
object. Custom message extensions are also supported, but these
extensions must be registered for a particular ProtoCoder
instance and that instance must
be registered on the PCollection
that needs the extensions:
import MyProtoFile;
import MyProtoFile.MyMessage;
Coder<MyMessage> coder = ProtoCoder.of(MyMessage.class).withExtensionsFrom(MyProtoFile.class);
PCollection<MyMessage> records = input.apply(...).setCoder(coder);
Versioning
ProtoCoder
supports both versions 2 and 3 of the Protocol Buffers syntax. However, the
Java runtime version of the google.com.protobuf
library must match exactly the
version of protoc
that was used to produce the JAR files containing the compiled
.proto
messages.
For more information, see the Protocol Buffers documentation.
ProtoCoder
and Determinism
In general, Protocol Buffers messages can be encoded deterministically within a single pipeline as long as:
- The encoded messages (and any transitively linked messages) do not use
map
fields. - Every Java VM that encodes or decodes the messages use the same runtime version of the
Protocol Buffers library and the same compiled
.proto
file JAR.
ProtoCoder
and Encoding Stability
When changing Protocol Buffers messages, follow the rules in the Protocol Buffers language
guides for
proto2
and proto3
syntaxes, depending on your message type. Following these guidelines will ensure that the old
encoded data can be read by new versions of the code.
Generally, any change to the message type, registered extensions, runtime library, or compiled proto JARs may change the encoding. Thus even if both the original and updated messages can be encoded deterministically within a single job, these deterministic encodings may not be the same across jobs.
- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.beam.sdk.coders.Coder
Coder.Context, Coder.NonDeterministicException
-
Field Summary
Fields -
Constructor Summary
ConstructorsModifierConstructorDescriptionprotected
ProtoCoder
(Class<T> protoMessageClass, Set<Class<?>> extensionHostClasses) Private constructor. -
Method Summary
Modifier and TypeMethodDescriptiondecode
(InputStream inStream) Decodes a value of typeT
from the given input stream in the given context.decode
(InputStream inStream, Coder.Context context) Decodes a value of typeT
from the given input stream in the given context.void
encode
(T value, OutputStream outStream) Encodes the given value of typeT
onto the given output stream.void
encode
(T value, OutputStream outStream, Coder.Context context) Encodes the given value of typeT
onto the given output stream in the given context.boolean
static CoderProvider
Returns theTypeDescriptor
for the type encoded.Returns theExtensionRegistry
listing all known Protocol Buffers extension messages toT
registered with thisProtoCoder
.Returns the Protocol BuffersMessage
type thisProtoCoder
supports.Get the memoizedParser
, possibly initializing it lazily.int
hashCode()
static <T extends Message>
ProtoCoder<T> Returns aProtoCoder
for the given Protocol BuffersMessage
.static <T extends Message>
ProtoCoder<T> of
(TypeDescriptor<T> protoMessageType) void
ThrowCoder.NonDeterministicException
if the coding is not deterministic.withExtensionsFrom
(Class<?>... moreExtensionHosts) withExtensionsFrom
(Iterable<Class<?>> moreExtensionHosts) Returns aProtoCoder
like this one, but with the extensions from the given classes registered.Methods inherited from class org.apache.beam.sdk.coders.CustomCoder
getCoderArguments
Methods inherited from class org.apache.beam.sdk.coders.Coder
consistentWithEquals, getEncodedElementByteSize, getEncodedElementByteSizeUsingCoder, isRegisterByteSizeObserverCheap, registerByteSizeObserver, structuralValue, verifyDeterministic, verifyDeterministic
-
Field Details
-
serialVersionUID
public static final long serialVersionUID- See Also:
-
-
Constructor Details
-
ProtoCoder
Private constructor.
-
-
Method Details
-
of
Returns aProtoCoder
for the given Protocol BuffersMessage
. -
of
-
withExtensionsFrom
Returns aProtoCoder
like this one, but with the extensions from the given classes registered.Each of the extension host classes must be an class automatically generated by the Protocol Buffers compiler,
protoc
, that contains messages.Does not modify this object.
-
withExtensionsFrom
SeewithExtensionsFrom(Iterable)
.Does not modify this object.
-
encode
Description copied from class:Coder
Encodes the given value of typeT
onto the given output stream. Multiple elements can be encoded next to each other on the output stream, each coder should encode information to know how many bytes to read when decoding. A common approach is to prefix the encoding with the element's encoded length.- Specified by:
encode
in classCoder<T extends Message>
- Throws:
IOException
- if writing to theOutputStream
fails for some reason
-
encode
Description copied from class:Coder
Encodes the given value of typeT
onto the given output stream in the given context.- Overrides:
encode
in classCoder<T extends Message>
- Throws:
IOException
- if writing to theOutputStream
fails for some reason
-
decode
Description copied from class:Coder
Decodes a value of typeT
from the given input stream in the given context. Returns the decoded value. Multiple elements can be encoded next to each other on the input stream, each coder should encode information to know how many bytes to read when decoding. A common approach is to prefix the encoding with the element's encoded length.- Specified by:
decode
in classCoder<T extends Message>
- Throws:
IOException
- if reading from theInputStream
fails for some reason
-
decode
Description copied from class:Coder
Decodes a value of typeT
from the given input stream in the given context. Returns the decoded value.- Overrides:
decode
in classCoder<T extends Message>
- Throws:
IOException
- if reading from theInputStream
fails for some reason
-
equals
-
hashCode
public int hashCode() -
verifyDeterministic
Description copied from class:CustomCoder
ThrowCoder.NonDeterministicException
if the coding is not deterministic.In order for a
Coder
to be considered deterministic, the following must be true:- two values that compare as equal (via
Object.equals()
orComparable.compareTo()
, if supported) have the same encoding. - the
Coder
always produces a canonical encoding, which is the same for an instance of an object even if produced on different computers at different times.
- Overrides:
verifyDeterministic
in classCustomCoder<T extends Message>
- Throws:
Coder.NonDeterministicException
- aCustomCoder
is presumed nondeterministic.
- two values that compare as equal (via
-
getMessageType
Returns the Protocol BuffersMessage
type thisProtoCoder
supports. -
getEncodedTypeDescriptor
Description copied from class:Coder
Returns theTypeDescriptor
for the type encoded.- Overrides:
getEncodedTypeDescriptor
in classCoder<T extends Message>
-
getExtensionHosts
-
getExtensionRegistry
Returns theExtensionRegistry
listing all known Protocol Buffers extension messages toT
registered with thisProtoCoder
. -
getParser
Get the memoizedParser
, possibly initializing it lazily. -
getCoderProvider
Returns aCoderProvider
which uses theProtoCoder
forproto messages
.This method is invoked reflectively from
DefaultCoder
.
-