Class AvroCoder<T>

java.lang.Object
org.apache.beam.sdk.coders.Coder<T>
org.apache.beam.sdk.coders.CustomCoder<T>
org.apache.beam.sdk.extensions.avro.coders.AvroCoder<T>
Type Parameters:
T - the type of elements handled by this coder
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
AvroGenericCoder

public class AvroCoder<T> extends CustomCoder<T>
A Coder using Avro binary format.

Each instance of AvroCoder<T> encapsulates an Avro datum factory and schema for objects of type T.

The Avro datum factory and schema may be provided explicitly via of(AvroDatumFactory, Schema), or omitted by using specific(Class) or reflect(Class), in which case they are inferred using Avro's SpecificData or ReflectData, respectively.

For complete details about schema generation and how it can be controlled, see the org.apache.avro.specific and org.apache.avro.reflect packages.

To use, specify the Coder type on a PCollection:


 PCollection<MyCustomElement> records =
     input.apply(...)
          .setCoder(AvroCoder.of(MyCustomElement.class));
 

or annotate the element class using @DefaultCoder.

@DefaultCoder(AvroCoder.class)
 public class MyCustomElement {
     ...
 }
 

The implementation attempts to determine if the Avro encoding of the given type will satisfy the criteria of Coder.verifyDeterministic() by inspecting both the type and the Schema provided or generated by Avro. Only coders that are deterministic can be used in GroupByKey operations.
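As a minimal sketch of checking this before relying on a coder as a GroupByKey key coder, the helper below calls verifyDeterministic() and reports the outcome. The class name, the helper isDeterministic, and both schemas are hypothetical names introduced only for illustration. The second schema contains a map field, which the determinism check is likely to reject, but the exact outcome depends on the Beam and Avro versions in use.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.extensions.avro.coders.AvroCoder;

final class DeterminismCheckSketch {

  // Hypothetical helper: returns true when the coder passes verifyDeterministic().
  static boolean isDeterministic(Coder<?> coder) {
    try {
      coder.verifyDeterministic();
      return true;
    } catch (Coder.NonDeterministicException e) {
      // The exception message explains why the encoding may vary.
      System.out.println("Not deterministic: " + e.getMessage());
      return false;
    }
  }

  public static void main(String[] args) {
    // Illustrative schema with a single int field.
    Schema counts = SchemaBuilder.record("Counts").fields()
        .requiredInt("count")
        .endRecord();

    // Illustrative schema with a map field (assumed to be problematic).
    Schema tags = SchemaBuilder.record("Tags").fields()
        .name("tags").type().map().values().stringType().noDefault()
        .endRecord();

    System.out.println("Counts: " + isDeterministic(AvroCoder.generic(counts)));
    System.out.println("Tags:   " + isDeterministic(AvroCoder.generic(tags)));
  }
}
```

Running a check like this in a unit test surfaces a non-deterministic key type before the pipeline is submitted.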

  • Constructor Details

  • Method Details

    • generic

      public static AvroCoder<GenericRecord> generic(Schema schema)
      Returns an AvroCoder instance for the Avro schema. The implicit type is GenericRecord.
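A short sketch of how generic(Schema) might be used: it builds a GenericRecord, encodes it with the coder, and decodes it again. The class name and the "User" schema are hypothetical and exist only for this illustration. CoderUtils from the Beam SDK is used here as a convenience for byte-array round trips.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.beam.sdk.coders.CoderException;
import org.apache.beam.sdk.extensions.avro.coders.AvroCoder;
import org.apache.beam.sdk.util.CoderUtils;

final class GenericCoderSketch {

  // Hypothetical two-field record schema used only for this illustration.
  static Schema userSchema() {
    return SchemaBuilder.record("User").namespace("example").fields()
        .requiredString("name")
        .requiredInt("age")
        .endRecord();
  }

  // Encodes the record to bytes and decodes it back with the same coder.
  static GenericRecord roundTrip(GenericRecord record) {
    AvroCoder<GenericRecord> coder = AvroCoder.generic(record.getSchema());
    try {
      byte[] bytes = CoderUtils.encodeToByteArray(coder, record);
      return CoderUtils.decodeFromByteArray(coder, bytes);
    } catch (CoderException e) {
      throw new IllegalStateException("Round trip failed", e);
    }
  }

  public static void main(String[] args) {
    GenericRecord user = new GenericRecordBuilder(userSchema())
        .set("name", "Ada")
        .set("age", 36)
        .build();

    GenericRecord decoded = roundTrip(user);
    // Decoded strings may be Avro Utf8 instances, so compare via toString().
    System.out.println(decoded.get("name") + " " + decoded.get("age"));
  }
}
```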
    • specific

      public static <T> AvroCoder<T> specific(TypeDescriptor<T> type)
      Returns an AvroCoder instance for the provided element type respecting Avro's Specific* suite for encoding and decoding.
    • specific

      public static <T> AvroCoder<T> specific(Class<T> type)
      Returns an AvroCoder instance for the provided element type respecting Avro's Specific* suite for encoding and decoding.
    • specific

      public static <T> AvroCoder<T> specific(Class<T> type, Schema schema)
      Returns an AvroCoder instance for the provided element type respecting Avro's Specific* suite for encoding and decoding.

      The schema must correspond to the type provided.

    • reflect

      public static <T> AvroCoder<T> reflect(TypeDescriptor<T> type)
      Returns an AvroCoder instance for the provided element type respecting Avro's Reflect* suite for encoding and decoding.
    • reflect

      public static <T> AvroCoder<T> reflect(Class<T> type)
      Returns an AvroCoder instance for the provided element type respecting Avro's Reflect* suite for encoding and decoding.
    • reflect

      public static <T> AvroCoder<T> reflect(Class<T> type, Schema schema)
      Returns an AvroCoder instance for the provided element type respecting Avro's Reflect* suite for encoding and decoding.

      The schema must correspond to the type provided.
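The following sketch shows one way reflect(Class) could be used with a plain Java class. The class names and the Point type are hypothetical. It assumes Avro reflection can instantiate the nested class through its no-argument constructor, which is the usual requirement for reflect-based decoding.

```java
import org.apache.beam.sdk.coders.CoderException;
import org.apache.beam.sdk.extensions.avro.coders.AvroCoder;
import org.apache.beam.sdk.util.CoderUtils;

final class ReflectCoderSketch {

  // Hypothetical POJO; the no-argument constructor is there for Avro reflection.
  static class Point {
    int x;
    int y;

    Point() {}

    Point(int x, int y) {
      this.x = x;
      this.y = y;
    }
  }

  // Infers the schema from the class via reflection, then encodes and decodes.
  static Point roundTrip(Point point) {
    AvroCoder<Point> coder = AvroCoder.reflect(Point.class);
    try {
      byte[] bytes = CoderUtils.encodeToByteArray(coder, point);
      return CoderUtils.decodeFromByteArray(coder, bytes);
    } catch (CoderException e) {
      throw new IllegalStateException("Round trip failed", e);
    }
  }

  public static void main(String[] args) {
    // Print the schema that was inferred for the class.
    System.out.println(AvroCoder.reflect(Point.class).getSchema());

    Point decoded = roundTrip(new Point(3, 4));
    System.out.println(decoded.x + "," + decoded.y);
  }
}
```

To pin the schema instead of inferring it on each construction, the inferred schema could be passed to the two-argument overload, provided it corresponds to the type.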

    • of

      public static AvroGenericCoder of(Schema schema)
      Returns an AvroGenericCoder instance for the Avro schema. The implicit type is GenericRecord.
    • of

      public static <T> AvroCoder<T> of(TypeDescriptor<T> type)
      Returns an AvroCoder instance for the provided element type.
      Type Parameters:
      T - the element type
    • of

      public static <T> AvroCoder<T> of(TypeDescriptor<T> type, boolean useReflectApi)
      Returns an AvroCoder instance for the provided element type, respecting whether to use Avro's Reflect* or Specific* suite for encoding and decoding.
      Type Parameters:
      T - the element type
    • of

      public static <T> AvroCoder<T> of(Class<T> clazz)
      Returns an AvroCoder instance for the provided element class.
      Type Parameters:
      T - the element type
    • of

      public static <T> AvroCoder<T> of(Class<T> type, boolean useReflectApi)
      Returns an AvroCoder instance for the given class, respecting whether to use Avro's Reflect* or Specific* suite for encoding and decoding.
      Type Parameters:
      T - the element type
    • of

      public static <T> AvroCoder<T> of(Class<T> type, Schema schema)
      Returns an AvroCoder instance for the provided element type using the provided Avro schema.

      The schema must correspond to the type provided.

      Type Parameters:
      T - the element type
    • of

      public static <T> AvroCoder<T> of(AvroDatumFactory<T> datumFactory, Schema schema)
      Returns an AvroCoder instance for the provided AvroDatumFactory using the provided Avro schema.

      The schema must correspond to the provided datumFactory's type.

      Type Parameters:
      T - the element type
    • of

      public static <T> AvroCoder<T> of(Class<T> type, Schema schema, boolean useReflectApi)
      Returns an AvroCoder instance for the provided element type using the provided Avro schema, respecting whether to use Avro's Reflect* or Specific* suite for encoding and decoding.

      The schema must correspond to the type provided.

      Type Parameters:
      T - the element type
    • getCoderProvider

      public static CoderProvider getCoderProvider()
      Returns a CoderProvider which uses the AvroCoder if possible for all types.

      It is unsafe to register this as a CoderProvider because Avro will reflectively accept dangerous types such as Object.

      This method is invoked reflectively from DefaultCoder.

    • getType

      public Class<T> getType()
      Returns the type this coder encodes/decodes.
    • getDatumFactory

      public AvroDatumFactory<T> getDatumFactory()
      Returns the datum factory used for encoding/decoding.
    • getDatumWriter

      public DatumWriter<T> getDatumWriter()
      Returns the DatumWriter used for encoding.
    • getDatumReader

      public DatumReader<T> getDatumReader()
      Returns the DatumReader used for decoding.
    • useReflectApi

      @Deprecated public boolean useReflectApi()
      Deprecated.
      Kept for backward API compatibility only.
      Returns:
      true if internal AvroDatumFactory is an instance of AvroDatumFactory.ReflectDatumFactory
    • encode

      public void encode(T value, OutputStream outStream) throws IOException
      Description copied from class: Coder
      Encodes the given value of type T onto the given output stream. Multiple elements can be encoded next to each other on the output stream, so each coder should encode enough information to know how many bytes to read when decoding. A common approach is to prefix the encoding with the element's encoded length.
      Specified by:
      encode in class Coder<T>
      Throws:
      IOException - if writing to the OutputStream fails for some reason
    • decode

      public T decode(InputStream inStream) throws IOException
      Description copied from class: Coder
      Decodes a value of type T from the given input stream and returns the decoded value. Multiple elements can be encoded next to each other on the input stream, so each coder should encode enough information to know how many bytes to read when decoding. A common approach is to prefix the encoding with the element's encoded length.
      Specified by:
      decode in class Coder<T>
      Throws:
      IOException - if reading from the InputStream fails for some reason
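The length-prefix approach mentioned in the encode and decode descriptions can be sketched with the standard library alone. This is a generic illustration for strings, not AvroCoder's actual wire format. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LengthPrefixSketch {

  // Writes a 4-byte length followed by the UTF-8 bytes of the value.
  static void encode(String value, DataOutputStream out) throws IOException {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  // Reads the length first, then exactly that many bytes.
  static String decode(DataInputStream in) throws IOException {
    int length = in.readInt();
    byte[] bytes = new byte[length];
    in.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // Encodes all values back to back on one stream, then decodes them in order.
  static List<String> roundTrip(List<String> values) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buffer);
      for (String value : values) {
        encode(value, out);
      }
      out.flush();

      DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
      List<String> decoded = new ArrayList<>();
      for (int i = 0; i < values.size(); i++) {
        decoded.add(decode(in));
      }
      return decoded;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip(Arrays.asList("alpha", "", "gamma")));
  }
}
```

Because each element carries its own length, the decoder never reads into the bytes of the next element, even when an element is empty.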
    • verifyDeterministic

      public void verifyDeterministic() throws Coder.NonDeterministicException
      Description copied from class: CustomCoder
      Throw Coder.NonDeterministicException if the coding is not deterministic.

      In order for a Coder to be considered deterministic, the following must be true:

      • two values that compare as equal (via Object.equals() or Comparable.compareTo(), if supported) have the same encoding.
      • the Coder always produces a canonical encoding, which is the same for an instance of an object even if produced on different computers at different times.
      Overrides:
      verifyDeterministic in class CustomCoder<T>
      Throws:
      Coder.NonDeterministicException - when the type may not be deterministically encoded using the given Schema, the directBinaryEncoder, and the ReflectDatumWriter or GenericDatumWriter.
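To illustrate the first criterion above, the sketch below uses only the standard library and a toy text encoding (an assumption for illustration, unrelated to Avro's binary format). Two maps that compare as equal produce different bytes when encoded in iteration order, and identical bytes once the keys are sorted. All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

final class CanonicalEncodingSketch {

  // Toy encoding in the map's own iteration order: equal maps may yield different bytes.
  static byte[] encodeInIterationOrder(Map<String, Integer> map) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, Integer> entry : map.entrySet()) {
      sb.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
    }
    return sb.toString().getBytes(StandardCharsets.UTF_8);
  }

  // Sorts by key first, so equal maps always yield the same bytes.
  static byte[] encodeCanonically(Map<String, Integer> map) {
    return encodeInIterationOrder(new TreeMap<>(map));
  }

  public static void main(String[] args) {
    Map<String, Integer> a = new LinkedHashMap<>();
    a.put("x", 1);
    a.put("y", 2);

    Map<String, Integer> b = new LinkedHashMap<>();
    b.put("y", 2);
    b.put("x", 1);

    System.out.println("equal maps: " + a.equals(b));
    System.out.println("same bytes in iteration order: "
        + Arrays.equals(encodeInIterationOrder(a), encodeInIterationOrder(b)));
    System.out.println("same bytes when canonical: "
        + Arrays.equals(encodeCanonically(a), encodeCanonically(b)));
  }
}
```

Types whose in-memory ordering is not fixed, such as hash-based maps, are the typical reason a coder cannot guarantee a canonical encoding.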
    • getSchema

      public Schema getSchema()
      Returns the schema used by this coder.
    • getEncodedTypeDescriptor

      public TypeDescriptor<T> getEncodedTypeDescriptor()
      Description copied from class: Coder
      Returns the TypeDescriptor for the type encoded.
      Overrides:
      getEncodedTypeDescriptor in class Coder<T>
    • equals

      public boolean equals(@Nullable Object other)
      Overrides:
      equals in class Object
      Returns:
      true if the two AvroCoder instances have the same class, type and schema.
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object