T - The type of records to be read from the source.@Experimental(value=SOURCE_SINK) public class AvroSource<T> extends BlockBasedSource<T>
AvroIO.Read.
 A FileBasedSource for reading Avro files.
 
To read a PCollection of objects from one or more Avro files, use from(org.apache.beam.sdk.options.ValueProvider<java.lang.String>) to specify the path(s) of the files to read. The AvroSource that is
 returned will read objects of type GenericRecord with the schema(s) that were written at
 file creation. To further configure the AvroSource to read with a user-defined schema, or
 to return records of a type other than GenericRecord, use withSchema(Schema) (using an Avro Schema), withSchema(String) (using a JSON schema), or withSchema(Class) (to
 return objects of the Avro-generated class specified).
 
An AvroSource can be read from using the Read transform. For example:
 
 AvroSource<MyType> source = AvroSource.from(file.toPath()).withSchema(MyType.class);
 PCollection<MyType> records = Read.from(mySource);
 This class's implementation is based on the Avro 1.7.7 specification and implements parsing of some parts of Avro Object Container Files. The rationale for doing so is that the Avro API does not provide efficient ways of computing the precise offsets of blocks within a file, which is necessary to support dynamic work rebalancing. However, whenever it is possible to use the Avro API in a way that supports maintaining precise offsets, this class uses the Avro API.
Avro Object Container files store records in blocks. Each block contains a collection of records. Blocks may be encoded (e.g., with bzip2, deflate, snappy, etc.). Blocks are delineated from one another by a 16-byte sync marker.
An AvroSource for a subrange of a single file contains records in the blocks such that
 the start offset of the block is greater than or equal to the start offset of the source and less
 than the end offset of the source.
 
To use XZ-encoded Avro files, please include an explicit dependency on xz-1.8.jar,
 which has been marked as optional in the Maven sdk/pom.xml.
 
 <dependency>
   <groupId>org.tukaani</groupId>
   <artifactId>xz</artifactId>
   <version>1.8</version>
 </dependency>
 Permission requirements depend on the PipelineRunner that is used to execute the
 pipeline. Please refer to the documentation of corresponding PipelineRunners for more
 details.
| Modifier and Type | Class and Description | 
|---|---|
| static class  | AvroSource.AvroReader<T>A  BlockBasedReaderfor reading blocks from Avro files. | 
| static interface  | AvroSource.DatumReaderFactory<T> | 
BlockBasedSource.Block<T>, BlockBasedSource.BlockBasedReader<T>FileBasedSource.FileBasedReader<T>OffsetBasedSource.OffsetBasedReader<T>BoundedSource.BoundedReader<T>Source.Reader<T>| Modifier and Type | Method and Description | 
|---|---|
| BlockBasedSource<T> | createForSubrangeOfFile(MatchResult.Metadata fileMetadata,
                       long start,
                       long end)Creates a  BlockBasedSourcefor the specified range in a single file. | 
| BlockBasedSource<T> | createForSubrangeOfFile(java.lang.String fileName,
                       long start,
                       long end)Deprecated. 
 Used by Dataflow worker | 
| protected BlockBasedSource.BlockBasedReader<T> | createSingleFileReader(PipelineOptions options)Creates a  BlockBasedReader. | 
| static AvroSource<GenericRecord> | from(MatchResult.Metadata metadata) | 
| static AvroSource<GenericRecord> | from(java.lang.String fileNameOrPattern)Like  from(ValueProvider). | 
| static AvroSource<GenericRecord> | from(ValueProvider<java.lang.String> fileNameOrPattern)Reads from the given file name or pattern ("glob"). | 
| Coder<T> | getOutputCoder()Returns the  Coderto use for the data read from this source. | 
| void | validate()Checks that this source is valid, before it can be used in a pipeline. | 
| AvroSource<T> | withDatumReaderFactory(AvroSource.DatumReaderFactory<?> factory) | 
| AvroSource<T> | withEmptyMatchTreatment(EmptyMatchTreatment emptyMatchTreatment) | 
| AvroSource<T> | withMinBundleSize(long minBundleSize)Sets the minimum bundle size. | 
| <X> AvroSource<X> | withParseFn(SerializableFunction<GenericRecord,X> parseFn,
           Coder<X> coder)Reads  GenericRecordof unspecified schema and maps them to instances of a custom type
 using the givenparseFnand encoded using the given coder. | 
| <X> AvroSource<X> | withSchema(java.lang.Class<X> clazz)Reads files containing records of the given class. | 
| AvroSource<GenericRecord> | withSchema(Schema schema)Like  withSchema(String). | 
| AvroSource<GenericRecord> | withSchema(java.lang.String schema)Reads files containing records that conform to the given schema. | 
createReader, createSourceForSubrange, getEmptyMatchTreatment, getEstimatedSizeBytes, getFileOrPatternSpec, getFileOrPatternSpecProvider, getMaxEndOffset, getMode, getSingleFileMetadata, isSplittable, populateDisplayData, split, toStringgetBytesPerOffset, getEndOffset, getMinBundleSize, getStartOffsetgetDefaultOutputCoderpublic static AvroSource<GenericRecord> from(ValueProvider<java.lang.String> fileNameOrPattern)
withSchema(java.lang.String) to return a type other than GenericRecord.public static AvroSource<GenericRecord> from(MatchResult.Metadata metadata)
public static AvroSource<GenericRecord> from(java.lang.String fileNameOrPattern)
from(ValueProvider).public AvroSource<T> withEmptyMatchTreatment(EmptyMatchTreatment emptyMatchTreatment)
public AvroSource<GenericRecord> withSchema(java.lang.String schema)
public AvroSource<GenericRecord> withSchema(Schema schema)
withSchema(String).public <X> AvroSource<X> withSchema(java.lang.Class<X> clazz)
public <X> AvroSource<X> withParseFn(SerializableFunction<GenericRecord,X> parseFn, Coder<X> coder)
GenericRecord of unspecified schema and maps them to instances of a custom type
 using the given parseFn and encoded using the given coder.public AvroSource<T> withMinBundleSize(long minBundleSize)
OffsetBasedSource for a description of minBundleSize and its use.public AvroSource<T> withDatumReaderFactory(AvroSource.DatumReaderFactory<?> factory)
public void validate()
SourceIt is recommended to use Preconditions for implementing this method.
validate in class FileBasedSource<T>@Deprecated public BlockBasedSource<T> createForSubrangeOfFile(java.lang.String fileName, long start, long end) throws java.io.IOException
java.io.IOExceptionpublic BlockBasedSource<T> createForSubrangeOfFile(MatchResult.Metadata fileMetadata, long start, long end)
BlockBasedSourceBlockBasedSource for the specified range in a single file.createForSubrangeOfFile in class BlockBasedSource<T>fileMetadata - file backing the new FileBasedSource.start - starting byte offset of the new FileBasedSource.end - ending byte offset of the new FileBasedSource. May be Long.MAX_VALUE, in
     which case it will be inferred using FileBasedSource.getMaxEndOffset(org.apache.beam.sdk.options.PipelineOptions).protected BlockBasedSource.BlockBasedReader<T> createSingleFileReader(PipelineOptions options)
BlockBasedSourceBlockBasedReader.createSingleFileReader in class BlockBasedSource<T>