@Experimental(value=SOURCE_SINK) public class ParquetIO extends java.lang.Object
ParquetIO source returns a PCollection for Parquet files. The elements in the
 PCollection are Avro GenericRecord.
 
To configure the ParquetIO.Read, you have to provide the file patterns (from) of the Parquet
 files and the schema.
 
For example:
 PCollection<GenericRecord> records = pipeline.apply(ParquetIO.read(SCHEMA).from("/foo/bar"));
 ...
 As ParquetIO.Read is based on FileIO, it supports any filesystem (hdfs, ...).
 
When using schemas created via reflection, it may be useful to generate GenericRecord
 instances rather than instances of the class associated with the schema. ParquetIO.Read and ParquetIO.ReadFiles provide ParquetIO.Read.withAvroDataModel(GenericData) allowing implementations
 to set the data model associated with the AvroParquetReader
 
For more advanced use cases, like reading each file in a PCollection of FileIO.ReadableFile, use the ParquetIO.ReadFiles transform.
 
For example:
 PCollection<FileIO.ReadableFile> files = pipeline
   .apply(FileIO.match().filepattern(options.getInputFilepattern())
   .apply(FileIO.readMatches());
 PCollection<GenericRecord> output = files.apply(ParquetIO.readFiles(SCHEMA));
 ParquetIO.Sink allows you to write a PCollection of GenericRecord into
 a Parquet file. It can be used with the general-purpose FileIO transforms with
 FileIO.write/writeDynamic specifically.
 
By default, ParquetIO.Sink produces output files that are compressed using the CompressionCodec.SNAPPY. This default can be changed or overridden
 using ParquetIO.Sink.withCompressionCodec(CompressionCodecName).
 
For example:
 pipeline
   .apply(...) // PCollection<GenericRecord>
   .apply(FileIO
     .<GenericRecord>write()
     .via(ParquetIO.sink(SCHEMA)
       .withCompressionCodec(CompressionCodecName.SNAPPY))
     .to("destination/path")
     .withSuffix(".parquet"));
 This IO API is considered experimental and may break or receive backwards-incompatible changes in future versions of the Apache Beam SDK.
| Modifier and Type | Class and Description | 
|---|---|
| static class  | ParquetIO.ReadImplementation of  read(Schema). | 
| static class  | ParquetIO.ReadFilesImplementation of  readFiles(Schema). | 
| static class  | ParquetIO.SinkImplementation of  sink(org.apache.avro.Schema). | 
| Modifier and Type | Method and Description | 
|---|---|
| static ParquetIO.Read | read(Schema schema)Reads  GenericRecordfrom a Parquet file (or multiple Parquet files matching the
 pattern). | 
| static ParquetIO.ReadFiles | readFiles(Schema schema)Like  read(Schema), but reads each file in aPCollectionofFileIO.ReadableFile, which allows more flexible usage. | 
| static ParquetIO.Sink | sink(Schema schema)Creates a  ParquetIO.Sinkthat, for use withFileIO.write(). | 
public static ParquetIO.Read read(Schema schema)
GenericRecord from a Parquet file (or multiple Parquet files matching the
 pattern).public static ParquetIO.ReadFiles readFiles(Schema schema)
read(Schema), but reads each file in a PCollection of FileIO.ReadableFile, which allows more flexible usage.public static ParquetIO.Sink sink(Schema schema)
ParquetIO.Sink that, for use with FileIO.write().