Package org.apache.beam.sdk.io.parquet
Class ParquetIO.Sink
java.lang.Object
org.apache.beam.sdk.io.parquet.ParquetIO.Sink
- All Implemented Interfaces:
Serializable, FileIO.Sink<GenericRecord>
- Enclosing class:
ParquetIO
Implementation of ParquetIO.sink(org.apache.avro.Schema).
Constructor Summary
Constructors
Method Summary
void flush()
    Flushes the buffered state (if any) before the channel is closed.
void open(WritableByteChannel channel)
    Initializes writing to the given channel.
ParquetIO.Sink withAvroDataModel(GenericData model)
    Define the Avro data model; see AvroParquetWriter.Builder.withDataModel(GenericData).
ParquetIO.Sink withBloomFilterEnabled(boolean enableBloomFilter)
    Enable or disable bloom filters.
ParquetIO.Sink withCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName)
    Specifies the compression codec.
ParquetIO.Sink withConfiguration(Map<String, String> configuration)
    Specify Hadoop configuration for the ParquetWriter.
ParquetIO.Sink withConfiguration(org.apache.hadoop.conf.Configuration configuration)
    Specify Hadoop configuration for the ParquetWriter.
ParquetIO.Sink withDictionaryEncoding(boolean enableDictionary)
    Enable or disable dictionary encoding.
ParquetIO.Sink withMinRowCountForPageSizeCheck(int minRowCountForPageSizeCheck)
    Specify the minimum number of rows to write before a page size check is performed.
ParquetIO.Sink withPageSize(int pageSize)
    Specify the page size for the Parquet writer.
ParquetIO.Sink withRowGroupSize(int rowGroupSize)
    Specify the row-group size; if not set or zero, a default is used by the underlying writer.
void write(GenericRecord element)
    Appends a single element to the file.
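As a FileIO.Sink implementation, this class is normally not used directly: it is passed to FileIO.write() (or writeDynamic()), which calls open(), write(), and flush() on it. A minimal sketch, assuming a placeholder schema and output path; note that the AvroCoder import path varies by Beam version (older releases use org.apache.beam.sdk.coders.AvroCoder):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.avro.coders.AvroCoder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.parquet.ParquetIO;
import org.apache.beam.sdk.transforms.Create;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class ParquetSinkExample {
  // Placeholder Avro schema for illustration; replace with your record schema.
  static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}";

  public static void main(String[] args) {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    GenericRecord rec = new GenericRecordBuilder(schema).set("id", 1L).build();

    Pipeline p = Pipeline.create();
    p.apply(Create.of(rec).withCoder(AvroCoder.of(schema)))
        // FileIO drives the Sink: open() once per file, write() per element,
        // flush() before the channel is closed.
        .apply(
            FileIO.<GenericRecord>write()
                .via(ParquetIO.sink(schema)
                    .withCompressionCodec(CompressionCodecName.SNAPPY))
                .to("/tmp/parquet-out") // placeholder output directory
                .withSuffix(".parquet"));
    p.run().waitUntilFinish();
  }
}
```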
Constructor Details

Sink
public Sink()

Method Details
withCompressionCodec
public ParquetIO.Sink withCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName)
Specifies the compression codec. Defaults to CompressionCodecName.SNAPPY.
withConfiguration
Specify Hadoop configuration for the ParquetWriter, given as a Map<String, String>.
withConfiguration
Specify Hadoop configuration for the ParquetWriter, given as an org.apache.hadoop.conf.Configuration.
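The Map-based overload is convenient when no Hadoop Configuration object is at hand. A sketch; the property shown is a standard parquet-avro option used here purely as an illustration:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.avro.Schema;
import org.apache.beam.sdk.io.parquet.ParquetIO;

public class ParquetSinkConfigExample {
  public static ParquetIO.Sink configuredSink(Schema schema) {
    // Hadoop/Parquet properties passed through to the underlying ParquetWriter.
    Map<String, String> conf = new HashMap<>();
    conf.put("parquet.avro.write-old-list-structure", "false");
    return ParquetIO.sink(schema).withConfiguration(conf);
  }
}
```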
withRowGroupSize
Specify the row-group size; if not set or zero, a default is used by the underlying writer.
withPageSize
Specify the page size for the Parquet writer. Defaults to 1 MB.
withDictionaryEncoding
Enable or disable dictionary encoding. Enabled by default.
withBloomFilterEnabled
Enable or disable bloom filters. Disabled by default.
withMinRowCountForPageSizeCheck
Specify the minimum number of rows to write before a page size check is performed. The writer buffers at least this many rows before checking whether the page size threshold has been reached. With large rows, the default (100) can cause excessive memory use; set a lower value (e.g. 1) to flush pages more frequently.
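The sizing knobs above compose on the builder. A sketch of tuning for large rows; the sizes are illustrative, not recommendations:

```java
import org.apache.avro.Schema;
import org.apache.beam.sdk.io.parquet.ParquetIO;

public class ParquetSinkTuningExample {
  public static ParquetIO.Sink tunedSink(Schema schema) {
    return ParquetIO.sink(schema)
        .withRowGroupSize(64 * 1024 * 1024)   // 64 MB row groups (illustrative)
        .withPageSize(1024 * 1024)            // 1 MB pages (the documented default)
        .withMinRowCountForPageSizeCheck(1);  // check the page size after every row
  }
}
```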
withAvroDataModel
Define the Avro data model; see AvroParquetWriter.Builder.withDataModel(GenericData).
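A sketch of supplying a data model; GenericData.get() is Avro's plain generic model, and a GenericData subclass with logical-type conversions registered could be passed instead:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.beam.sdk.io.parquet.ParquetIO;

public class ParquetSinkDataModelExample {
  public static ParquetIO.Sink sinkWithModel(Schema schema) {
    // GenericData.get() returns Avro's default generic data model; the
    // underlying AvroParquetWriter uses it to interpret record fields.
    return ParquetIO.sink(schema).withAvroDataModel(GenericData.get());
  }
}
```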
open
Description copied from interface: FileIO.Sink
Initializes writing to the given channel. Will be invoked once on a given FileIO.Sink instance.
Specified by: open in interface FileIO.Sink<GenericRecord>
Throws: IOException
write
Description copied from interface: FileIO.Sink
Appends a single element to the file. May be invoked zero or more times.
Specified by: write in interface FileIO.Sink<GenericRecord>
Throws: IOException
flush
Description copied from interface: FileIO.Sink
Flushes the buffered state (if any) before the channel is closed. Does not need to close the channel. Will be invoked once.
Specified by: flush in interface FileIO.Sink<GenericRecord>
Throws: IOException