Class ParquetIO.Sink

java.lang.Object
org.apache.beam.sdk.io.parquet.ParquetIO.Sink
All Implemented Interfaces:
Serializable, FileIO.Sink<GenericRecord>
Enclosing class:
ParquetIO

public abstract static class ParquetIO.Sink extends Object implements FileIO.Sink<GenericRecord>
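A ParquetIO.Sink is normally plugged into FileIO.write() via ParquetIO.sink(schema). A minimal sketch, assuming Beam's ParquetIO is on the classpath; the schema string, upstream PCollection, and output path are illustrative placeholders:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.parquet.ParquetIO;
import org.apache.beam.sdk.values.PCollection;

Schema schema = new Schema.Parser().parse(SCHEMA_STRING); // placeholder schema

// records: a PCollection<GenericRecord> produced upstream in the pipeline.
records.apply(
    FileIO.<GenericRecord>write()
        .via(ParquetIO.sink(schema))
        .to("/tmp/parquet-output/")   // placeholder output path
        .withSuffix(".parquet"));
```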
  • Constructor Details

    • Sink

      public Sink()
  • Method Details

    • withCompressionCodec

      public ParquetIO.Sink withCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName)
      Specifies the compression codec to use. Defaults to CompressionCodecName.SNAPPY.
    • withConfiguration

      public ParquetIO.Sink withConfiguration(Map<String,String> configuration)
      Specifies the Hadoop configuration for the underlying ParquetWriter, given as a map of property names to values.
    • withConfiguration

      public ParquetIO.Sink withConfiguration(org.apache.hadoop.conf.Configuration configuration)
      Specifies the Hadoop configuration for the underlying ParquetWriter.
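      For example, writer behavior can be tuned by passing Parquet property names through the map overload. A minimal runnable sketch; the two keys below are standard Parquet property names, but whether a given key is honored depends on the Parquet version in use (assumption):

```java
import java.util.HashMap;
import java.util.Map;

public class ParquetSinkConf {
  // Builds a property map suitable for ParquetIO.Sink#withConfiguration(Map).
  static Map<String, String> parquetWriterConf() {
    Map<String, String> conf = new HashMap<>();
    // Use the modern Avro list layout instead of the legacy one.
    conf.put("parquet.avro.write-old-list-structure", "false");
    // Skip writing _metadata summary files.
    conf.put("parquet.summary.metadata.level", "NONE");
    return conf;
  }

  public static void main(String[] args) {
    System.out.println(parquetWriterConf());
  }
}
```

      In a pipeline this would be applied as ParquetIO.sink(schema).withConfiguration(parquetWriterConf()).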
    • withRowGroupSize

      public ParquetIO.Sink withRowGroupSize(int rowGroupSize)
      Specifies the row-group size; if unset or zero, the underlying writer's default is used.
    • withPageSize

      public ParquetIO.Sink withPageSize(int pageSize)
      Specifies the page size for the Parquet writer. Defaults to 1 MB.
    • withDictionaryEncoding

      public ParquetIO.Sink withDictionaryEncoding(boolean enableDictionary)
      Enables or disables dictionary encoding. Enabled by default.
    • withBloomFilterEnabled

      public ParquetIO.Sink withBloomFilterEnabled(boolean enableBloomFilter)
      Enables or disables Bloom filters. Disabled by default.
    • withMinRowCountForPageSizeCheck

      public ParquetIO.Sink withMinRowCountForPageSizeCheck(int minRowCountForPageSizeCheck)
      Specifies the minimum number of rows to write before a page size check is performed. The writer buffers at least this many rows before checking whether the page size threshold has been reached. With large rows, the default (100) can cause excessive memory use; set a lower value (e.g. 1) to flush pages more frequently.
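      For example, when individual records are large (long strings, blobs), a sink configured along these lines trades some throughput for bounded memory. A sketch, not a recommendation; the sizes are illustrative:

```java
// Sketch: a sink tuned for large rows. withMinRowCountForPageSizeCheck(1)
// makes the writer check the page size after every row, so pages are
// flushed promptly instead of buffering 100 large rows first.
ParquetIO.Sink sink =
    ParquetIO.sink(schema)                     // schema: the records' Avro schema
        .withRowGroupSize(16 * 1024 * 1024)    // 16 MB row groups (illustrative)
        .withPageSize(256 * 1024)              // 256 KB pages (illustrative)
        .withMinRowCountForPageSizeCheck(1);
```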
    • withAvroDataModel

      public ParquetIO.Sink withAvroDataModel(GenericData model)
      Defines the Avro data model; see AvroParquetWriter.Builder.withDataModel(GenericData).
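      For example, a sink can be given a reflect-based data model when the upstream records were built with ReflectData rather than the default GenericData. A sketch, assuming the records were created with the same data model:

```java
import org.apache.avro.generic.GenericData;
import org.apache.avro.reflect.ReflectData;

// Default model, for plain GenericRecords:
ParquetIO.sink(schema).withAvroDataModel(GenericData.get());

// Reflect-based model, for records derived from POJOs via ReflectData:
ParquetIO.sink(schema).withAvroDataModel(ReflectData.get());
```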
    • open

      public void open(WritableByteChannel channel) throws IOException
      Description copied from interface: FileIO.Sink
      Initializes writing to the given channel. Will be invoked once on a given FileIO.Sink instance.
      Specified by:
      open in interface FileIO.Sink<GenericRecord>
      Throws:
      IOException
    • write

      public void write(GenericRecord element) throws IOException
      Description copied from interface: FileIO.Sink
      Appends a single element to the file. May be invoked zero or more times.
      Specified by:
      write in interface FileIO.Sink<GenericRecord>
      Throws:
      IOException
    • flush

      public void flush() throws IOException
      Description copied from interface: FileIO.Sink
      Flushes the buffered state (if any) before the channel is closed. Does not need to close the channel. Will be invoked once.
      Specified by:
      flush in interface FileIO.Sink<GenericRecord>
      Throws:
      IOException