Class CompressedSource<T>

Type Parameters:
T - The type to read from the compressed file.
All Implemented Interfaces:
Serializable, HasDisplayData

public class CompressedSource<T> extends FileBasedSource<T>
A Source that reads from compressed files. A CompressedSource wraps a delegate FileBasedSource that is able to read the decompressed file format.

For example, use the following to read from a gzip-compressed file-based source:


 FileBasedSource<T> mySource = ...;
 PCollection<T> collection = p.apply(Read.from(CompressedSource
     .from(mySource)
     .withCompression(Compression.GZIP)));
 

Supported compression algorithms are Compression.GZIP, Compression.BZIP2, Compression.ZIP, Compression.ZSTD, Compression.LZO, Compression.LZOP, Compression.SNAPPY, and Compression.DEFLATE. User-defined compression types are supported by implementing a CompressedSource.DecompressingChannelFactory.
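As a rough illustration of what a user-defined decompressing channel factory might look like, the following sketch wraps a raw channel in a gzip decoder using only the JDK. The nested DecompressingFactory interface is a hypothetical local stand-in, and its single method name and signature are assumptions made for this example; consult the documentation of CompressedSource.DecompressingChannelFactory for the actual contract. The class name ChannelFactorySketch and the roundTrip helper are likewise illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ChannelFactorySketch {

  /** Hypothetical stand-in for a decompressing channel factory (assumed shape). */
  interface DecompressingFactory {
    ReadableByteChannel createDecompressingChannel(ReadableByteChannel channel)
        throws IOException;
  }

  /** Wraps the compressed channel so that reads return decompressed bytes. */
  static final DecompressingFactory GZIP_FACTORY =
      channel -> Channels.newChannel(
          new GZIPInputStream(Channels.newInputStream(channel)));

  /** Compresses text in memory, then reads it back through the factory. */
  static String roundTrip(String text) {
    try {
      ByteArrayOutputStream compressed = new ByteArrayOutputStream();
      try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
        gz.write(text.getBytes(StandardCharsets.UTF_8));
      }
      ReadableByteChannel raw =
          Channels.newChannel(new ByteArrayInputStream(compressed.toByteArray()));

      ByteArrayOutputStream out = new ByteArrayOutputStream();
      try (ReadableByteChannel decoded = GZIP_FACTORY.createDecompressingChannel(raw)) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        while (decoded.read(buf) != -1) {
          buf.flip();                              // switch buffer to draining mode
          out.write(buf.array(), 0, buf.limit());  // copy the bytes just read
          buf.clear();                             // make the buffer writable again
        }
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The factory only adapts one channel into another, so a delegate source in a design like this could stay unaware of the compression format.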

By default, the compression algorithm is selected from those supported in Compression based on the extension of the file name provided to the source: ".bz2" indicates Compression.BZIP2, ".gz" indicates Compression.GZIP, ".zip" indicates Compression.ZIP, ".zst" indicates Compression.ZSTD, ".lzo_deflate" indicates Compression.LZO, ".lzo" indicates Compression.LZOP, ".snappy" indicates Compression.SNAPPY, and ".deflate" indicates Compression.DEFLATE. If the file name does not match any of these extensions, the file is assumed to contain uncompressed data.
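The extension rules above can be sketched as a simple suffix lookup. This is a standalone illustration, not the library's own detection code: the class ExtensionDetectionSketch, its detect method, and the string labels (including "UNCOMPRESSED" for the fallback) are hypothetical names chosen for this example, and case-sensitive suffix matching is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ExtensionDetectionSketch {

  // Extension-to-algorithm table, copied from the list in the text above.
  private static final Map<String, String> BY_EXTENSION = new LinkedHashMap<>();

  static {
    BY_EXTENSION.put(".bz2", "BZIP2");
    BY_EXTENSION.put(".gz", "GZIP");
    BY_EXTENSION.put(".zip", "ZIP");
    BY_EXTENSION.put(".zst", "ZSTD");
    BY_EXTENSION.put(".lzo_deflate", "LZO");
    BY_EXTENSION.put(".lzo", "LZOP");
    BY_EXTENSION.put(".snappy", "SNAPPY");
    BY_EXTENSION.put(".deflate", "DEFLATE");
  }

  /** Returns the label for the first matching suffix, or "UNCOMPRESSED" if none match. */
  static String detect(String fileName) {
    for (Map.Entry<String, String> entry : BY_EXTENSION.entrySet()) {
      if (fileName.endsWith(entry.getKey())) {
        return entry.getValue();
      }
    }
    // Assumption for this sketch: unmatched names are treated as uncompressed data.
    return "UNCOMPRESSED";
  }
}
```

Note that ".lzo_deflate" and ".deflate" do not collide under suffix matching, because a name ending in ".lzo_deflate" has an underscore rather than a dot before "deflate".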

See Also: