CompressedSource.CompressedReader (Apache Beam 2.27.0)

java.lang.Object
- org.apache.beam.sdk.io.Source.Reader<T>
  - org.apache.beam.sdk.io.BoundedSource.BoundedReader<T>
    - org.apache.beam.sdk.io.OffsetBasedSource.OffsetBasedReader<T>
      - org.apache.beam.sdk.io.FileBasedSource.FileBasedReader<T>
        - org.apache.beam.sdk.io.CompressedSource.CompressedReader<T>

Type Parameters:

T - The type of records read from the source.

All Implemented Interfaces:

java.lang.AutoCloseable

Enclosing class:

CompressedSource<T>
```
public static class CompressedSource.CompressedReader<T>
extends FileBasedSource.FileBasedReader<T>
```
Reader for a CompressedSource. Decompresses its input and uses a delegate reader to read elements from the decompressed input.

Field Summary
- Fields inherited from class org.apache.beam.sdk.io.BoundedSource.BoundedReader
  SPLIT_POINTS_UNKNOWN

Constructor Summary

Constructors
Constructor and Description
`CompressedReader(CompressedSource<T> source, FileBasedSource.FileBasedReader<T> readerDelegate)` Create a `CompressedReader` from a `CompressedSource` and delegate reader.

Method Summary

Modifier and Type	Method and Description
`boolean`	`allowsDynamicSplitting()` Whether this reader should allow dynamic splitting of the offset ranges.
`T`	`getCurrent()` Gets the current record from the delegate reader.
`protected long`	`getCurrentOffset()` Returns the starting offset of the `current record`, which has been read by the last successful `Source.Reader.start()` or `Source.Reader.advance()` call.
`Instant`	`getCurrentTimestamp()` By default, returns the minimum possible timestamp.
`long`	`getSplitPointsConsumed()` Returns the total amount of parallelism in the consumed (returned and processed) range of this reader's current `BoundedSource` (as would be returned by `BoundedSource.BoundedReader.getCurrentSource()`).
`long`	`getSplitPointsRemaining()` Returns the total amount of parallelism in the unprocessed part of this reader's current `BoundedSource` (as would be returned by `BoundedSource.BoundedReader.getCurrentSource()`).
`protected boolean`	`isAtSplitPoint()` Returns true only for the first record; compressed sources cannot be split.
`protected boolean`	`readNextRecord()` Reads the next record via the delegate reader.
`protected void`	`startReading(java.nio.channels.ReadableByteChannel channel)` Creates a decompressing channel from the input channel and passes it to its delegate reader's `FileBasedReader#startReading(ReadableByteChannel)`.

Methods inherited from class org.apache.beam.sdk.io.FileBasedSource.FileBasedReader
advanceImpl, close, getCurrentSource, startImpl

Methods inherited from class org.apache.beam.sdk.io.OffsetBasedSource.OffsetBasedReader
advance, getFractionConsumed, isDone, isStarted, splitAtFraction, start

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - CompressedReader
```
public CompressedReader(CompressedSource<T> source,
                        FileBasedSource.FileBasedReader<T> readerDelegate)
```
    Create a CompressedReader from a CompressedSource and delegate reader.
- Method Detail
  - getCurrent
```
public T getCurrent()
             throws java.util.NoSuchElementException
```
    Gets the current record from the delegate reader.
    
    Specified by:
    
    getCurrent in class Source.Reader<T>
    
    Throws:
    
    java.util.NoSuchElementException - if Source.Reader.start() was never called, or if the last Source.Reader.start() or Source.Reader.advance() returned false.
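    The start()/advance()/getCurrent() contract described above can be sketched with a small in-memory stand-in. `ListReaderSketch` is a hypothetical class written for illustration only; it is not part of Beam and does not touch compressed files. It only mirrors the rule that getCurrent() is valid solely after a start() or advance() call that returned true.

```java
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical in-memory reader that mimics the start()/advance()/getCurrent() contract. */
final class ListReaderSketch<T> {
  private final List<T> records;
  private int index = -1;              // -1 means start() has not been called yet
  private boolean hasCurrent = false;  // true only after a start()/advance() that returned true

  ListReaderSketch(List<T> records) {
    this.records = records;
  }

  /** Positions the reader at the first record; returns false for empty input. */
  boolean start() {
    index = 0;
    hasCurrent = index < records.size();
    return hasCurrent;
  }

  /** Moves to the next record; returns false once the input is exhausted. */
  boolean advance() {
    if (index < 0) {
      throw new IllegalStateException("start() must be called before advance()");
    }
    index++;
    hasCurrent = index < records.size();
    return hasCurrent;
  }

  /** Returns the current record, or throws if there is none. */
  T getCurrent() {
    if (!hasCurrent) {
      throw new NoSuchElementException("no current record");
    }
    return records.get(index);
  }
}
```

    A caller would typically drive such a reader with `for (boolean more = r.start(); more; more = r.advance()) { ... r.getCurrent() ... }`, so that getCurrent() is never reached after a false return.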
  - allowsDynamicSplitting
```
public boolean allowsDynamicSplitting()
```
    Description copied from class: OffsetBasedSource.OffsetBasedReader
    
    Whether this reader should allow dynamic splitting of the offset ranges.
    True by default. Override this to return false if the reader cannot support dynamic splitting correctly. If this returns false, OffsetBasedSource.OffsetBasedReader.splitAtFraction(double) will refuse all split requests.
    
    Overrides:
    
    allowsDynamicSplitting in class FileBasedSource.FileBasedReader<T>
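    The effect of this flag on split requests can be illustrated with a simplified model. `SplitGateSketch`, its fields, and the way it computes a split offset are all assumptions made for this sketch; the real OffsetBasedSource.OffsetBasedReader.splitAtFraction(double) returns a residual source and applies further checks. The only behavior taken from the description above is that a reader that disallows dynamic splitting refuses every request.

```java
import java.util.OptionalLong;

/** Hypothetical model of an offset range whose split requests are gated by a flag. */
final class SplitGateSketch {
  private final long start;                     // inclusive start offset of the range
  private long end;                             // exclusive end offset; shrinks on a successful split
  private final long current;                   // offset of the record currently being processed
  private final boolean allowsDynamicSplitting; // stand-in for allowsDynamicSplitting()

  SplitGateSketch(long start, long end, long current, boolean allowsDynamicSplitting) {
    this.start = start;
    this.end = end;
    this.current = current;
    this.allowsDynamicSplitting = allowsDynamicSplitting;
  }

  /** Returns the offset where the residual range would begin, or empty if the split is refused. */
  OptionalLong splitAtFraction(double fraction) {
    if (!allowsDynamicSplitting) {
      return OptionalLong.empty(); // every request is refused
    }
    long splitOffset = start + (long) Math.ceil(fraction * (end - start));
    // Assumed sanity checks: the split must lie strictly after the current
    // record and strictly before the end of the range.
    if (splitOffset <= current || splitOffset >= end) {
      return OptionalLong.empty();
    }
    end = splitOffset; // the primary range now stops where the residual begins
    return OptionalLong.of(splitOffset);
  }

  long end() {
    return end;
  }
}
```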
  - getSplitPointsConsumed
```
public final long getSplitPointsConsumed()
```
    Description copied from class: BoundedSource.BoundedReader
    Returns the total amount of parallelism in the consumed (returned and processed) range of this reader's current BoundedSource (as would be returned by BoundedSource.BoundedReader.getCurrentSource()). This corresponds to all split point records (see RangeTracker) returned by this reader, excluding the last split point returned if the reader is not finished.
    Consider the following examples: (1) An input that can be read in parallel down to the individual records, such as CountingSource.upTo(long), is called "perfectly splittable". (2) A "block-compressed" file format such as AvroIO, in which a block of records has to be read as a whole, but different blocks can be read in parallel. (3) An "unsplittable" input such as a cursor in a database.
    - Any reader that is unstarted (that is, has never had a call to Source.Reader.start()) has a consumed parallelism of 0. This condition holds independent of whether the input is splittable.
    - Any reader that has only returned its first element (that is, has never had a call to Source.Reader.advance()) has a consumed parallelism of 0: the first element is the current element and is still being processed. This condition holds independent of whether the input is splittable.
    - For an empty reader (in which the call to Source.Reader.start() returned false), the consumed parallelism is 0. This condition holds independent of whether the input is splittable.
    - For a non-empty, finished reader (in which the call to Source.Reader.start() returned true and a call to Source.Reader.advance() has returned false), the value returned must be at least 1 and should equal the total parallelism in the source.
    - For example (1): After returning record #30 (starting at 1) out of 50 in a perfectly splittable 50-record input, this value should be 29. When finished, the consumed parallelism should be 50.
    - For example (2): In a block-compressed file consisting of 5 blocks, the value should stay at 0 until the first record of the second block is returned, stay at 1 until the first record of the third block is returned, and so on. Only once end-of-file is reached has the fifth block been consumed, and the value should then stay at 5.
    - For example (3): For any non-empty unsplittable input, the consumed parallelism is 0 until the reader is finished (because the last call to Source.Reader.advance() returned false), at which point it becomes 1.
    A reader that is implemented using a RangeTracker is encouraged to use the range tracker's ability to count split points to implement this method. See OffsetBasedSource.OffsetBasedReader and OffsetRangeTracker for an example.
    Defaults to BoundedSource.BoundedReader.SPLIT_POINTS_UNKNOWN. Any value less than 0 will be interpreted as unknown.
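    The rules listed above reduce to a small piece of arithmetic: count the split point records returned so far, and leave out the last one while the reader is still running. The following is a minimal sketch of that arithmetic only. `SplitPointsConsumedSketch` and its parameters are hypothetical names, and this is not how OffsetRangeTracker is implemented.

```java
/** Hypothetical helper that reproduces the consumed-parallelism rules as plain arithmetic. */
final class SplitPointsConsumedSketch {
  private SplitPointsConsumedSketch() {}

  /**
   * @param splitPointsReturned number of split point records returned so far (0 if unstarted or empty)
   * @param finished true once start() or advance() has returned false
   */
  static long consumed(long splitPointsReturned, boolean finished) {
    if (splitPointsReturned < 0) {
      throw new IllegalArgumentException("splitPointsReturned must be >= 0");
    }
    if (finished || splitPointsReturned == 0) {
      // Finished readers have consumed everything they returned;
      // unstarted and empty readers have consumed nothing.
      return splitPointsReturned;
    }
    // The most recent split point is still being processed, so it does not count yet.
    return splitPointsReturned - 1;
  }
}
```

    With this model, example (1) gives `consumed(30, false) == 29` and `consumed(50, true) == 50`, example (2) gives `consumed(2, false) == 1` once the first record of the second block is returned, and example (3) gives `consumed(1, false) == 0` followed by `consumed(1, true) == 1`.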
    Thread safety
    See the javadoc on BoundedSource.BoundedReader for information about thread safety.
    Overrides:
    
    getSplitPointsConsumed in class OffsetBasedSource.OffsetBasedReader<T>
    
    See Also:
    
    BoundedSource.BoundedReader.getSplitPointsRemaining()
  - getSplitPointsRemaining
```
public final long getSplitPointsRemaining()
```
    Description copied from class: BoundedSource.BoundedReader
    Returns the total amount of parallelism in the unprocessed part of this reader's current BoundedSource (as would be returned by BoundedSource.BoundedReader.getCurrentSource()). This corresponds to all unprocessed split point records (see RangeTracker), including the last split point returned, in the remainder part of the source.
    This function should be implemented only in addition to BoundedSource.BoundedReader.getSplitPointsConsumed() and only if an exact value can be returned.
    Consider the following examples: (1) An input that can be read in parallel down to the individual records, such as CountingSource.upTo(long), is called "perfectly splittable". (2) A "block-compressed" file format such as AvroIO, in which a block of records has to be read as a whole, but different blocks can be read in parallel. (3) An "unsplittable" input such as a cursor in a database.
    Assume for examples (1) and (2) that the number of records or blocks remaining is known:
    - Any reader for which the last call to Source.Reader.start() or Source.Reader.advance() has returned true should not return 0, because this reader itself represents parallelism of at least 1. This condition holds independent of whether the input is splittable.
    - A finished reader (for which Source.Reader.start() or Source.Reader.advance() has returned false) should return a value of 0. This condition holds independent of whether the input is splittable.
    - For example (1): After returning record #30 (starting at 1) out of 50 in a perfectly splittable 50-record input, this value should be 21 (20 remaining + 1 current) if the total number of records is known.
    - For example (2): After returning a record in block 3 in a block-compressed file consisting of 5 blocks, this value should be 3 (since blocks 4 and 5 can be processed in parallel by new readers produced via dynamic work rebalancing, while the current reader continues processing block 3) if the total number of blocks is known.
    - For example (3): A reader for any non-empty unsplittable input should return 1 until it is finished, at which point it should return 0.
    - For any reader: After returning the last split point in a file (e.g., the last record in example (1), the first record in the last block for example (2), or the first record in the file for example (3)), this value should be 1: apart from the current task, no additional remainder can be split off.
    Defaults to BoundedSource.BoundedReader.SPLIT_POINTS_UNKNOWN. Any value less than 0 will be interpreted as unknown.
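    When the total number of split points is known, the examples above follow one formula: the split points not yet returned, plus one for the split point currently being processed. The sketch below encodes just that formula. `SplitPointsRemainingSketch` and its parameter names are hypothetical, and the sketch deliberately rejects the unstarted case, which the text above does not specify.

```java
/** Hypothetical helper that reproduces the remaining-parallelism examples as plain arithmetic. */
final class SplitPointsRemainingSketch {
  private SplitPointsRemainingSketch() {}

  /**
   * @param totalSplitPoints total number of split points in the source (assumed known)
   * @param splitPointsReturned number of split point records returned so far
   * @param finished true once start() or advance() has returned false
   */
  static long remaining(long totalSplitPoints, long splitPointsReturned, boolean finished) {
    if (finished) {
      return 0; // a finished reader has nothing left to process
    }
    if (splitPointsReturned < 1 || splitPointsReturned > totalSplitPoints) {
      throw new IllegalArgumentException(
          "a running reader must have returned between 1 and totalSplitPoints split points");
    }
    // Split points not yet returned, plus the one currently being processed.
    return totalSplitPoints - splitPointsReturned + 1;
  }
}
```

    Under these assumptions, example (1) yields `remaining(50, 30, false) == 21`, example (2) yields `remaining(5, 3, false) == 3`, and example (3) yields `remaining(1, 1, false) == 1` until the reader finishes.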
    Thread safety
    See the javadoc on BoundedSource.BoundedReader for information about thread safety.
    Overrides:
    
    getSplitPointsRemaining in class OffsetBasedSource.OffsetBasedReader<T>
    
    See Also:
    
    BoundedSource.BoundedReader.getSplitPointsConsumed()
  - isAtSplitPoint
```
protected final boolean isAtSplitPoint()
```
    Returns true only for the first record; compressed sources cannot be split.
    
    Overrides:
    
    isAtSplitPoint in class OffsetBasedSource.OffsetBasedReader<T>
  - startReading
```
protected final void startReading(java.nio.channels.ReadableByteChannel channel)
                           throws java.io.IOException
```
    Creates a decompressing channel from the input channel and passes it to its delegate reader's FileBasedSource.FileBasedReader.startReading(ReadableByteChannel).
    
    Specified by:
    
    startReading in class FileBasedSource.FileBasedReader<T>
    
    Parameters:
    
    channel - a byte channel representing the file backing the reader.
    
    Throws:
    
    java.io.IOException
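    The pattern described here, wrapping the raw channel in a decompressing channel and handing the result to a delegate that knows nothing about compression, can be sketched with the JDK alone. This is an illustration under stated assumptions, not Beam's implementation: `DecompressingChannelSketch` is a hypothetical class, gzip is chosen arbitrarily as the compression format, and `readLines` stands in for the delegate reader.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch of the wrap-then-delegate pattern using gzip and line records. */
final class DecompressingChannelSketch {
  private DecompressingChannelSketch() {}

  /** Wraps a channel of gzip bytes in a channel that yields the decompressed bytes. */
  static ReadableByteChannel decompress(ReadableByteChannel compressed) throws IOException {
    return Channels.newChannel(new GZIPInputStream(Channels.newInputStream(compressed)));
  }

  /** Stand-in for the delegate reader: reads newline-delimited records from a plain channel. */
  static List<String> readLines(ReadableByteChannel plain) throws IOException {
    List<String> records = new ArrayList<>();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(Channels.newInputStream(plain), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        records.add(line);
      }
    }
    return records;
  }

  /** Test helper: gzip-compresses a string so the sketch can be exercised in memory. */
  static byte[] gzip(String text) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    }
    return bytes.toByteArray();
  }
}
```

    The delegate only ever sees decompressed bytes, so the same line-reading logic works whether or not the underlying file was compressed.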
  - readNextRecord
```
protected final boolean readNextRecord()
                                throws java.io.IOException
```
    Reads the next record via the delegate reader.
    
    Specified by:
    
    readNextRecord in class FileBasedSource.FileBasedReader<T>
    
    Returns:
    
    true if a record was successfully read, false if the end of the channel was reached before successfully reading a new record.
    
    Throws:
    
    java.io.IOException
  - getCurrentOffset
```
protected final long getCurrentOffset()
                               throws java.util.NoSuchElementException
```
    Description copied from class: OffsetBasedSource.OffsetBasedReader
    
    Returns the starting offset of the current record, which has been read by the last successful Source.Reader.start() or Source.Reader.advance() call.
    If no such call has been made yet, the return value is unspecified.
    See RangeTracker for description of offset semantics.
    
    Specified by:
    
    getCurrentOffset in class OffsetBasedSource.OffsetBasedReader<T>
    
    Throws:
    
    java.util.NoSuchElementException
  - getCurrentTimestamp
```
public Instant getCurrentTimestamp()
                            throws java.util.NoSuchElementException
```
    Description copied from class: BoundedSource.BoundedReader
    
    By default, returns the minimum possible timestamp.
    
    Overrides:
    
    getCurrentTimestamp in class BoundedSource.BoundedReader<T>
    
    Throws:
    
    java.util.NoSuchElementException - if the reader is at the beginning of the input and Source.Reader.start() or Source.Reader.advance() wasn't called, or if the last Source.Reader.start() or Source.Reader.advance() returned false.
