T
- The type of records read from the source.public static class CompressedSource.CompressedReader<T> extends FileBasedSource.FileBasedReader<T>
CompressedSource
. Decompresses its input and uses a delegate
reader to read elements from the decompressed input.SPLIT_POINTS_UNKNOWN
Constructor and Description |
---|
CompressedReader(CompressedSource<T> source,
FileBasedSource.FileBasedReader<T> readerDelegate)
Create a
CompressedReader from a CompressedSource and delegate reader. |
Modifier and Type | Method and Description |
---|---|
boolean |
allowsDynamicSplitting()
Whether this reader should allow dynamic splitting of the offset ranges.
|
T |
getCurrent()
Gets the current record from the delegate reader.
|
protected long |
getCurrentOffset()
Returns the starting offset of the
current record ,
which has been read by the last successful Source.Reader.start() or
Source.Reader.advance() call. |
long |
getSplitPointsConsumed()
Returns the total amount of parallelism in the consumed (returned and processed) range of
this reader's current
BoundedSource (as would be returned by
BoundedSource.BoundedReader.getCurrentSource() ). |
long |
getSplitPointsRemaining()
Returns the total amount of parallelism in the unprocessed part of this reader's current
BoundedSource (as would be returned by BoundedSource.BoundedReader.getCurrentSource() ). |
protected boolean |
isAtSplitPoint()
Returns true only for the first record; compressed sources cannot be split.
|
protected boolean |
readNextRecord()
Reads the next record via the delegate reader.
|
protected void |
startReading(java.nio.channels.ReadableByteChannel channel)
Creates a decompressing channel from the input channel and passes it to its delegate reader's
FileBasedReader#startReading(ReadableByteChannel) . |
advanceImpl, close, getCurrentSource, startImpl
advance, getFractionConsumed, isDone, isStarted, splitAtFraction, start
getCurrentTimestamp
public CompressedReader(CompressedSource<T> source, FileBasedSource.FileBasedReader<T> readerDelegate)
CompressedReader
from a CompressedSource
and delegate reader.public T getCurrent() throws java.util.NoSuchElementException
getCurrent
in class Source.Reader<T>
java.util.NoSuchElementException
- if Source.Reader.start()
was never called, or if
the last Source.Reader.start()
or Source.Reader.advance()
returned false
.public boolean allowsDynamicSplitting()
OffsetBasedSource.OffsetBasedReader
True by default. Override this to return false if the reader cannot
support dynamic splitting correctly. If this returns false,
OffsetBasedSource.OffsetBasedReader.splitAtFraction(double)
will refuse all split requests.
allowsDynamicSplitting
in class FileBasedSource.FileBasedReader<T>
public final long getSplitPointsConsumed()
BoundedSource.BoundedReader
BoundedSource
(as would be returned by
BoundedSource.BoundedReader.getCurrentSource()
). This corresponds to all split point records (see
RangeTracker
) returned by this reader, excluding the last split point
returned if the reader is not finished.
Consider the following examples: (1) An input that can be read in parallel down to the
individual records, such as CountingSource.upTo(long)
, is called "perfectly splittable".
(2) a "block-compressed" file format such as AvroIO
, in which a block of records has
to be read as a whole, but different blocks can be read in parallel. (3) An "unsplittable"
input such as a cursor in a database.
reader
that is unstarted (aka, has never had a call to
Source.Reader.start()
) has a consumed parallelism of 0. This condition holds independent of whether
the input is splittable.
reader
that has only returned its first element (aka,
has never had a call to Source.Reader.advance()
) has a consumed parallelism of 0: the first element
is the current element and is still being processed. This condition holds independent of
whether the input is splittable.
Source.Reader.start()
returned false), the
consumed parallelism is 0. This condition holds independent of whether the input is
splittable.
Source.Reader.start()
returned true and
a call to Source.Reader.advance()
has returned false), the value returned must be at least 1
and should equal the total parallelism in the source.
Source.Reader.advance()
returned false, at
which point it becomes 1.
A reader that is implemented using a RangeTracker
is encouraged to use the
range tracker's ability to count split points to implement this method. See
OffsetBasedSource.OffsetBasedReader
and OffsetRangeTracker
for an example.
Defaults to BoundedSource.BoundedReader.SPLIT_POINTS_UNKNOWN
. Any value less than 0 will be interpreted
as unknown.
BoundedSource.BoundedReader
for information about thread safety.getSplitPointsConsumed
in class OffsetBasedSource.OffsetBasedReader<T>
BoundedSource.BoundedReader.getSplitPointsRemaining()
public final long getSplitPointsRemaining()
BoundedSource.BoundedReader
BoundedSource
(as would be returned by BoundedSource.BoundedReader.getCurrentSource()
). This corresponds
to all unprocessed split point records (see RangeTracker
), including the last
split point returned, in the remainder part of the source.
This function should be implemented only in addition to
BoundedSource.BoundedReader.getSplitPointsConsumed()
and only if an exact value can be
returned.
Consider the following examples: (1) An input that can be read in parallel down to the
individual records, such as CountingSource.upTo(long)
, is called "perfectly splittable".
(2) a "block-compressed" file format such as AvroIO
, in which a block of records has
to be read as a whole, but different blocks can be read in parallel. (3) An "unsplittable"
input such as a cursor in a database.
Assume for examples (1) and (2) that the number of records or blocks remaining is known:
reader
for which the last call to Source.Reader.start()
or
Source.Reader.advance()
has returned true should should not return 0, because this reader itself
represents parallelism at least 1. This condition holds independent of whether the input is
splittable.
Source.Reader.start()
or Source.Reader.advance()
) has returned false
should return a value of 0. This condition holds independent of whether the input is
splittable.
Defaults to BoundedSource.BoundedReader.SPLIT_POINTS_UNKNOWN
. Any value less than 0 will be interpreted as
unknown.
BoundedSource.BoundedReader
for information about thread safety.getSplitPointsRemaining
in class OffsetBasedSource.OffsetBasedReader<T>
BoundedSource.BoundedReader.getSplitPointsConsumed()
protected final boolean isAtSplitPoint()
isAtSplitPoint
in class OffsetBasedSource.OffsetBasedReader<T>
protected final void startReading(java.nio.channels.ReadableByteChannel channel) throws java.io.IOException
FileBasedReader#startReading(ReadableByteChannel)
.startReading
in class FileBasedSource.FileBasedReader<T>
channel
- a byte channel representing the file backing the reader.java.io.IOException
protected final boolean readNextRecord() throws java.io.IOException
readNextRecord
in class FileBasedSource.FileBasedReader<T>
true
if a record was successfully read, false
if the end of the
channel was reached before successfully reading a new record.java.io.IOException
protected final long getCurrentOffset() throws java.util.NoSuchElementException
OffsetBasedSource.OffsetBasedReader
current record
,
which has been read by the last successful Source.Reader.start()
or
Source.Reader.advance()
call.
If no such call has been made yet, the return value is unspecified.
See RangeTracker
for description of offset semantics.
getCurrentOffset
in class OffsetBasedSource.OffsetBasedReader<T>
java.util.NoSuchElementException