Class BlockBasedSource.BlockBasedReader<T>
- All Implemented Interfaces:
AutoCloseable
- Direct Known Subclasses:
AvroSource.AvroReader
- Enclosing class:
BlockBasedSource<T>
Reader
that reads records from a BlockBasedSource
. If the source is a
subrange of a file, the blocks that will be read by this reader are those such that the first
byte of the block is within the range [start, end)
.-
Field Summary
Fields inherited from class org.apache.beam.sdk.io.BoundedSource.BoundedReader
SPLIT_POINTS_UNKNOWN
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionfinal T
Returns the value of the data item that was read by the lastSource.Reader.start()
orSource.Reader.advance()
call.abstract @Nullable BlockBasedSource.Block
<T> Returns the current block (the block that was read by the last successful call toreadNextBlock()
).abstract long
Returns the largest offset such that starting to read from that offset includes the current block.abstract long
Returns the size of the current block in bytes as it is represented in the underlying file, if possible.protected long
Returns the starting offset of thecurrent record
, which has been read by the last successfulSource.Reader.start()
orSource.Reader.advance()
call.Returns a value in [0, 1] representing approximately what fraction of thecurrent source
this reader has read so far, ornull
if such an estimate is not available.boolean
Returns true if the reader is at a split point.abstract boolean
Read the next block from the input.protected final boolean
Reads the next record from thecurrent block
if possible.Methods inherited from class org.apache.beam.sdk.io.FileBasedSource.FileBasedReader
advanceImpl, allowsDynamicSplitting, close, getCurrentSource, startImpl, startReading
Methods inherited from class org.apache.beam.sdk.io.OffsetBasedSource.OffsetBasedReader
advance, getSplitPointsConsumed, getSplitPointsRemaining, isDone, isStarted, splitAtFraction, start
Methods inherited from class org.apache.beam.sdk.io.BoundedSource.BoundedReader
getCurrentTimestamp
-
Constructor Details
-
BlockBasedReader
-
-
Method Details
-
readNextBlock
Read the next block from the input.- Throws:
IOException
-
getCurrentBlock
Returns the current block (the block that was read by the last successful call toreadNextBlock()
). May return null initially, or if no block has been successfully read. -
getCurrentBlockSize
public abstract long getCurrentBlockSize()Returns the size of the current block in bytes as it is represented in the underlying file, if possible. This method may return0
if the size of the current block is unknown.The size returned by this method must be such that for two successive blocks A and B,
offset(A) + size(A) <= offset(B)
. If this is not satisfied, the progress reported by theBlockBasedReader
will be non-monotonic and will interfere with the quality (but not correctness) of dynamic work rebalancing.This method and
BlockBasedSource.Block.getFractionOfBlockConsumed()
are used to provide an estimate of progress within a block (getCurrentBlock().getFractionOfBlockConsumed() * getCurrentBlockSize()
). It is acceptable for the result of this computation to be0
, but progress estimation will be inaccurate. -
getCurrentBlockOffset
public abstract long getCurrentBlockOffset()Returns the largest offset such that starting to read from that offset includes the current block. -
getCurrent
Description copied from class:Source.Reader
Returns the value of the data item that was read by the lastSource.Reader.start()
orSource.Reader.advance()
call. The returned value must be effectively immutable and remain valid indefinitely.Multiple calls to this method without an intervening call to
Source.Reader.advance()
should return the same result.- Specified by:
getCurrent
in classSource.Reader<T>
- Throws:
NoSuchElementException
- ifSource.Reader.start()
was never called, or if the lastSource.Reader.start()
orSource.Reader.advance()
returnedfalse
.
-
isAtSplitPoint
public boolean isAtSplitPoint()Returns true if the reader is at a split point. ABlockBasedReader
is at a split point if the current record is the first record in a block. In other words, split points are block boundaries.- Overrides:
isAtSplitPoint
in classOffsetBasedSource.OffsetBasedReader<T>
-
readNextRecord
Reads the next record from thecurrent block
if possible. Will callreadNextBlock()
to advance to the next block if not.The first record read from a block is treated as a split point.
- Specified by:
readNextRecord
in classFileBasedSource.FileBasedReader<T>
- Returns:
true
if a record was successfully read,false
if the end of the channel was reached before successfully reading a new record.- Throws:
IOException
-
getFractionConsumed
Description copied from class:BoundedSource.BoundedReader
Returns a value in [0, 1] representing approximately what fraction of thecurrent source
this reader has read so far, ornull
if such an estimate is not available.It is recommended that this method should satisfy the following properties:
- Should return 0 before the
Source.Reader.start()
call. - Should return 1 after a
Source.Reader.start()
orSource.Reader.advance()
call that returns false. - The returned values should be non-decreasing (though they don't have to be unique).
By default, returns null to indicate that this cannot be estimated.
Thread safety
IfBoundedSource.BoundedReader.splitAtFraction(double)
is implemented, this method can be called concurrently to other methods (including itself), and it is therefore critical for it to be implemented in a thread-safe way.- Overrides:
getFractionConsumed
in classOffsetBasedSource.OffsetBasedReader<T>
- Should return 0 before the
-
getCurrentOffset
protected long getCurrentOffset()Description copied from class:OffsetBasedSource.OffsetBasedReader
Returns the starting offset of thecurrent record
, which has been read by the last successfulSource.Reader.start()
orSource.Reader.advance()
call.If no such call has been made yet, the return value is unspecified.
See
RangeTracker
for description of offset semantics.- Specified by:
getCurrentOffset
in classOffsetBasedSource.OffsetBasedReader<T>
-