public static class BigQueryStorageStreamSource.BigQueryStorageStreamReader<T> extends BoundedSource.BoundedReader<T>
Source.Reader
which reads records from a stream.SPLIT_POINTS_UNKNOWN
Modifier and Type | Method and Description |
---|---|
boolean |
advance()
Advances the reader to the next valid record.
|
void |
close()
Closes the reader.
|
T |
getCurrent()
Returns the value of the data item that was read by the last
Source.Reader.start() or Source.Reader.advance() call. |
BigQueryStorageStreamSource<T> |
getCurrentSource()
Returns a
Source describing the same input that this Reader currently reads
(including items already read). |
java.lang.Double |
getFractionConsumed()
Returns a value in [0, 1] representing approximately what fraction of the
current source this reader has read so far, or null if such an
estimate is not available. |
BoundedSource<T> |
splitAtFraction(double fraction)
Tells the reader to narrow the range of the input it's going to read and give up the
remainder, so that the new range would contain approximately the given fraction of the amount
of data in the current range.
|
boolean |
start()
Initializes the reader and advances the reader to the first record.
|
getCurrentTimestamp, getSplitPointsConsumed, getSplitPointsRemaining
public boolean start() throws java.io.IOException
Source.Reader
This method should be called exactly once. The invocation should occur prior to calling
Source.Reader.advance()
or Source.Reader.getCurrent()
. This method may perform expensive operations that
are needed to initialize the reader.
start
in class Source.Reader<T>
true
if a record was read, false
if there is no more input available.java.io.IOException
public boolean advance() throws java.io.IOException
Source.Reader
It is an error to call this without having called Source.Reader.start()
first.
advance
in class Source.Reader<T>
true
if a record was read, false
if there is no more input available.java.io.IOException
public T getCurrent() throws java.util.NoSuchElementException
Source.Reader
Source.Reader.start()
or Source.Reader.advance()
call. The returned value must be effectively immutable and remain valid
indefinitely.
Multiple calls to this method without an intervening call to Source.Reader.advance()
should
return the same result.
getCurrent
in class Source.Reader<T>
java.util.NoSuchElementException
- if Source.Reader.start()
was never called, or if the last
Source.Reader.start()
or Source.Reader.advance()
returned false
.public void close()
Source.Reader
close
in interface java.lang.AutoCloseable
close
in class Source.Reader<T>
public BigQueryStorageStreamSource<T> getCurrentSource()
BoundedSource.BoundedReader
Source
describing the same input that this Reader
currently reads
(including items already read).
Reader subclasses can use this method for convenience to access unchanging properties of the source being read. Alternatively, they can cache these properties in the constructor.
The framework will call this method in the course of dynamic work rebalancing, e.g. after
a successful BoundedSource.BoundedReader.splitAtFraction(double)
call.
Remember that Source
objects must always be immutable. However, the return value
of this function may be affected by dynamic work rebalancing, happening asynchronously via
BoundedSource.BoundedReader.splitAtFraction(double)
, meaning it can return a different Source
object. However, the returned object itself will still itself be immutable. Callers
must take care not to rely on properties of the returned source that may be asynchronously
changed as a result of this process (e.g. do not cache an end offset when reading a file).
For convenience, subclasses should usually return the most concrete subclass of Source
possible. In practice, the implementation of this method should nearly always be one
of the following:
BoundedSource.BoundedReader.getCurrentSource()
: delegate to base class. In this case, it is almost always an error
for the subclass to maintain its own copy of the source.
public FooReader(FooSource<T> source) {
super(source);
}
public FooSource<T> getCurrentSource() {
return (FooSource<T>)super.getCurrentSource();
}
private final FooSource<T> source;
public FooReader(FooSource<T> source) {
this.source = source;
}
public FooSource<T> getCurrentSource() {
return source;
}
BoundedSource.BoundedReader
that explicitly supports dynamic work rebalancing:
maintain a variable pointing to an immutable source object, and protect it with
synchronization.
private FooSource<T> source;
public FooReader(FooSource<T> source) {
this.source = source;
}
public synchronized FooSource<T> getCurrentSource() {
return source;
}
public synchronized FooSource<T> splitAtFraction(double fraction) {
...
FooSource<T> primary = ...;
FooSource<T> residual = ...;
this.source = primary;
return residual;
}
getCurrentSource
in class BoundedSource.BoundedReader<T>
public BoundedSource<T> splitAtFraction(double fraction)
BoundedSource.BoundedReader
Returns a BoundedSource
representing the remainder.
BoundedSource<T> initial = reader.getCurrentSource();
BoundedSource<T> residual = reader.splitAtFraction(fraction);
BoundedSource<T> primary = reader.getCurrentSource();
This method should return null
if the split cannot be performed for this fraction
while satisfying the semantics above. E.g., a reader that reads a range of offsets in a file
should return null
if it is already past the position in its range corresponding to
the given fraction. In this case, the method MUST have no effect (the reader must behave as
if the method hadn't been called at all).
It is also very important that this method always completes quickly. In particular, it should not perform or wait on any blocking operations such as I/O, RPCs etc. Violating this requirement may stall completion of the work item or even cause it to fail.
It is incorrect to make both this method and Source.Reader.start()
/Source.Reader.advance()
synchronized
, because those methods can perform blocking operations, and then this method
would have to wait for those calls to complete.
RangeTracker
makes it easy to implement this method
safely and correctly.
By default, returns null to indicate that splitting is not possible.
splitAtFraction
in class BoundedSource.BoundedReader<T>
public java.lang.Double getFractionConsumed()
BoundedSource.BoundedReader
current source
this reader has read so far, or null
if such an
estimate is not available.
It is recommended that this method should satisfy the following properties:
Source.Reader.start()
call.
Source.Reader.start()
or Source.Reader.advance()
call that returns false.
By default, returns null to indicate that this cannot be estimated.
BoundedSource.BoundedReader.splitAtFraction(double)
is implemented, this method can be called concurrently to other
methods (including itself), and it is therefore critical for it to be implemented in a
thread-safe way.getFractionConsumed
in class BoundedSource.BoundedReader<T>