Class UnboundedReaderImpl
- All Implemented Interfaces:
AutoCloseable
-
Field Summary
Fields inherited from class org.apache.beam.sdk.io.UnboundedSource.UnboundedReader
BACKLOG_UNKNOWN
-
Method Summary
Modifier and TypeMethodDescriptionboolean
advance()
Advances the reader to the next valid record.void
close()
Closes the reader.Returns aUnboundedSource.CheckpointMark
representing the progress of thisUnboundedReader
.com.google.cloud.pubsublite.proto.SequencedMessage
Returns the value of the data item that was read by the lastSource.Reader.start()
orSource.Reader.advance()
call.UnboundedSource
<com.google.cloud.pubsublite.proto.SequencedMessage, CheckpointMarkImpl> Returns theUnboundedSource
that created this reader.Returns the timestamp associated with the current data item.long
Returns the size of the backlog of unread data in the underlying data source represented by this split of this source.Returns a timestamp before or at the timestamps of all future elements read by this reader.boolean
start()
Initializes the reader and advances the reader to the first record.Methods inherited from class org.apache.beam.sdk.io.UnboundedSource.UnboundedReader
getCurrentRecordId, getCurrentRecordOffset, getTotalBacklogBytes
-
Method Details
-
getCurrent
public com.google.cloud.pubsublite.proto.SequencedMessage getCurrent() throws NoSuchElementExceptionDescription copied from class:Source.Reader
Returns the value of the data item that was read by the lastSource.Reader.start()
orSource.Reader.advance()
call. The returned value must be effectively immutable and remain valid indefinitely.Multiple calls to this method without an intervening call to
Source.Reader.advance()
should return the same result.- Specified by:
getCurrent
in classSource.Reader<com.google.cloud.pubsublite.proto.SequencedMessage>
- Throws:
NoSuchElementException
- ifSource.Reader.start()
was never called, or if the lastSource.Reader.start()
orSource.Reader.advance()
returnedfalse
.
-
getCurrentTimestamp
Description copied from class:Source.Reader
Returns the timestamp associated with the current data item.If the source does not support timestamps, this should return
BoundedWindow.TIMESTAMP_MIN_VALUE
.Multiple calls to this method without an intervening call to
Source.Reader.advance()
should return the same result.- Specified by:
getCurrentTimestamp
in classSource.Reader<com.google.cloud.pubsublite.proto.SequencedMessage>
- Throws:
NoSuchElementException
- if the reader is at the beginning of the input andSource.Reader.start()
orSource.Reader.advance()
wasn't called, or if the lastSource.Reader.start()
orSource.Reader.advance()
returnedfalse
.
-
close
Description copied from class:Source.Reader
Closes the reader. The reader cannot be used after this method is called.- Specified by:
close
in interfaceAutoCloseable
- Specified by:
close
in classSource.Reader<com.google.cloud.pubsublite.proto.SequencedMessage>
- Throws:
IOException
-
start
Description copied from class:UnboundedSource.UnboundedReader
Initializes the reader and advances the reader to the first record. If the reader has been restored from a checkpoint then it should advance to the next unread record at the point the checkpoint was taken.This method will be called exactly once. The invocation will occur prior to calling
UnboundedSource.UnboundedReader.advance()
orSource.Reader.getCurrent()
. This method may perform expensive operations that are needed to initialize the reader.Returns
true
if a record was read,false
if there is no more input currently available. Future calls toUnboundedSource.UnboundedReader.advance()
may returntrue
once more data is available. Regardless of the return value ofstart
,start
will not be called again on the sameUnboundedReader
object; it will only be called again when a new reader object is constructed for the same source, e.g. on recovery.- Specified by:
start
in classUnboundedSource.UnboundedReader<com.google.cloud.pubsublite.proto.SequencedMessage>
- Returns:
true
if a record was read,false
if there is no more input available.- Throws:
IOException
-
advance
Description copied from class:UnboundedSource.UnboundedReader
Advances the reader to the next valid record.Returns
true
if a record was read,false
if there is no more input available. Future calls toUnboundedSource.UnboundedReader.advance()
may returntrue
once more data is available.- Specified by:
advance
in classUnboundedSource.UnboundedReader<com.google.cloud.pubsublite.proto.SequencedMessage>
- Returns:
true
if a record was read,false
if there is no more input available.- Throws:
IOException
-
getWatermark
Description copied from class:UnboundedSource.UnboundedReader
Returns a timestamp before or at the timestamps of all future elements read by this reader.This can be approximate. If records are read that violate this guarantee, they will be considered late, which will affect how they will be processed. See
Window
for more information on late data and how to handle it.However, this value should be as late as possible. Downstream windows may not be able to close until this watermark passes their end.
For example, a source may know that the records it reads will be in timestamp order. In this case, the watermark can be the timestamp of the last record read. For a source that does not have natural timestamps, timestamps can be set to the time of reading, in which case the watermark is the current clock time.
See
Window
andTrigger
for more information on timestamps and watermarks.May be called after
UnboundedSource.UnboundedReader.advance()
orUnboundedSource.UnboundedReader.start()
has returned false, but not beforeUnboundedSource.UnboundedReader.start()
has been called.- Specified by:
getWatermark
in classUnboundedSource.UnboundedReader<com.google.cloud.pubsublite.proto.SequencedMessage>
-
getCheckpointMark
Description copied from class:UnboundedSource.UnboundedReader
Returns aUnboundedSource.CheckpointMark
representing the progress of thisUnboundedReader
.If this
UnboundedReader
does not support checkpoints, it may return a CheckpointMark which does nothing, like:public UnboundedSource.CheckpointMark getCheckpointMark() { return new UnboundedSource.CheckpointMark() { public void finalizeCheckpoint() throws IOException { // nothing to do } }; }
All elements read between the last time this method was called (or since this reader was created, if this method has not been called on this reader) until this method is called will be processed together as a bundle. (An element is considered 'read' if it could be returned by a call to
Source.Reader.getCurrent()
.)Once the result of processing those elements and the returned checkpoint have been durably committed,
UnboundedSource.CheckpointMark.finalizeCheckpoint()
will be called at most once at some later point on the returnedUnboundedSource.CheckpointMark
object. Checkpoint finalization is best-effort, and checkpoints may not be finalized. If duplicate elements may be produced if checkpoints are not finalized in a timely manner,UnboundedSource.requiresDeduping()
should be overridden to return true, andUnboundedSource.UnboundedReader.getCurrentRecordId()
should be overridden to return unique record IDs.A checkpoint will be committed to durable storage only if all all previous checkpoints produced by the same reader have also been committed.
The returned object should not be modified.
May not be called before
UnboundedSource.UnboundedReader.start()
has been called.- Specified by:
getCheckpointMark
in classUnboundedSource.UnboundedReader<com.google.cloud.pubsublite.proto.SequencedMessage>
-
getCurrentSource
public UnboundedSource<com.google.cloud.pubsublite.proto.SequencedMessage,CheckpointMarkImpl> getCurrentSource()Description copied from class:UnboundedSource.UnboundedReader
Returns theUnboundedSource
that created this reader. This will not change over the life of the reader.- Specified by:
getCurrentSource
in classUnboundedSource.UnboundedReader<com.google.cloud.pubsublite.proto.SequencedMessage>
-
getSplitBacklogBytes
public long getSplitBacklogBytes()Description copied from class:UnboundedSource.UnboundedReader
Returns the size of the backlog of unread data in the underlying data source represented by this split of this source.One of this or
UnboundedSource.UnboundedReader.getTotalBacklogBytes()
should be overridden in order to allow the runner to scale the amount of resources allocated to the pipeline.- Overrides:
getSplitBacklogBytes
in classUnboundedSource.UnboundedReader<com.google.cloud.pubsublite.proto.SequencedMessage>
-