Class OffsetRangeTracker

java.lang.Object
org.apache.beam.sdk.io.range.OffsetRangeTracker
All Implemented Interfaces:
RangeTracker<Long>

public class OffsetRangeTracker extends Object implements RangeTracker<Long>
A RangeTracker for non-negative positions of type long.

Not to be confused with OffsetRangeTracker.

  • Field Details

    • OFFSET_INFINITY

      public static final long OFFSET_INFINITY
      Offset corresponding to infinity. This can only be used as the upper-bound of a range, and indicates reading all of the records until the end without specifying exactly what the end is.

      Infinite ranges cannot be split because it is impossible to estimate progress within them.

      See Also:
  • Constructor Details

    • OffsetRangeTracker

      public OffsetRangeTracker(long startOffset, long stopOffset)
      Creates an OffsetRangeTracker for the specified range.
  • Method Details

    • isStarted

      public boolean isStarted()
    • isDone

      public boolean isDone()
    • getStartPosition

      public Long getStartPosition()
      Description copied from interface: RangeTracker
      Returns the starting position of the current range, inclusive.
      Specified by:
      getStartPosition in interface RangeTracker<Long>
    • getStopPosition

      public Long getStopPosition()
      Description copied from interface: RangeTracker
      Returns the ending position of the current range, exclusive.
      Specified by:
      getStopPosition in interface RangeTracker<Long>
    • tryReturnRecordAt

      public boolean tryReturnRecordAt(boolean isAtSplitPoint, Long recordStart)
      Description copied from interface: RangeTracker
      Atomically determines whether a record at the given position can be returned and updates internal state. In particular:
      • If isAtSplitPoint is true, and recordStart is outside the current range, returns false;
      • Otherwise, updates the last-consumed position to recordStart and returns true.

      This method MUST be called on all split point records. It may be called on every record.

      Specified by:
      tryReturnRecordAt in interface RangeTracker<Long>
    • tryReturnRecordAt

      public boolean tryReturnRecordAt(boolean isAtSplitPoint, long recordStart)
    • trySplitAtPosition

      public boolean trySplitAtPosition(Long splitOffset)
      Description copied from interface: RangeTracker
      Atomically splits the current range [RangeTracker.getStartPosition(), RangeTracker.getStopPosition()) into a "primary" part [RangeTracker.getStartPosition(), splitPosition) and a "residual" part [splitPosition, RangeTracker.getStopPosition()), assuming the current last-consumed position is within [RangeTracker.getStartPosition(), splitPosition) (i.e., splitPosition has not been consumed yet).

      Updates the current range to be the primary and returns true. This means that all further calls on the current object will interpret their arguments relative to the primary range.

      If the split position has already been consumed, or if no RangeTracker.tryReturnRecordAt(boolean, PositionT) call was made yet, returns false. The second condition is to prevent dynamic splitting during reader start-up.

      Specified by:
      trySplitAtPosition in interface RangeTracker<Long>
    • trySplitAtPosition

      public boolean trySplitAtPosition(long splitOffset)
    • getPositionForFractionConsumed

      public long getPositionForFractionConsumed(double fraction)
      Returns a position P such that the range [start, P) represents approximately the given fraction of the range [start, end). Assumes that the density of records in the range is approximately uniform.
    • getFractionConsumed

      public double getFractionConsumed()
      Description copied from interface: RangeTracker
      Returns the approximate fraction of positions in the source that have been consumed by successful RangeTracker.tryReturnRecordAt(boolean, PositionT) calls, or 0.0 if no such calls have happened.
      Specified by:
      getFractionConsumed in interface RangeTracker<Long>
    • getSplitPointsProcessed

      public long getSplitPointsProcessed()
      Returns the total number of split points that have been processed.

      A split point at a particular offset has been seen if there has been a corresponding call to tryReturnRecordAt(boolean, long) with isAtSplitPoint true. It has been processed if there has been a subsequent call to tryReturnRecordAt(boolean, long) with isAtSplitPoint true and at a larger offset.

      Note that for correctness when implementing BoundedSource.BoundedReader.getSplitPointsConsumed(), if a reader finishes before tryReturnRecordAt(boolean, long) returns false, the reader should add an additional call to markDone(). This will indicate that processing for the last seen split point has been finished.

      See Also:
    • markDone

      public boolean markDone()
      Marks this range tracker as being done. Specifically, this will mark the current split point, if one exists, as being finished.

      Always returns false, so that it can be used in an implementation of Source.Reader.start() or Source.Reader.advance() as follows:

      
       public boolean start() {
         return startImpl() && rangeTracker.tryReturnRecordAt(isAtSplitPoint, position)
             || rangeTracker.markDone();
       }
       
    • toString

      public String toString()
      Overrides:
      toString in class Object