@Experimental(value=SPLITTABLE_DO_FN) public class GrowableOffsetRangeTracker extends OffsetRangeTracker
An OffsetRangeTracker for tracking a growable offset range. Long.MAX_VALUE is used as the end of the range to indicate infinity.
An offset range is considered growable when the end offset could grow (or change) during execution time (e.g., Kafka topic partition offset, appended file, ...).
The growable range is marked as done by claiming Long.MAX_VALUE.
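To make the convention above concrete, here is a minimal plain-Java sketch of a tracker over the range [start, Long.MAX_VALUE). It is a simplified stand-in written for illustration only and is not Beam's implementation; the class name GrowableRangeSketch and its isDone method are hypothetical.

```java
/** Simplified model of a tracker over [start, Long.MAX_VALUE). Hypothetical; not Beam code. */
class GrowableRangeSketch {
    private final long start;
    private final long end = Long.MAX_VALUE; // stands in for "infinity"
    private long lastAttempted;
    private boolean attempted = false;

    GrowableRangeSketch(long start) {
        this.start = start;
    }

    /**
     * Records the attempt and returns true if the offset lies inside the range.
     * Claiming Long.MAX_VALUE returns false and leaves the range finished.
     */
    boolean tryClaim(long offset) {
        if (offset < start || (attempted && offset <= lastAttempted)) {
            throw new IllegalArgumentException(
                "offsets must be claimed in increasing order, starting at " + start);
        }
        attempted = true;
        lastAttempted = offset;
        return offset < end;
    }

    /** True once the last offset inside the range, or anything past it, has been attempted. */
    boolean isDone() {
        return attempted && lastAttempted >= end - 1;
    }

    public static void main(String[] args) {
        GrowableRangeSketch tracker = new GrowableRangeSketch(5L);
        System.out.println(tracker.tryClaim(5L));             // inside the range
        System.out.println(tracker.tryClaim(Long.MAX_VALUE)); // end of the range
        System.out.println(tracker.isDone());
    }
}
```

In this model a reader that has no more input to consume would claim Long.MAX_VALUE, see the claim rejected, and stop processing.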
Modifier and Type | Class and Description |
---|---|
static interface | GrowableOffsetRangeTracker.RangeEndEstimator: Provides the estimated end offset of the range. |
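Nested classes/interfaces inherited from class RestrictionTracker
RestrictionTracker.HasProgress, RestrictionTracker.Progress
Fields inherited from class OffsetRangeTracker
lastAttemptedOffset, lastClaimedOffset, range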
Constructor and Description |
---|
GrowableOffsetRangeTracker(long start, GrowableOffsetRangeTracker.RangeEndEstimator rangeEndEstimator) |
Modifier and Type | Method and Description |
---|---|
RestrictionTracker.Progress | getProgress(): A representation for the amount of known completed and known remaining work. |
SplitResult<OffsetRange> | trySplit(double fractionOfRemainder): Splits current restriction based on fractionOfRemainder. |
Methods inherited from class OffsetRangeTracker
checkDone, currentRestriction, toString, tryClaim
public GrowableOffsetRangeTracker(long start, GrowableOffsetRangeTracker.RangeEndEstimator rangeEndEstimator)
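The constructor takes a start offset and an estimator rather than an end offset, since the end of the range is Long.MAX_VALUE. As a rough illustration of why an estimate is useful, the sketch below picks a split offset by substituting an estimated end for the infinite one. This is one plausible strategy under stated assumptions, not a description of Beam's implementation: the class GrowableSplitSketch, the -1 "no split" return value, and the use of java.util.function.LongSupplier in place of RangeEndEstimator are all hypothetical, and offsets are assumed to be non-negative.

```java
import java.math.BigDecimal;
import java.util.function.LongSupplier;

/** Hypothetical helper; models splitting a range whose real end is unknown. */
class GrowableSplitSketch {

    /**
     * Returns an offset at which to split the remaining work, treating the
     * supplied estimate as the end of the range. Returns -1 when the range
     * has already been attempted up to Long.MAX_VALUE.
     */
    static long splitPoint(long lastAttempted, double fractionOfRemainder, LongSupplier estimatedEnd) {
        if (fractionOfRemainder < 0.0 || fractionOfRemainder > 1.0) {
            throw new IllegalArgumentException("fractionOfRemainder must be in [0, 1]");
        }
        if (lastAttempted == Long.MAX_VALUE) {
            return -1L; // nothing left to split
        }
        // A stale estimate may lag behind the reader, so never go below lastAttempted + 1.
        long end = Math.max(estimatedEnd.getAsLong(), lastAttempted + 1);
        long remaining = end - lastAttempted;
        long step = BigDecimal.valueOf(fractionOfRemainder)
            .multiply(BigDecimal.valueOf(remaining))
            .longValue(); // truncates toward zero
        // Advance by at least one offset so the primary keeps the attempted offset.
        return lastAttempted + Math.max(1L, step);
    }

    public static void main(String[] args) {
        // Reader is at offset 130 and the source reports roughly 1130 offsets so far.
        System.out.println(splitPoint(130L, 0.5, () -> 1130L));
    }
}
```

In this model the residual would still end at Long.MAX_VALUE, so it stays growable after the split.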
public SplitResult<OffsetRange> trySplit(double fractionOfRemainder)
Description copied from class: RestrictionTracker
Splits current restriction based on fractionOfRemainder.
If splitting the current restriction is possible, the current restriction is split into a primary and residual restriction pair. This invocation updates the RestrictionTracker.currentRestriction() to be the primary restriction effectively having the current DoFn.ProcessElement execution responsible for performing the work that the primary restriction represents. The residual restriction will be executed in a separate DoFn.ProcessElement invocation (likely in a different process). The work performed by executing the primary and residual restrictions as separate DoFn.ProcessElement invocations MUST be equivalent to the work performed as if this split never occurred.
The fractionOfRemainder should be used in a best-effort manner to choose a primary and residual restriction based upon the fraction of the remaining work that the current DoFn.ProcessElement invocation is responsible for. For example, if a DoFn.ProcessElement is reading a file with a restriction representing the offset range [100, 200) and has processed up to offset 130 with a fractionOfRemainder of 0.7, the primary and residual restrictions returned would be [100, 179), [179, 200) (note: currentOffset + fractionOfRemainder * remainingWork = 130 + 0.7 * 70 = 179).
fractionOfRemainder = 0 means a checkpoint is required.
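The arithmetic in the example above can be checked with a small plain-Java sketch. The class SplitPointSketch and its splitPoint method are hypothetical names introduced for illustration and are not part of Beam; the sketch only evaluates currentOffset + fractionOfRemainder * remainingWork and does not model a real tracker.

```java
import java.math.BigDecimal;

/** Illustrative helper, not part of Beam: evaluates the split formula from the example. */
class SplitPointSketch {

    /**
     * Returns the offset that divides [currentOffset, rangeEnd) into the share kept
     * by the primary and the share handed to the residual.
     */
    static long splitPoint(long currentOffset, long rangeEnd, double fractionOfRemainder) {
        if (fractionOfRemainder < 0.0 || fractionOfRemainder > 1.0) {
            throw new IllegalArgumentException("fractionOfRemainder must be in [0, 1]");
        }
        long remainingWork = rangeEnd - currentOffset;
        // BigDecimal.valueOf(double) starts from the decimal text of the double,
        // which avoids binary rounding surprises in products such as 0.7 * 70.
        long primaryShare = BigDecimal.valueOf(fractionOfRemainder)
            .multiply(BigDecimal.valueOf(remainingWork))
            .longValue(); // truncates toward zero
        return currentOffset + primaryShare;
    }

    public static void main(String[] args) {
        long split = splitPoint(130L, 200L, 0.7);
        System.out.println("primary  = [100, " + split + ")");
        System.out.println("residual = [" + split + ", 200)");
    }
}
```

With a fractionOfRemainder of 0, this formula places the split at the current offset, so all remaining work moves to the residual, which is how the sketch models a checkpoint.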
The API is recommended to be implemented for a batch pipeline to improve parallel processing performance.
The API is required to be implemented for a streaming pipeline.
Overrides:
trySplit in class OffsetRangeTracker
Parameters:
fractionOfRemainder - A hint as to the fraction of work the primary restriction should represent based upon the current known remaining amount of work.
Returns:
SplitResult if a split was possible, otherwise returns null. If the fractionOfRemainder == 0, a null result MUST imply that the restriction tracker is done and there is no more work left to do.
public RestrictionTracker.Progress getProgress()
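Description copied from interface: RestrictionTracker.HasProgress
A representation for the amount of known completed and known remaining work.
It is up to each restriction tracker to convert between its natural representation of completed and remaining work and the double representation. For example, a tracker might report the number of messages or number of message bytes that have been processed and the number of messages or number of message bytes that are outstanding.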
The work completed and work remaining must be of the same scale whether that be number of messages or number of bytes and should never represent two distinct unit types.
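As an illustration of keeping both values on the same scale, the sketch below derives completed and remaining work from offsets alone, so both are counted in offsets. It is a plain-Java model under stated assumptions, not Beam's RestrictionTracker.Progress class; ProgressSketch, fromOffsets and fractionCompleted are hypothetical names.

```java
/** Hypothetical value holder: completed and remaining work, both measured in offsets. */
class ProgressSketch {
    final double workCompleted;
    final double workRemaining;

    private ProgressSketch(double workCompleted, double workRemaining) {
        this.workCompleted = workCompleted;
        this.workRemaining = workRemaining;
    }

    /** Builds progress for [rangeStart, rangeEnd) when everything before currentOffset is processed. */
    static ProgressSketch fromOffsets(long rangeStart, long rangeEnd, long currentOffset) {
        if (currentOffset < rangeStart || currentOffset > rangeEnd) {
            throw new IllegalArgumentException("currentOffset must lie within the range");
        }
        // Both values use the same unit (offsets), never a mix such as offsets and bytes.
        return new ProgressSketch(currentOffset - rangeStart, rangeEnd - currentOffset);
    }

    /** Fraction of the total work that is complete; an empty range counts as fully complete. */
    double fractionCompleted() {
        double total = workCompleted + workRemaining;
        return total == 0.0 ? 1.0 : workCompleted / total;
    }

    public static void main(String[] args) {
        ProgressSketch p = fromOffsets(100L, 200L, 130L);
        System.out.println(p.workCompleted + " completed, " + p.workRemaining + " remaining");
    }
}
```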
Specified by:
getProgress in interface RestrictionTracker.HasProgress
Overrides:
getProgress in class OffsetRangeTracker