@Experimental(value=SPLITTABLE_DO_FN) public class ByteKeyRangeTracker extends RestrictionTracker<ByteKeyRange,ByteKey> implements Sizes.HasSize
A RestrictionTracker for claiming ByteKeys in a ByteKeyRange in a monotonically increasing fashion. The range is a semi-open bounded interval [startKey, endKey) where the limits are both represented by ByteKey.EMPTY.
Note, one can complete a range by claiming the ByteKey.EMPTY once one runs out of keys to process.
Modifier and Type | Method and Description |
---|---|
void | checkDone(): Called by the runner after DoFn.ProcessElement returns. |
ByteKeyRange | currentRestriction(): Returns a restriction accurately describing the full range of work the current DoFn.ProcessElement call will do, including already completed work. |
double | getSize(): A representation for the amount of known work represented as a size. |
static ByteKeyRangeTracker | of(ByteKeyRange range) |
java.lang.String | toString() |
boolean | tryClaim(ByteKey key): Attempts to claim the given key. |
SplitResult<ByteKeyRange> | trySplit(double fractionOfRemainder): Splits current restriction based on fractionOfRemainder. |
public static ByteKeyRangeTracker of(ByteKeyRange range)
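public ByteKeyRange currentRestriction()
Description copied from class: RestrictionTracker
Returns a restriction accurately describing the full range of work the current DoFn.ProcessElement call will do, including already completed work.
Specified by: currentRestriction in class RestrictionTracker<ByteKeyRange,ByteKey>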
public SplitResult<ByteKeyRange> trySplit(double fractionOfRemainder)
Description copied from class: RestrictionTracker
Splits current restriction based on fractionOfRemainder.
If splitting the current restriction is possible, the current restriction is split into a primary and residual restriction pair. This invocation updates the RestrictionTracker.currentRestriction() to be the primary restriction, effectively having the current DoFn.ProcessElement execution responsible for performing the work that the primary restriction represents. The residual restriction will be executed in a separate DoFn.ProcessElement invocation (likely in a different process). The work performed by executing the primary and residual restrictions as separate DoFn.ProcessElement invocations MUST be equivalent to the work performed as if this split never occurred.
The fractionOfRemainder should be used in a best effort manner to choose a primary and residual restriction based upon the fraction of the remaining work that the current DoFn.ProcessElement invocation is responsible for. For example, if a DoFn.ProcessElement was reading a file with a restriction representing the offset range [100, 200) and has processed up to offset 130 with a fractionOfRemainder of 0.7, the primary and residual restrictions returned would be [100, 179), [179, 200) (note: currentOffset + fractionOfRemainder * remainingWork = 130 + 0.7 * 70 = 179).
fractionOfRemainder = 0 means a checkpoint is required.
The API is recommended to be implemented for a batch pipeline to improve parallel processing performance.
The API is required to be implemented for a streaming pipeline.
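The offset-range arithmetic in the example above can be sketched as follows. This is a minimal illustration of the formula only, not the Beam implementation: the class SplitPointSketch and its splitPoint helper are hypothetical names, and both the rounding and the [0, 1] bound on the fraction are assumptions of the sketch.

```java
/** Illustrative only: models the split-point formula from the offset-range example. */
final class SplitPointSketch {

    /**
     * Returns the offset that separates the primary restriction [start, split)
     * from the residual restriction [split, end).
     * Hypothetical helper; not part of the Beam API.
     */
    static long splitPoint(long currentOffset, long endOffset, double fractionOfRemainder) {
        // Assumption of this sketch: the fraction is a value in [0, 1].
        if (fractionOfRemainder < 0.0 || fractionOfRemainder > 1.0) {
            throw new IllegalArgumentException("fractionOfRemainder must be in [0, 1]");
        }
        long remainingWork = endOffset - currentOffset;
        // currentOffset + fractionOfRemainder * remainingWork, rounded to the nearest offset.
        return currentOffset + Math.round(fractionOfRemainder * remainingWork);
    }

    public static void main(String[] args) {
        long split = splitPoint(130, 200, 0.7);
        System.out.println("primary  = [100, " + split + ")");  // primary  = [100, 179)
        System.out.println("residual = [" + split + ", 200)");  // residual = [179, 200)
    }
}
```

Under the same formula, a fractionOfRemainder of 0 places the split at the current offset (130 here), so all remaining work goes to the residual restriction.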
Specified by: trySplit in class RestrictionTracker<ByteKeyRange,ByteKey>
Parameters: fractionOfRemainder - A hint as to the fraction of work the primary restriction should represent based upon the current known remaining amount of work.
Returns: SplitResult if a split was possible, otherwise returns null.
public boolean tryClaim(ByteKey key)
Attempts to claim the given key. The key must be larger than the last attempted key. Since this restriction tracker represents a range over a semi-open bounded interval [start, end), the last key that was attempted may have failed but still have consumed the interval [lastAttemptedKey, end), because this range tracker processes keys in a monotonically increasing order. Note that passing in ByteKey.EMPTY claims all keys to the end of the range and can only be claimed once.
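The claiming rules described above can be modeled with a small self-contained class. This is a simplified sketch, not the Beam implementation: KeyClaimSketch is a hypothetical name, plain byte arrays stand in for ByteKey, and an empty array stands in for ByteKey.EMPTY. Two details are assumptions of the sketch rather than documented behavior: claiming the empty key returns false, and any claim after it is rejected.

```java
/** Simplified model of monotonic key claiming over [startKey, endKey); not the Beam implementation. */
final class KeyClaimSketch {
    private final byte[] startKey;
    private final byte[] endKey;      // empty array: no upper bound (stands in for ByteKey.EMPTY)
    private byte[] lastAttemptedKey;  // null until the first attempt

    KeyClaimSketch(byte[] startKey, byte[] endKey) {
        this.startKey = startKey.clone();
        this.endKey = endKey.clone();
    }

    /** Lexicographic comparison that treats each byte as unsigned. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    boolean tryClaim(byte[] key) {
        if (lastAttemptedKey != null && lastAttemptedKey.length == 0) {
            // Assumption of this sketch: nothing may be claimed after the end-of-range sentinel.
            throw new IllegalArgumentException("end of range was already claimed");
        }
        if (key.length == 0) {
            // The empty key consumes everything up to the end of the range.
            // Returning false is an assumption: there is no key left to process.
            lastAttemptedKey = key;
            return false;
        }
        if (lastAttemptedKey != null && compare(key, lastAttemptedKey) <= 0) {
            throw new IllegalArgumentException("keys must be claimed in strictly increasing order");
        }
        if (compare(key, startKey) < 0) {
            throw new IllegalArgumentException("key is before the start of the range");
        }
        lastAttemptedKey = key.clone();
        // A failed attempt is still recorded, so it consumes [lastAttemptedKey, endKey).
        return endKey.length == 0 || compare(key, endKey) < 0;
    }

    public static void main(String[] args) {
        KeyClaimSketch tracker = new KeyClaimSketch(new byte[] {0x10}, new byte[] {0x20});
        System.out.println(tracker.tryClaim(new byte[] {0x10}));       // true: inside the range
        System.out.println(tracker.tryClaim(new byte[] {0x15, 0x01})); // true: larger than the last key
        System.out.println(tracker.tryClaim(new byte[0]));             // false: sentinel completes the range
    }
}
```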
Specified by: tryClaim in class RestrictionTracker<ByteKeyRange,ByteKey>
Returns: true if the key was successfully claimed, false if it is outside the current ByteKeyRange of this tracker.
public void checkDone() throws java.lang.IllegalStateException
Description copied from class: RestrictionTracker
Called by the runner after DoFn.ProcessElement returns.
Must throw an exception with an informative error message if there is still any unclaimed work remaining in the restriction.
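The completion contract above can be illustrated with a simplified check over an integer offset range. This is a sketch under stated assumptions, not the Beam implementation: CheckDoneSketch and its checkDone helper are hypothetical, and long offsets stand in for byte keys so that the first unclaimed position is simply lastAttempted + 1.

```java
/** Simplified completion check over an integer range [start, end); not the Beam implementation. */
final class CheckDoneSketch {

    /**
     * Throws if offsets in [start, end) remain unclaimed.
     * lastAttempted is the most recent offset passed to a claim attempt, or null if none was made.
     */
    static void checkDone(long start, long end, Long lastAttempted) {
        if (start >= end) {
            return; // empty restriction: nothing to claim
        }
        if (lastAttempted == null) {
            throw new IllegalStateException(
                "Range [" + start + ", " + end + ") is non-empty and no offsets have been attempted");
        }
        long firstUnclaimed = lastAttempted + 1;
        if (firstUnclaimed < end) {
            // Informative message: say exactly which part of the restriction is outstanding.
            throw new IllegalStateException(
                "Last attempted offset was " + lastAttempted
                    + "; offsets [" + firstUnclaimed + ", " + end + ") are still unclaimed");
        }
    }

    public static void main(String[] args) {
        checkDone(100, 200, 199L); // passes: the last offset in the range was attempted
        try {
            checkDone(100, 200, 130L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A failed attempt at or beyond the end of the range (for example, offset 250 for [100, 200)) also satisfies the check, matching the rule that a failed claim still consumes the rest of the interval.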
Specified by: checkDone in class RestrictionTracker<ByteKeyRange,ByteKey>
Throws: java.lang.IllegalStateException
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public double getSize()
Description copied from interface: Sizes.HasSize
A representation for the amount of known work represented as a size. The double representation should preferably represent a linear space.
It is up to each restriction tracker to convert between their natural representation of outstanding work and this representation. For example, the number of message bytes that have not been processed.
Specified by: getSize in interface Sizes.HasSize