public abstract class RestrictionTracker<RestrictionT,PositionT>
extends java.lang.Object
Manages access to the restriction and keeps track of its claimed part for a splittable DoFn.
The restriction may be modified by different threads; however, the system ensures sufficient locking so that no methods on the restriction tracker are called concurrently.
RestrictionTrackers should implement RestrictionTracker.HasProgress; otherwise poor auto-scaling of workers and/or splitting may result if the progress is an inaccurate representation of the known amount of completed and remaining work.
Modifier and Type | Class and Description |
---|---|
static interface | RestrictionTracker.HasProgress: All RestrictionTrackers SHOULD implement this interface to improve auto-scaling and splitting performance. |
static class | RestrictionTracker.IsBounded |
static class | RestrictionTracker.Progress: A representation for the amount of known completed and remaining work. |
static class | RestrictionTracker.TruncateResult<RestrictionT>: A representation of the truncate result. |
Constructor and Description |
---|
RestrictionTracker() |
Modifier and Type | Method and Description |
---|---|
abstract void | checkDone(): Checks whether the restriction has been fully processed. |
abstract RestrictionT | currentRestriction(): Returns a restriction accurately describing the full range of work the current DoFn.ProcessElement call will do, including already completed work. |
abstract RestrictionTracker.IsBounded | isBounded(): Return the boundedness of the current restriction. |
abstract boolean | tryClaim(PositionT position): Attempts to claim the block of work in the current restriction identified by the given position. |
abstract @Nullable SplitResult<RestrictionT> | trySplit(double fractionOfRemainder): Splits current restriction based on fractionOfRemainder. |
public abstract boolean tryClaim(PositionT position)
Attempts to claim the block of work in the current restriction identified by the given position.
If this succeeds, the DoFn MUST execute the entire block of work. If this fails:
- DoFn.ProcessElement MUST return DoFn.ProcessContinuation#stop without performing any additional work or emitting output (note that emitting output or performing work from DoFn.ProcessElement is also not allowed before the first call to this method).
- checkDone() MUST succeed.
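The claim-then-work contract above can be illustrated with a minimal, self-contained sketch. It does not use the Beam classes: `ClaimSketch`, its nested `OffsetTracker`, and the `process` method are hypothetical stand-ins that model a restriction over the offset range [from, to) and a processing loop that claims each offset before producing output for it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tracker over the offset range [from, to); not the Beam class.
class ClaimSketch {
  static final class OffsetTracker {
    private final long to;     // exclusive end of the restriction
    private long lastClaimed;  // last successfully claimed offset

    OffsetTracker(long from, long to) {
      this.to = to;
      this.lastClaimed = from - 1; // nothing claimed yet
    }

    // Returns true if the position lies inside the restriction and may be processed.
    boolean tryClaim(long position) {
      if (position <= lastClaimed) {
        throw new IllegalArgumentException("Positions must be claimed in increasing order");
      }
      if (position >= to) {
        return false; // outside the restriction: the caller must stop
      }
      lastClaimed = position;
      return true;
    }
  }

  // Models the body of a process call: claim first, then do the work for that position.
  static List<Long> process(OffsetTracker tracker, long start) {
    List<Long> output = new ArrayList<>();
    for (long pos = start; ; pos++) {
      if (!tracker.tryClaim(pos)) {
        return output; // failed claim: stop without further work or output
      }
      output.add(pos); // work happens only after a successful claim
    }
  }

  public static void main(String[] args) {
    System.out.println(process(new OffsetTracker(100, 105), 100)); // [100, 101, 102, 103, 104]
  }
}
```

In this sketch no output is produced before the first claim, and the loop returns as soon as a claim fails, which mirrors the two obligations listed above.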
public abstract RestrictionT currentRestriction()
Returns a restriction accurately describing the full range of work the current DoFn.ProcessElement call will do, including already completed work.
The current restriction returned by this method may be updated dynamically due to concurrent invocation of other methods of the RestrictionTracker, for example, trySplit(double).
This method is required to be implemented.
public abstract @Nullable SplitResult<RestrictionT> trySplit(double fractionOfRemainder)
Splits current restriction based on fractionOfRemainder.
If splitting the current restriction is possible, the current restriction is split into a primary and residual restriction pair. This invocation updates the currentRestriction() to be the primary restriction, effectively making the current DoFn.ProcessElement execution responsible for performing the work that the primary restriction represents. The residual restriction will be executed in a separate DoFn.ProcessElement invocation (likely in a different process). The work performed by executing the primary and residual restrictions as separate DoFn.ProcessElement invocations MUST be equivalent to the work performed as if this split never occurred.
The fractionOfRemainder should be used in a best-effort manner to choose a primary and residual restriction based upon the fraction of the remaining work that the current DoFn.ProcessElement invocation is responsible for. For example, if a DoFn.ProcessElement was reading a file with a restriction representing the offset range [100, 200) and has processed up to offset 130 with a fractionOfRemainder of 0.7, the primary and residual restrictions returned would be [100, 179), [179, 200) (note: currentOffset + fractionOfRemainder * remainingWork = 130 + 0.7 * 70 = 179).
fractionOfRemainder = 0 means a checkpoint is required.
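The split arithmetic from the example can be checked with a short sketch. `SplitPointSketch`, `splitOffset`, and `split` are hypothetical helpers, not part of the Beam API; they apply only the formula quoted in the text (current offset plus fraction times remaining work) and do not handle edge cases such as an empty residual, where a real tracker would presumably decline the split.

```java
// Hypothetical helper, not part of the Beam API: applies the split formula from the text.
class SplitPointSketch {
  // Split offset = currentOffset + fractionOfRemainder * remainingWork.
  static long splitOffset(long currentOffset, long end, double fractionOfRemainder) {
    long remaining = end - currentOffset;
    // Math.round guards against floating-point results that land just below a whole number.
    return currentOffset + Math.round(fractionOfRemainder * remaining);
  }

  // Formats the primary and residual ranges obtained by cutting [start, end) at the split offset.
  static String split(long start, long end, long currentOffset, double fractionOfRemainder) {
    long s = splitOffset(currentOffset, end, fractionOfRemainder);
    return "[" + start + ", " + s + "), [" + s + ", " + end + ")";
  }

  public static void main(String[] args) {
    System.out.println(split(100, 200, 130, 0.7)); // [100, 179), [179, 200)
    System.out.println(split(100, 200, 130, 0.0)); // [100, 130), [130, 200)
  }
}
```

With a fraction of 0, the primary restriction in this sketch ends at the current offset, so all remaining work moves to the residual, which is what a checkpoint amounts to.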
Implementing this API is recommended for a batch pipeline: it improves parallel processing performance and is very important for pipeline scaling and end-to-end pipeline execution.
The API is required to be implemented for a streaming pipeline.
Parameters:
fractionOfRemainder - A hint as to the fraction of work the primary restriction should represent based upon the current known remaining amount of work.
Returns:
a SplitResult if a split was possible, otherwise returns null. If fractionOfRemainder == 0, a null result MUST imply that the restriction tracker is done and there is no more work left to do.
public abstract void checkDone() throws java.lang.IllegalStateException
Checks whether the restriction has been fully processed.
Called by the SDK harness after DoFn.ProcessElement returns.
Must throw an exception with an informative error message if there is still any unclaimed work remaining in the restriction.
This method is required to be implemented in order to prevent data loss during SDK processing.
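A minimal sketch of such a check for an offset-range restriction follows. `CheckDoneSketch` and its nested `OffsetTracker` are hypothetical stand-ins, not the Beam classes; the sketch assumes that work is complete once a claim has been attempted at or beyond the last offset of the range.

```java
// Hypothetical stand-in for a tracker over [from, to); not the Beam class.
class CheckDoneSketch {
  static final class OffsetTracker {
    private final long from;
    private final long to;       // exclusive end of the restriction
    private long lastAttempted;  // last offset passed to tryClaim, successful or not

    OffsetTracker(long from, long to) {
      this.from = from;
      this.to = to;
      this.lastAttempted = from - 1; // nothing attempted yet
    }

    boolean tryClaim(long position) {
      lastAttempted = position;
      return position < to;
    }

    // Throws if any offset in the restriction was never attempted.
    void checkDone() {
      if (from >= to) {
        return; // empty restriction: nothing to claim
      }
      if (lastAttempted < to - 1) {
        throw new IllegalStateException(
            "Last attempted offset was " + lastAttempted + " in range [" + from + ", " + to
                + "); claiming work in [" + (lastAttempted + 1) + ", " + to
                + ") was not attempted");
      }
    }
  }

  public static void main(String[] args) {
    OffsetTracker tracker = new OffsetTracker(0, 3);
    tracker.tryClaim(0);
    try {
      tracker.checkDone(); // offsets 1 and 2 were never attempted
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The error message names the unattempted range, which is the kind of informative detail the contract above asks for.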
Throws:
java.lang.IllegalStateException
public abstract RestrictionTracker.IsBounded isBounded()
Return the boundedness of the current restriction. If the current restriction represents a finite amount of work, it should return RestrictionTracker.IsBounded.BOUNDED. Otherwise, it should return RestrictionTracker.IsBounded.UNBOUNDED.
It is valid to return RestrictionTracker.IsBounded.BOUNDED after returning RestrictionTracker.IsBounded.UNBOUNDED once the end of a restriction is discovered. It is not valid to return RestrictionTracker.IsBounded.UNBOUNDED after returning RestrictionTracker.IsBounded.BOUNDED.
This method is required to be implemented.
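The one-way transition described above can be sketched as follows. `BoundednessSketch`, its local `IsBounded` enum, `GrowingTracker`, and the `markEndDiscovered` hook are all hypothetical names introduced for illustration; the sketch assumes a source whose end is unknown at first and is discovered later.

```java
// Hypothetical model of the boundedness transition; not the Beam classes.
class BoundednessSketch {
  enum IsBounded { BOUNDED, UNBOUNDED }

  static final class GrowingTracker {
    private boolean endKnown = false;

    // Hypothetical hook: called once the source learns where the restriction ends.
    void markEndDiscovered() {
      endKnown = true; // never reset, so BOUNDED can never revert to UNBOUNDED
    }

    IsBounded isBounded() {
      return endKnown ? IsBounded.BOUNDED : IsBounded.UNBOUNDED;
    }
  }

  public static void main(String[] args) {
    GrowingTracker tracker = new GrowingTracker();
    System.out.println(tracker.isBounded()); // UNBOUNDED
    tracker.markEndDiscovered();
    System.out.println(tracker.isBounded()); // BOUNDED
  }
}
```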