@Documented
@Retention(value=RUNTIME)
@Target(value=METHOD)
public static @interface DoFn.ProcessElement
Annotation for the method to use for processing elements. A subclass of DoFn must have a method with this annotation.
If any of the arguments is a RestrictionTracker, see the specifications below about splittable DoFn; otherwise this method must satisfy the following constraints:
- If one of its arguments is tagged with the DoFn.Element annotation, then it will be passed the current element being processed. The argument type must match the input type of this DoFn exactly, or both types must have equivalent schemas registered.
- If one of its arguments is tagged with the DoFn.Timestamp annotation, then it will be passed the timestamp of the current element being processed; the argument must be of type Instant.
- If one of its arguments is a subtype of BoundedWindow, then it will be passed the window of the current element. When applied by ParDo, the subtype of BoundedWindow must match the type of windows on the input PCollection. If the window is not accessed, a runner may perform additional optimizations.
- If one of its arguments is of type PaneInfo, then it will be passed information about the current triggering pane.
- If one of its arguments is of type PipelineOptions, then it will be passed the options for the current pipeline.
- If one of its arguments is of type DoFn.OutputReceiver, then it will be passed an output receiver for outputting elements to the default output.
- If one of its arguments is of type DoFn.MultiOutputReceiver, then it will be passed an output receiver for outputting to multiple tagged outputs.
- If one of its arguments is of type DoFn.BundleFinalizer, then it will be passed a mechanism to register a callback that will be invoked after the runner successfully commits the output of this bundle. See "Apache Beam Portability API: How to Finalize Bundles" for further details.
- It must return void.
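The constraints above say that each parameter is selected either by an annotation or by its type. The following is a minimal, self-contained sketch of that idea using plain Java reflection. Every type in it (ProcessElement, Element, Timestamp, OutputReceiver, UpperCaseFn, invoke) is a hypothetical stand-in declared locally for illustration and is not the Beam type of the same name; it also uses java.time.Instant, whereas Beam uses the Joda-Time Instant. It is not a description of how any runner actually invokes a DoFn.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for illustration; none of these are the Beam types they are named after. */
final class ParameterBindingSketch {
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
  @interface ProcessElement {}

  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
  @interface Element {}

  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
  @interface Timestamp {}

  /** Receives values for the default output. */
  interface OutputReceiver<T> {
    void output(T value);
  }

  /** A user function whose parameters are chosen by annotation (element, timestamp) or by type (receiver). */
  static final class UpperCaseFn {
    @ProcessElement
    public void process(@Element String word, @Timestamp Instant ts, OutputReceiver<String> out) {
      out.output(word.toUpperCase() + "@" + ts.toEpochMilli());
    }
  }

  /** Finds the annotated method, binds each parameter, invokes it, and returns what it output. */
  static List<String> invoke(Object fn, Object element, Instant timestamp) {
    List<String> outputs = new ArrayList<>();
    OutputReceiver<String> receiver = outputs::add;
    for (Method method : fn.getClass().getDeclaredMethods()) {
      if (!method.isAnnotationPresent(ProcessElement.class)) {
        continue;
      }
      if (method.getReturnType() != void.class) {
        throw new IllegalStateException("a non-splittable process method must return void");
      }
      Parameter[] params = method.getParameters();
      Object[] args = new Object[params.length];
      for (int i = 0; i < params.length; i++) {
        Parameter p = params[i];
        if (p.isAnnotationPresent(Element.class)) {
          args[i] = element;            // selected by annotation
        } else if (p.isAnnotationPresent(Timestamp.class)) {
          args[i] = timestamp;          // selected by annotation
        } else if (p.getType() == OutputReceiver.class) {
          args[i] = receiver;           // selected by type
        } else {
          throw new IllegalStateException("unsupported parameter: " + p);
        }
      }
      try {
        method.setAccessible(true);
        method.invoke(fn, args);
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException(e);
      }
      return outputs;
    }
    throw new IllegalStateException("no @ProcessElement method on " + fn.getClass());
  }

  public static void main(String[] args) {
    System.out.println(invoke(new UpperCaseFn(), "beam", Instant.ofEpochMilli(5)));
  }
}
```

Because binding depends on annotations and types rather than on position, the parameters of the process method in this sketch could be declared in any order.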
A DoFn is splittable if its DoFn.ProcessElement method has a parameter whose type is RestrictionTracker. This is an advanced feature, and the overwhelming majority of users will never need to write a splittable DoFn.
Not all runners support Splittable DoFn. See the capability matrix.
See the proposal for an overview of the involved concepts (splittable DoFn, restriction, restriction tracker).
A splittable DoFn must obey the following constraints:
- It must define a DoFn.GetInitialRestriction method.
- It may define a DoFn.GetSize method or ensure that the RestrictionTracker implements RestrictionTracker.HasProgress. Poor auto-scaling of workers and/or splitting may result if size or progress is an inaccurate representation of work. See DoFn.GetSize and RestrictionTracker.HasProgress for further details.
- It should define a DoFn.SplitRestriction method. This method enables runners to perform bulk splitting up front, allowing for a rapid increase in parallelism. If it is not defined, no initial split happens by default. Note that the initial split is a different concept from a split during element processing. See RestrictionTracker.trySplit(double) for details about splitting while the current element and restriction are actively being processed.
- It may define a DoFn.TruncateRestriction method to choose how to truncate a restriction so that it represents a finite amount of work when the pipeline is draining. See DoFn.TruncateRestriction and RestrictionTracker.isBounded() for additional details.
- It may define a DoFn.NewTracker method returning a subtype of RestrictionTracker<R>, where R is the restriction type returned by DoFn.GetInitialRestriction. This method is optional only if the restriction type returned by DoFn.GetInitialRestriction implements HasDefaultTracker.
- It may define a DoFn.GetRestrictionCoder method.
- It may define a DoFn.GetInitialWatermarkEstimatorState method. If none is defined, then the watermark estimator state is of type Void.
- It may define a DoFn.GetWatermarkEstimatorStateCoder method.
- It may define a DoFn.NewWatermarkEstimator method returning a subtype of WatermarkEstimator<W>, where W is the watermark estimator state type returned by DoFn.GetInitialWatermarkEstimatorState. This method is optional only if DoFn.GetInitialWatermarkEstimatorState has not been defined or W implements HasDefaultWatermarkEstimator.
- The DoFn itself may be annotated with DoFn.BoundedPerElement or DoFn.UnboundedPerElement, but not both at the same time. If it is annotated with neither, it is assumed to be DoFn.BoundedPerElement if its DoFn.ProcessElement method returns void and DoFn.UnboundedPerElement if it returns a DoFn.ProcessContinuation.
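The difference between an initial split and a split during processing can be illustrated with a small sketch over offset ranges. The classes below (Range, Tracker, splitInitial) are hypothetical stand-ins written for this example, not Beam classes, and the rounding rule in trySplit is an assumption chosen to keep the arithmetic simple. The sketch assumes a restriction is a half-open range of offsets and that work is claimed one offset at a time in increasing order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for an offset-range restriction and its tracker; not Beam classes. */
final class OffsetRangeSketch {
  /** Half-open range [from, to) of offsets. */
  static final class Range {
    final long from;
    final long to;

    Range(long from, long to) {
      this.from = from;
      this.to = to;
    }

    @Override
    public String toString() {
      return "[" + from + ", " + to + ")";
    }
  }

  /** Initial split: before any processing, cut the range into chunks of at most chunkSize offsets. */
  static List<Range> splitInitial(Range range, long chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    List<Range> parts = new ArrayList<>();
    for (long start = range.from; start < range.to; start += chunkSize) {
      parts.add(new Range(start, Math.min(start + chunkSize, range.to)));
    }
    return parts;
  }

  /** Tracks which offsets of one range have been claimed while it is being processed. */
  static final class Tracker {
    private Range range;
    private long lastClaimed;

    Tracker(Range range) {
      this.range = range;
      this.lastClaimed = range.from - 1; // nothing claimed yet
    }

    /** Returns true if the caller may process this offset, false if it lies outside the range. */
    boolean tryClaim(long offset) {
      if (offset <= lastClaimed) {
        throw new IllegalArgumentException("offsets must be claimed in increasing order");
      }
      if (offset >= range.to) {
        return false;
      }
      lastClaimed = offset;
      return true;
    }

    /**
     * Split during processing: keeps roughly (1 - fraction) of the unclaimed work and
     * returns the rest as a residual range, or null if nothing can be split off.
     */
    Range trySplit(double fractionOfRemainder) {
      long remaining = range.to - lastClaimed;
      long splitPos = lastClaimed + Math.max(1L, (long) (remaining * fractionOfRemainder));
      if (splitPos >= range.to) {
        return null;
      }
      Range residual = new Range(splitPos, range.to);
      range = new Range(range.from, splitPos); // the primary shrinks to [from, splitPos)
      return residual;
    }

    Range currentRestriction() {
      return range;
    }
  }

  public static void main(String[] args) {
    System.out.println(splitInitial(new Range(0, 10), 4));
    Tracker tracker = new Tracker(new Range(0, 100));
    for (long i = 0; i < 10; i++) {
      tracker.tryClaim(i);
    }
    System.out.println("residual " + tracker.trySplit(0.5) + ", primary " + tracker.currentRestriction());
  }
}
```

In this sketch the initial split never looks at processing state, while trySplit only hands off offsets that have not yet been claimed, so work already done is never duplicated.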
If this DoFn is splittable, this method must satisfy the following constraints:
- One of its arguments must be a RestrictionTracker. The argument must be of the exact type RestrictionTracker<RestrictionT, PositionT>.
- If one of its arguments is tagged with the DoFn.Element annotation, then it will be passed the current element being processed. The argument type must match the input type of this DoFn exactly, or both types must have equivalent schemas registered.
- If one of its arguments is tagged with the DoFn.Restriction annotation, then it will be passed the current restriction being processed; the argument must be of type RestrictionT.
- If one of its arguments is tagged with the DoFn.Timestamp annotation, then it will be passed the timestamp of the current element being processed; the argument must be of type Instant.
- If one of its arguments is of type WatermarkEstimator, then it will be passed the watermark estimator.
- If one of its arguments is of type ManualWatermarkEstimator, then it will be passed a watermark estimator that can be updated manually. This parameter can only be supplied if the method annotated with DoFn.NewWatermarkEstimator returns a subtype of ManualWatermarkEstimator.
- If one of its arguments is a subtype of BoundedWindow, then it will be passed the window of the current element. When applied by ParDo, the subtype of BoundedWindow must match the type of windows on the input PCollection. If the window is not accessed, a runner may perform additional optimizations.
- If one of its arguments is of type PaneInfo, then it will be passed information about the current triggering pane.
- If one of its arguments is of type PipelineOptions, then it will be passed the options for the current pipeline.
- If one of its arguments is of type DoFn.OutputReceiver, then it will be passed an output receiver for outputting elements to the default output.
- If one of its arguments is of type DoFn.MultiOutputReceiver, then it will be passed an output receiver for outputting to multiple tagged outputs.
- If one of its arguments is of type DoFn.BundleFinalizer, then it will be passed a mechanism to register a callback that will be invoked after the runner successfully commits the output of this bundle. See "Apache Beam Portability API: How to Finalize Bundles" for further details.
- It may return a DoFn.ProcessContinuation to indicate whether there is more work to be done for the current element; otherwise it must return void.
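The last constraint, returning a continuation to signal whether more work remains for the current element, can be sketched as follows. Continuation, Tracker, and process are hypothetical stand-ins defined only for this example and are not Beam classes. The sketch assumes an append-only log read by offset and, as a simplification, keeps one tracker alive across calls instead of modeling how a runner would hand the unclaimed remainder back as a residual restriction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-ins for illustration; not the Beam classes with similar names. */
final class ContinuationSketch {
  /** STOP: nothing more to do for this element. RESUME: call again later for the same element. */
  enum Continuation { STOP, RESUME }

  /** Minimal tracker over the half-open offset range [from, to). */
  static final class Tracker {
    private final long to;
    private long next;

    Tracker(long from, long to) {
      this.next = from;
      this.to = to;
    }

    /** Returns true if the offset is inside the range and may be processed. */
    boolean tryClaim(long offset) {
      if (offset < next) {
        throw new IllegalArgumentException("offset already claimed: " + offset);
      }
      if (offset >= to) {
        return false;
      }
      next = offset + 1;
      return true;
    }

    long nextOffset() {
      return next;
    }
  }

  /**
   * Emits the records currently available in the log. Returns STOP once a claim fails
   * (the restriction is exhausted) and RESUME if it has caught up with the log while
   * offsets in the restriction are still unclaimed.
   */
  static Continuation process(List<String> log, Tracker tracker, List<String> out) {
    long offset = tracker.nextOffset();
    while (offset < log.size()) {
      if (!tracker.tryClaim(offset)) {
        return Continuation.STOP;
      }
      out.add(log.get((int) offset));
      offset++;
    }
    return Continuation.RESUME;
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>(Arrays.asList("a", "b"));
    List<String> out = new ArrayList<>();
    Tracker tracker = new Tracker(0, 3);
    System.out.println(process(log, tracker, out)); // the log has fewer records than the range covers
    log.addAll(Arrays.asList("c", "d"));
    System.out.println(process(log, tracker, out)); // the claim for offset 3 fails
    System.out.println(out);
  }
}
```

A method written this way never produces output for an offset it failed to claim, which is the property the tracker exists to protect.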