Class DoFn<InputT extends @Nullable Object,OutputT extends @Nullable Object>
- Type Parameters:
InputT
- the type of the (main) input elementsOutputT
- the type of the (main) output elements
- All Implemented Interfaces:
Serializable
,HasDisplayData
- Direct Known Subclasses:
AddShardKeyDoFn
,BatchStatefulParDoOverrides.BatchStatefulDoFn
,BeamSetOperatorsTransforms.SetOperatorFilteringDoFn
,BeamSqlOutputToConsoleFn
,CleanUpReadChangeStreamDoFn
,DataflowRunner.StreamingPCollectionViewWriterFn
,DataGeneratorRowFn
,DetectNewPartitionsDoFn
,DetectNewPartitionsDoFn
,ErrorHandler.PTransformErrorHandler.WriteErrorMetrics.CountErrors
,FhirIO.Deidentify.DeidentifyFn
,FhirIO.Export.ExportResourcesFn
,FillGaps.FillGapsDoFn
,FilterForMutationDoFn
,HL7v2IO.HL7v2Read.FetchHL7v2Message.HL7v2MessageGetFn
,HL7v2IO.Read.FetchHL7v2Message.HL7v2MessageGetFn
,InitializeDoFn
,InitializeDoFn
,JsonToRow.JsonToRowWithErrFn.ParseWithError
,KafkaReadSchemaTransformProvider.ErrorFn
,KafkaSourceConsumerFn
,PostProcessingMetricsDoFn
,PreparePubsubWriteDoFn
,PubsubIO.Write.PubsubBoundedWriter
,PubsubLiteReadSchemaTransformProvider.ErrorFn
,PubsubLiteSink
,PubsubLiteWriteSchemaTransformProvider.ErrorCounterFn
,PubsubLiteWriteSchemaTransformProvider.SetUuidFromPubSubMessage.SetUuidFn
,PubsubWriteSchemaTransformProvider.ErrorFn
,RampupThrottlingFn
,ReadAllViaFileBasedSourceTransform.AbstractReadFileRangesFn
,ReadAllViaFileBasedSourceTransform.SplitIntoRangesFn
,ReadChangeStreamPartitionDoFn
,ReadChangeStreamPartitionDoFn
,ReadFromPulsarDoFn
,ReadSpannerSchema
,RecordToPublishResultDoFn
,Reshuffle.AssignShardFn
,SnowflakeIO.Read.CleanTmpFilesFromGcsFn
,SnowflakeIO.Read.MapCsvToStringArrayFn
,SpannerChangestreamsReadSchemaTransformProvider.DataChangeRecordToRow
,SpannerReadSchemaTransformProvider.ErrorFn
,StorageApiConvertMessages.ConvertMessagesDoFn
,StorageApiFlushAndFinalizeDoFn
,TFRecordReadSchemaTransformProvider.ErrorFn
,TFRecordWriteSchemaTransformProvider.ErrorFn
,UnboundedSolaceWriter
,UpdateSchemaDestination
,ValueWithRecordId.StripIdsDoFn
,View.ToListViewDoFn
,Watch.WatchGrowthFn
,WriteToPulsarDoFn
ParDo
providing the code to use to process elements of the input PCollection
.
See ParDo
for more explanation, examples of use, and discussion of constraints on
DoFn
s, including their serializability, lack of access to global shared mutable state,
requirements for failure tolerance, and benefits of optimization.
DoFns
can be tested by using TestPipeline
. You can verify their
functional correctness in a local test using the DirectRunner
as well as running
integration tests with your production runner of choice. Typically, you can generate the input
data using Create.of(java.lang.Iterable<T>)
or other transforms. However, if you need to test the behavior of
DoFn.StartBundle
and DoFn.FinishBundle
with particular bundle boundaries, you can use
TestStream
.
Implementations must define a method annotated with DoFn.ProcessElement
that satisfies the
requirements described there. See the DoFn.ProcessElement
for details.
Example usage:
PCollection<String> lines = ... ;
PCollection<String> words =
lines.apply(ParDo.of(new DoFn<String, String>()) {
@ProcessElement
public void processElement(@Element String element, BoundedWindow window) {
...
}}));
- See Also:
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic @interface
Annotation for declaring that a state parameter is always fetched.static @interface
Annotation on a splittableDoFn
specifying that theDoFn
performs a bounded amount of work per input element, so applying it to a boundedPCollection
will produce also a boundedPCollection
.static interface
A parameter that is accessible during@StartBundle
,@ProcessElement
and@FinishBundle
that allows the caller to register a callback that will be invoked after the bundle has been successfully completed and the runner has commit the output.static @interface
Parameter annotation for the input element forDoFn.ProcessElement
,DoFn.GetInitialRestriction
,DoFn.GetSize
,DoFn.SplitRestriction
,DoFn.GetInitialWatermarkEstimatorState
,DoFn.NewWatermarkEstimator
, andDoFn.NewTracker
methods.static @interface
Annotation for specifying specific fields that are accessed in a Schema PCollection.static @interface
Annotation for the method to use to finish processing a batch of elements.class
Information accessible while within theDoFn.FinishBundle
method.static @interface
Annotation for the method that maps an element to an initial restriction for a splittableDoFn
.static @interface
Annotation for the method that maps an element and restriction to initial watermark estimator state for a splittableDoFn
.static @interface
Annotation for the method that returns the coder to use for the restriction of a splittableDoFn
.static @interface
Annotation for the method that returns the corresponding size for an element and restriction pair.static @interface
Annotation for the method that returns the coder to use for the watermark estimator state of a splittableDoFn
.static @interface
Parameter annotation for dereferencing input element key inKV
pair.static interface
Receives tagged output for a multi-output function.static @interface
Annotation for the method that creates a newRestrictionTracker
for the restriction of a splittableDoFn
.static @interface
Annotation for the method that creates a newWatermarkEstimator
for the watermark state of a splittableDoFn
.static @interface
Annotation for registering a callback for a timer.class
Information accessible when running aDoFn.OnTimer
method.static @interface
Annotation for registering a callback for a timerFamily.static @interface
Annotation for the method to use for performing actions on window expiration.class
static interface
Receives values of the given type.class
Information accessible when running aDoFn.ProcessElement
method.static class
When used as a return value ofDoFn.ProcessElement
, indicates whether there is more work to be done for the current element.static @interface
Annotation for the method to use for processing elements.static @interface
Annotation that may be added to aDoFn.ProcessElement
,DoFn.OnTimer
, orDoFn.OnWindowExpiration
method to indicate that the runner must ensure that the observable contents of the inputPCollection
or mutable state must be stable upon retries.static @interface
Annotation that may be added to aDoFn.ProcessElement
method to indicate that the runner must ensure that the observable contents of the inputPCollection
is sorted by time, in ascending order.static @interface
Parameter annotation for the restriction forDoFn.GetSize
,DoFn.SplitRestriction
,DoFn.GetInitialWatermarkEstimatorState
,DoFn.NewWatermarkEstimator
, andDoFn.NewTracker
methods.static @interface
Annotation for the method to use to prepare an instance for processing bundles of elements.static @interface
Parameter annotation for the SideInput for aDoFn.ProcessElement
method.static @interface
Annotation for the method that splits restriction of a splittableDoFn
into multiple parts to be processed in parallel.static @interface
Annotation for the method to use to prepare an instance for processing a batch of elements.class
Information accessible while within theDoFn.StartBundle
method.static @interface
Annotation for declaring and dereferencing state cells.static @interface
Annotation for the method to use to clean up this instance before it is discarded.static @interface
Parameter annotation for the TimerMap for aDoFn.ProcessElement
method.static @interface
Annotation for declaring and dereferencing timers.static @interface
Parameter annotation for the input element timestamp forDoFn.ProcessElement
,DoFn.GetInitialRestriction
,DoFn.GetSize
,DoFn.SplitRestriction
,DoFn.GetInitialWatermarkEstimatorState
,DoFn.NewWatermarkEstimator
, andDoFn.NewTracker
methods.static @interface
Annotation for the method that truncates the restriction of a splittableDoFn
into a bounded one.static @interface
Annotation on a splittableDoFn
specifying that theDoFn
performs an unbounded amount of work per input element, so applying it to a boundedPCollection
will produce an unboundedPCollection
.static @interface
Parameter annotation for the watermark estimator state for theDoFn.NewWatermarkEstimator
method.class
Information accessible to all methods in thisDoFn
where the context is in some window. -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionDeprecated.Returns aTypeDescriptor
capturing what is known statically about the input type of thisDoFn
instance's most-derived class.Returns aTypeDescriptor
capturing what is known statically about the output type of thisDoFn
instance's most-derived class.void
populateDisplayData
(DisplayData.Builder builder) Register display data for the given transform or component.final void
Deprecated.useDoFn.Setup
orDoFn.StartBundle
instead.
-
Constructor Details
-
DoFn
public DoFn()
-
-
Method Details
-
getAllowedTimestampSkew
Deprecated.This method permits aDoFn
to emit elements behind the watermark. These elements are considered late, and if behind theallowed lateness
of a downstreamPCollection
may be silently dropped. See https://github.com/apache/beam/issues/18065 for details on a replacement.Returns the allowed timestamp skew duration, which is the maximum duration that timestamps can be shifted backward inDoFn.WindowedContext.outputWithTimestamp(OutputT, org.joda.time.Instant)
.The default value is
Duration.ZERO
, in which case timestamps can only be shifted forward to future. For infinite skew, returnDuration.millis(Long.MAX_VALUE)
. -
getInputTypeDescriptor
Returns aTypeDescriptor
capturing what is known statically about the input type of thisDoFn
instance's most-derived class.See
getOutputTypeDescriptor()
for more discussion. -
getOutputTypeDescriptor
Returns aTypeDescriptor
capturing what is known statically about the output type of thisDoFn
instance's most-derived class.In the normal case of a concrete
DoFn
subclass with no generic type parameters of its own (including anonymous inner classes), this will be a complete non-generic type, which is good for choosing a default outputCoder<O>
for the outputPCollection<O>
. -
prepareForProcessing
Deprecated.useDoFn.Setup
orDoFn.StartBundle
instead. This method will be removed in a future release.Finalize theDoFn
construction to prepare for processing. This method should be called by runners before any processing methods. -
populateDisplayData
Register display data for the given transform or component.populateDisplayData(DisplayData.Builder)
is invoked by Pipeline runners to collect display data viaDisplayData.from(HasDisplayData)
. Implementations may callsuper.populateDisplayData(builder)
in order to register display data in the current namespace, but should otherwise usesubcomponent.populateDisplayData(builder)
to use the namespace of the subcomponent.By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayData
in interfaceHasDisplayData
- Parameters:
builder
- The builder to populate with display data.- See Also:
-
DoFn
to emit elements behind the watermark.