Class DoFn<InputT extends @Nullable Object,OutputT extends @Nullable Object>
- Type Parameters:
InputT- the type of the (main) input elementsOutputT- the type of the (main) output elements
- All Implemented Interfaces:
Serializable,HasDisplayData
- Direct Known Subclasses:
AddShardKeyDoFn,BatchStatefulParDoOverrides.BatchStatefulDoFn,BeamSetOperatorsTransforms.SetOperatorFilteringDoFn,BeamSqlOutputToConsoleFn,CleanUpReadChangeStreamDoFn,DataflowRunner.StreamingPCollectionViewWriterFn,DataGeneratorRowFn,DetectNewPartitionsDoFn,DetectNewPartitionsDoFn,ErrorHandler.PTransformErrorHandler.WriteErrorMetrics.CountErrors,FhirIO.Deidentify.DeidentifyFn,FhirIO.Export.ExportResourcesFn,FillGaps.FillGapsDoFn,FilterForMutationDoFn,HL7v2IO.HL7v2Read.FetchHL7v2Message.HL7v2MessageGetFn,HL7v2IO.Read.FetchHL7v2Message.HL7v2MessageGetFn,InitializeDoFn,InitializeDoFn,JsonToRow.JsonToRowWithErrFn.ParseWithError,KafkaReadSchemaTransformProvider.ErrorFn,KafkaSourceConsumerFn,NaiveReadFromPulsarDoFn,PostProcessingMetricsDoFn,PreparePubsubWriteDoFn,PubsubIO.Write.PubsubBoundedWriter,PubsubLiteReadSchemaTransformProvider.ErrorFn,PubsubLiteSink,PubsubLiteWriteSchemaTransformProvider.ErrorCounterFn,PubsubLiteWriteSchemaTransformProvider.SetUuidFromPubSubMessage.SetUuidFn,PubsubWriteSchemaTransformProvider.ErrorFn,RampupThrottlingFn,ReadAllViaFileBasedSourceTransform.AbstractReadFileRangesFn,ReadAllViaFileBasedSourceTransform.SplitIntoRangesFn,ReadChangeStreamPartitionDoFn,ReadChangeStreamPartitionDoFn,ReadSpannerSchema,RecordToPublishResultDoFn,Reshuffle.AssignShardFn,SnowflakeIO.Read.CleanTmpFilesFromGcsFn,SnowflakeIO.Read.MapCsvToStringArrayFn,SpannerChangestreamsReadSchemaTransformProvider.DataChangeRecordToRow,SpannerReadSchemaTransformProvider.ErrorFn,StorageApiConvertMessages.ConvertMessagesDoFn,StorageApiFlushAndFinalizeDoFn,TFRecordReadSchemaTransformProvider.ErrorFn,TFRecordWriteSchemaTransformProvider.ErrorFn,UnboundedSolaceWriter,UpdateSchemaDestination,ValueWithRecordId.StripIdsDoFn,View.ToListViewDoFn,Watch.WatchGrowthFn,WriteToPulsarDoFn
ParDo providing the code to use to process elements of the input PCollection.
See ParDo for more explanation, examples of use, and discussion of constraints on
DoFns, including their serializability, lack of access to global shared mutable state,
requirements for failure tolerance, and benefits of optimization.
DoFns can be tested by using TestPipeline. You can verify their
functional correctness in a local test using the DirectRunner as well as running
integration tests with your production runner of choice. Typically, you can generate the input
data using Create.of(java.lang.Iterable<T>) or other transforms. However, if you need to test the behavior of
DoFn.StartBundle and DoFn.FinishBundle with particular bundle boundaries, you can use
TestStream.
Implementations must define a method annotated with DoFn.ProcessElement that satisfies the
requirements described there. See the DoFn.ProcessElement for details.
Example usage:
PCollection<String> lines = ... ;
PCollection<String> words =
lines.apply(ParDo.of(new DoFn<String, String>()) {
@ProcessElement
public void processElement(@Element String element, BoundedWindow window) {
...
}}));
- See Also:
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic @interfaceAnnotation for declaring that a state parameter is always fetched.static @interfaceAnnotation on a splittableDoFnspecifying that theDoFnperforms a bounded amount of work per input element, so applying it to a boundedPCollectionwill produce also a boundedPCollection.static interfaceA parameter that is accessible during@StartBundle,@ProcessElementand@FinishBundlethat allows the caller to register a callback that will be invoked after the bundle has been successfully completed and the runner has commit the output.static @interfaceParameter annotation for the input element forDoFn.ProcessElement,DoFn.GetInitialRestriction,DoFn.GetSize,DoFn.SplitRestriction,DoFn.GetInitialWatermarkEstimatorState,DoFn.NewWatermarkEstimator, andDoFn.NewTrackermethods.static @interfaceAnnotation for specifying specific fields that are accessed in a Schema PCollection.static @interfaceAnnotation for the method to use to finish processing a batch of elements.classInformation accessible while within theDoFn.FinishBundlemethod.static @interfaceAnnotation for the method that maps an element to an initial restriction for a splittableDoFn.static @interfaceAnnotation for the method that maps an element and restriction to initial watermark estimator state for a splittableDoFn.static @interfaceAnnotation for the method that returns the coder to use for the restriction of a splittableDoFn.static @interfaceAnnotation for the method that returns the corresponding size for an element and restriction pair.static @interfaceAnnotation for the method that returns the coder to use for the watermark estimator state of a splittableDoFn.static @interfaceParameter annotation for dereferencing input element key inKVpair.static interfaceReceives tagged output for a multi-output function.static @interfaceAnnotation for the method that creates a newRestrictionTrackerfor the restriction of a splittableDoFn.static @interfaceAnnotation for the method that creates a newWatermarkEstimatorfor the watermark state of a splittableDoFn.static @interfaceAnnotation for registering a callback for a timer.classInformation accessible when running aDoFn.OnTimermethod.static @interfaceAnnotation for registering a callback for a timerFamily.static @interfaceAnnotation for the method to use for performing actions on window expiration.classstatic interfaceReceives values of the given type.classInformation accessible when running aDoFn.ProcessElementmethod.static classWhen used as a return value ofDoFn.ProcessElement, indicates whether there is more work to be done for the current element.static @interfaceAnnotation for the method to use for processing elements.static @interfaceAnnotation that may be added to aDoFn.ProcessElement,DoFn.OnTimer, orDoFn.OnWindowExpirationmethod to indicate that the runner must ensure that the observable contents of the inputPCollectionor mutable state must be stable upon retries.static @interfaceAnnotation that may be added to aDoFn.ProcessElementmethod to indicate that the runner must ensure that the observable contents of the inputPCollectionis sorted by time, in ascending order.static @interfaceParameter annotation for the restriction forDoFn.GetSize,DoFn.SplitRestriction,DoFn.GetInitialWatermarkEstimatorState,DoFn.NewWatermarkEstimator, andDoFn.NewTrackermethods.static @interfaceAnnotation for the method to use to prepare an instance for processing bundles of elements.static @interfaceParameter annotation for the SideInput for aDoFn.ProcessElementmethod.static @interfaceAnnotation for the method that splits restriction of a splittableDoFninto multiple parts to be processed in parallel.static @interfaceAnnotation for the method to use to prepare an instance for processing a batch of elements.classInformation accessible while within theDoFn.StartBundlemethod.static @interfaceAnnotation for declaring and dereferencing state cells.static @interfaceAnnotation for the method to use to clean up this instance before it is discarded.static @interfaceParameter annotation for the TimerMap for aDoFn.ProcessElementmethod.static @interfaceAnnotation for declaring and dereferencing timers.static @interfaceParameter annotation for the input element timestamp forDoFn.ProcessElement,DoFn.GetInitialRestriction,DoFn.GetSize,DoFn.SplitRestriction,DoFn.GetInitialWatermarkEstimatorState,DoFn.NewWatermarkEstimator, andDoFn.NewTrackermethods.static @interfaceAnnotation for the method that truncates the restriction of a splittableDoFninto a bounded one.static @interfaceAnnotation on a splittableDoFnspecifying that theDoFnperforms an unbounded amount of work per input element, so applying it to a boundedPCollectionwill produce an unboundedPCollection.static @interfaceParameter annotation for the watermark estimator state for theDoFn.NewWatermarkEstimatormethod.classInformation accessible to all methods in thisDoFnwhere the context is in some window. -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionDeprecated.Returns aTypeDescriptorcapturing what is known statically about the input type of thisDoFninstance's most-derived class.Returns aTypeDescriptorcapturing what is known statically about the output type of thisDoFninstance's most-derived class.voidpopulateDisplayData(DisplayData.Builder builder) Register display data for the given transform or component.final voidDeprecated.useDoFn.SetuporDoFn.StartBundleinstead.
-
Constructor Details
-
DoFn
public DoFn()
-
-
Method Details
-
getAllowedTimestampSkew
Deprecated.This method permits aDoFnto emit elements behind the watermark. These elements are considered late, and if behind theallowed latenessof a downstreamPCollectionmay be silently dropped. See https://github.com/apache/beam/issues/18065 for details on a replacement.Returns the allowed timestamp skew duration, which is the maximum duration that timestamps can be shifted backward inDoFn.WindowedContext.outputWithTimestamp(OutputT, org.joda.time.Instant).The default value is
Duration.ZERO, in which case timestamps can only be shifted forward to future. For infinite skew, returnDuration.millis(Long.MAX_VALUE). -
getInputTypeDescriptor
Returns aTypeDescriptorcapturing what is known statically about the input type of thisDoFninstance's most-derived class.See
getOutputTypeDescriptor()for more discussion. -
getOutputTypeDescriptor
Returns aTypeDescriptorcapturing what is known statically about the output type of thisDoFninstance's most-derived class.In the normal case of a concrete
DoFnsubclass with no generic type parameters of its own (including anonymous inner classes), this will be a complete non-generic type, which is good for choosing a default outputCoder<O>for the outputPCollection<O>. -
prepareForProcessing
Deprecated.useDoFn.SetuporDoFn.StartBundleinstead. This method will be removed in a future release.Finalize theDoFnconstruction to prepare for processing. This method should be called by runners before any processing methods. -
populateDisplayData
Register display data for the given transform or component.populateDisplayData(DisplayData.Builder)is invoked by Pipeline runners to collect display data viaDisplayData.from(HasDisplayData). Implementations may callsuper.populateDisplayData(builder)in order to register display data in the current namespace, but should otherwise usesubcomponent.populateDisplayData(builder)to use the namespace of the subcomponent.By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayDatain interfaceHasDisplayData- Parameters:
builder- The builder to populate with display data.- See Also:
-
DoFnto emit elements behind the watermark.