InputT - the type of the (main) input elementsOutputT - the type of the (main) output elementspublic abstract class DoFn<InputT,OutputT> extends java.lang.Object implements java.io.Serializable, HasDisplayData
ParDo providing the code to use to process elements of the input PCollection.
 See ParDo for more explanation, examples of use, and discussion of constraints on
 DoFns, including their serializability, lack of access to global shared mutable state,
 requirements for failure tolerance, and benefits of optimization.
 
DoFns can be tested by using TestPipeline. You can verify their
 functional correctness in a local test using the DirectRunner as well as running
 integration tests with your production runner of choice. Typically, you can generate the input
 data using Create.of(java.lang.Iterable<T>) or other transforms. However, if you need to test the behavior of
 DoFn.StartBundle and DoFn.FinishBundle with particular bundle boundaries, you can use
 TestStream.
 
Implementations must define a method annotated with DoFn.ProcessElement that satisfies the
 requirements described there. See the DoFn.ProcessElement for details.
 
Example usage:
  PCollection<String> lines = ... ;
  PCollection<String> words =
      lines.apply(ParDo.of(new DoFn<String, String>()) {
          @ProcessElement
          public void processElement( @Element String element, BoundedWindow window) {
            ...
          }}));
 | Modifier and Type | Class and Description | 
|---|---|
| static interface  | DoFn.AlwaysFetchedAnnotation for declaring that a state parameter is always fetched. | 
| static interface  | DoFn.BoundedPerElementAnnotation on a splittable  DoFnspecifying that theDoFnperforms a bounded amount of work per input element, so
 applying it to a boundedPCollectionwill produce also a boundedPCollection. | 
| static interface  | DoFn.BundleFinalizerA parameter that is accessible during  @StartBundle,@ProcessElementand@FinishBundlethat allows the caller
 to register a callback that will be invoked after the bundle has been successfully completed
 and the runner has commit the output. | 
| static interface  | DoFn.ElementParameter annotation for the input element for  DoFn.ProcessElement,DoFn.GetInitialRestriction,DoFn.GetSize,DoFn.SplitRestriction,DoFn.GetInitialWatermarkEstimatorState,DoFn.NewWatermarkEstimator, andDoFn.NewTrackermethods. | 
| static interface  | DoFn.FieldAccessAnnotation for specifying specific fields that are accessed in a Schema PCollection. | 
| static interface  | DoFn.FinishBundleAnnotation for the method to use to finish processing a batch of elements. | 
| class  | DoFn.FinishBundleContextInformation accessible while within the  DoFn.FinishBundlemethod. | 
| static interface  | DoFn.GetInitialRestrictionAnnotation for the method that maps an element to an initial restriction for a splittable  DoFn. | 
| static interface  | DoFn.GetInitialWatermarkEstimatorStateAnnotation for the method that maps an element and restriction to initial watermark estimator
 state for a splittable  DoFn. | 
| static interface  | DoFn.GetRestrictionCoderAnnotation for the method that returns the coder to use for the restriction of a splittable  DoFn. | 
| static interface  | DoFn.GetSizeAnnotation for the method that returns the corresponding size for an element and restriction
 pair. | 
| static interface  | DoFn.GetWatermarkEstimatorStateCoderAnnotation for the method that returns the coder to use for the watermark estimator state of a
 splittable  DoFn. | 
| static interface  | DoFn.KeyParameter annotation for dereferencing input element key in  KVpair. | 
| static interface  | DoFn.MultiOutputReceiverReceives tagged output for a multi-output function. | 
| static interface  | DoFn.NewTrackerAnnotation for the method that creates a new  RestrictionTrackerfor the restriction of
 a splittableDoFn. | 
| static interface  | DoFn.NewWatermarkEstimatorAnnotation for the method that creates a new  WatermarkEstimatorfor the watermark state
 of a splittableDoFn. | 
| static interface  | DoFn.OnTimerAnnotation for registering a callback for a timer. | 
| class  | DoFn.OnTimerContextInformation accessible when running a  DoFn.OnTimermethod. | 
| static interface  | DoFn.OnTimerFamilyAnnotation for registering a callback for a timerFamily. | 
| static interface  | DoFn.OnWindowExpirationAnnotation for the method to use for performing actions on window expiration. | 
| class  | DoFn.OnWindowExpirationContext | 
| static interface  | DoFn.OutputReceiver<T>Receives values of the given type. | 
| class  | DoFn.ProcessContextInformation accessible when running a  DoFn.ProcessElementmethod. | 
| static class  | DoFn.ProcessContinuationWhen used as a return value of  DoFn.ProcessElement, indicates whether there is more work to
 be done for the current element. | 
| static interface  | DoFn.ProcessElementAnnotation for the method to use for processing elements. | 
| static interface  | DoFn.RequiresStableInputAnnotation that may be added to a  DoFn.ProcessElement,DoFn.OnTimer, orDoFn.OnWindowExpirationmethod to indicate that the runner must ensure that the observable contents
 of the inputPCollectionor mutable state must be stable upon retries. | 
| static interface  | DoFn.RequiresTimeSortedInputAnnotation that may be added to a  DoFn.ProcessElementmethod to indicate that the runner
 must ensure that the observable contents of the inputPCollectionis sorted by time, in
 ascending order. | 
| static interface  | DoFn.RestrictionParameter annotation for the restriction for  DoFn.GetSize,DoFn.SplitRestriction,DoFn.GetInitialWatermarkEstimatorState,DoFn.NewWatermarkEstimator, andDoFn.NewTrackermethods. | 
| static interface  | DoFn.SetupAnnotation for the method to use to prepare an instance for processing bundles of elements. | 
| static interface  | DoFn.SideInputParameter annotation for the SideInput for a  DoFn.ProcessElementmethod. | 
| static interface  | DoFn.SplitRestrictionAnnotation for the method that splits restriction of a splittable  DoFninto multiple parts to
 be processed in parallel. | 
| static interface  | DoFn.StartBundleAnnotation for the method to use to prepare an instance for processing a batch of elements. | 
| class  | DoFn.StartBundleContextInformation accessible while within the  DoFn.StartBundlemethod. | 
| static interface  | DoFn.StateIdAnnotation for declaring and dereferencing state cells. | 
| static interface  | DoFn.TeardownAnnotation for the method to use to clean up this instance before it is discarded. | 
| static interface  | DoFn.TimerFamilyParameter annotation for the TimerMap for a  DoFn.ProcessElementmethod. | 
| static interface  | DoFn.TimerIdAnnotation for declaring and dereferencing timers. | 
| static interface  | DoFn.TimestampParameter annotation for the input element timestamp for  DoFn.ProcessElement,DoFn.GetInitialRestriction,DoFn.GetSize,DoFn.SplitRestriction,DoFn.GetInitialWatermarkEstimatorState,DoFn.NewWatermarkEstimator, andDoFn.NewTrackermethods. | 
| static interface  | DoFn.TruncateRestrictionAnnotation for the method that truncates the restriction of a splittable  DoFninto a bounded one. | 
| static interface  | DoFn.UnboundedPerElementAnnotation on a splittable  DoFnspecifying that theDoFnperforms an unbounded amount of work per input element, so
 applying it to a boundedPCollectionwill produce an unboundedPCollection. | 
| static interface  | DoFn.WatermarkEstimatorStateParameter annotation for the watermark estimator state for the  DoFn.NewWatermarkEstimatormethod. | 
| class  | DoFn.WindowedContextInformation accessible to all methods in this  DoFnwhere the context is in some window. | 
| Constructor and Description | 
|---|
| DoFn() | 
| Modifier and Type | Method and Description | 
|---|---|
| Duration | getAllowedTimestampSkew()Deprecated. 
 This method permits a  DoFnto emit elements behind the watermark. These
     elements are considered late, and if behind theallowed latenessof a downstreamPCollectionmay be silently dropped. See
     https://github.com/apache/beam/issues/18065 for details on a replacement. | 
| TypeDescriptor<InputT> | getInputTypeDescriptor()Returns a  TypeDescriptorcapturing what is known statically about the input type of
 thisDoFninstance's most-derived class. | 
| TypeDescriptor<OutputT> | getOutputTypeDescriptor()Returns a  TypeDescriptorcapturing what is known statically about the output type of
 thisDoFninstance's most-derived class. | 
| void | populateDisplayData(DisplayData.Builder builder)Register display data for the given transform or component. | 
| void | prepareForProcessing()Deprecated. 
 use  DoFn.SetuporDoFn.StartBundleinstead. This method will be removed in a
     future release. | 
@Deprecated public Duration getAllowedTimestampSkew()
DoFn to emit elements behind the watermark. These
     elements are considered late, and if behind the allowed lateness of a downstream PCollection may be silently dropped. See
     https://github.com/apache/beam/issues/18065 for details on a replacement.DoFn.WindowedContext.outputWithTimestamp(OutputT, org.joda.time.Instant).
 The default value is Duration.ZERO, in which case timestamps can only be shifted
 forward to future. For infinite skew, return Duration.millis(Long.MAX_VALUE).
public TypeDescriptor<InputT> getInputTypeDescriptor()
TypeDescriptor capturing what is known statically about the input type of
 this DoFn instance's most-derived class.
 See getOutputTypeDescriptor() for more discussion.
public TypeDescriptor<OutputT> getOutputTypeDescriptor()
TypeDescriptor capturing what is known statically about the output type of
 this DoFn instance's most-derived class.
 In the normal case of a concrete DoFn subclass with no generic type parameters of
 its own (including anonymous inner classes), this will be a complete non-generic type, which is
 good for choosing a default output Coder<O> for the output PCollection<O>.
@Deprecated public final void prepareForProcessing()
DoFn.Setup or DoFn.StartBundle instead. This method will be removed in a
     future release.DoFn construction to prepare for processing. This method should be called
 by runners before any processing methods.public void populateDisplayData(DisplayData.Builder builder)
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect
 display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace,
 but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace
 of the subcomponent.
 
By default, does not register any display data. Implementors may override this method to provide their own display data.
populateDisplayData in interface HasDisplayDatabuilder - The builder to populate with display data.HasDisplayData