InputT - the type of the (main) input elementsOutputT - the type of the (main) output elementspublic abstract class DoFn<InputT,OutputT> extends java.lang.Object implements java.io.Serializable, HasDisplayData
ParDo providing the code to use to process elements of the input PCollection.
 See ParDo for more explanation, examples of use, and discussion of constraints on
 DoFns, including their serializability, lack of access to global shared mutable state,
 requirements for failure tolerance, and benefits of optimization.
 
DoFns can be tested by using TestPipeline. You can verify their
 functional correctness in a local test using the DirectRunner as well as running
 integration tests with your production runner of choice. Typically, you can generate the input
 data using Create.of(java.lang.Iterable<T>) or other transforms. However, if you need to test the behavior of
 DoFn.StartBundle and DoFn.FinishBundle with particular bundle boundaries, you can use
 TestStream.
 
Implementations must define a method annotated with DoFn.ProcessElement that satisfies the
 requirements described there. See the DoFn.ProcessElement for details.
 
Example usage:
  PCollection<String> lines = ... ;
  PCollection<String> words =
      lines.apply(ParDo.of(new DoFn<String, String>()) {
          @ProcessElement
          public void processElement( @Element String element, BoundedWindow window) {
            ...
          }}));
 | Modifier and Type | Class and Description | 
|---|---|
static interface  | 
DoFn.BoundedPerElement
Annotation on a splittable  
DoFn
 specifying that the DoFn performs a bounded amount of work per input element, so
 applying it to a bounded PCollection will produce also a bounded PCollection. | 
static interface  | 
DoFn.Element
Parameter annotation for the input element for a  
DoFn.ProcessElement method. | 
static interface  | 
DoFn.FieldAccess
Annotation for specifying specific fields that are accessed in a Schema PCollection. 
 | 
static interface  | 
DoFn.FinishBundle
Annotation for the method to use to finish processing a batch of elements. 
 | 
class  | 
DoFn.FinishBundleContext
Information accessible while within the  
DoFn.FinishBundle method. | 
static interface  | 
DoFn.GetInitialRestriction
Annotation for the method that maps an element to an initial restriction for a splittable  
DoFn. | 
static interface  | 
DoFn.GetRestrictionCoder
Annotation for the method that returns the coder to use for the restriction of a splittable  
DoFn. | 
static interface  | 
DoFn.MultiOutputReceiver
Receives tagged output for a multi-output function. 
 | 
static interface  | 
DoFn.NewTracker
Annotation for the method that creates a new  
RestrictionTracker for the restriction of
 a splittable DoFn. | 
static interface  | 
DoFn.OnTimer
Annotation for registering a callback for a timer. 
 | 
class  | 
DoFn.OnTimerContext
Information accessible when running a  
DoFn.OnTimer method. | 
static interface  | 
DoFn.OnWindowExpiration
Annotation for the method to use for performing actions on window expiration. 
 | 
static interface  | 
DoFn.OutputReceiver<T>
Receives values of the given type. 
 | 
class  | 
DoFn.ProcessContext
Information accessible when running a  
DoFn.ProcessElement method. | 
static class  | 
DoFn.ProcessContinuation
When used as a return value of  
DoFn.ProcessElement, indicates whether there is more work to
 be done for the current element. | 
static interface  | 
DoFn.ProcessElement
Annotation for the method to use for processing elements. 
 | 
static interface  | 
DoFn.RequiresStableInput
Experimental - no backwards compatibility guarantees. 
 | 
static interface  | 
DoFn.Setup
Annotation for the method to use to prepare an instance for processing bundles of elements. 
 | 
static interface  | 
DoFn.SplitRestriction
Annotation for the method that splits restriction of a splittable  
DoFn into multiple parts to
 be processed in parallel. | 
static interface  | 
DoFn.StartBundle
Annotation for the method to use to prepare an instance for processing a batch of elements. 
 | 
class  | 
DoFn.StartBundleContext
Information accessible while within the  
DoFn.StartBundle method. | 
static interface  | 
DoFn.StateId
Annotation for declaring and dereferencing state cells. 
 | 
static interface  | 
DoFn.Teardown
Annotation for the method to use to clean up this instance before it is discarded. 
 | 
static interface  | 
DoFn.TimerId
Annotation for declaring and dereferencing timers. 
 | 
static interface  | 
DoFn.Timestamp
Parameter annotation for the input element timestamp for a  
DoFn.ProcessElement method. | 
static interface  | 
DoFn.UnboundedPerElement
Annotation on a splittable  
DoFn
 specifying that the DoFn performs an unbounded amount of work per input element, so
 applying it to a bounded PCollection will produce an unbounded PCollection. | 
class  | 
DoFn.WindowedContext
Information accessible to all methods in this  
DoFn where the context is in some window. | 
| Constructor and Description | 
|---|
DoFn()  | 
| Modifier and Type | Method and Description | 
|---|---|
Duration | 
getAllowedTimestampSkew()
Deprecated. 
 
This method permits a  
DoFn to emit elements behind the watermark. These
     elements are considered late, and if behind the allowed lateness of a downstream PCollection may be silently dropped. See
     https://issues.apache.org/jira/browse/BEAM-644 for details on a replacement. | 
TypeDescriptor<InputT> | 
getInputTypeDescriptor()
Returns a  
TypeDescriptor capturing what is known statically about the input type of
 this DoFn instance's most-derived class. | 
TypeDescriptor<OutputT> | 
getOutputTypeDescriptor()
Returns a  
TypeDescriptor capturing what is known statically about the output type of
 this DoFn instance's most-derived class. | 
void | 
populateDisplayData(DisplayData.Builder builder)
Register display data for the given transform or component. 
 | 
void | 
prepareForProcessing()
Deprecated. 
 
use  
DoFn.Setup or DoFn.StartBundle instead. This method will be removed in a
     future release. | 
@Deprecated public Duration getAllowedTimestampSkew()
DoFn to emit elements behind the watermark. These
     elements are considered late, and if behind the allowed lateness of a downstream PCollection may be silently dropped. See
     https://issues.apache.org/jira/browse/BEAM-644 for details on a replacement.DoFn.WindowedContext.outputWithTimestamp(OutputT, org.joda.time.Instant).
 The default value is Duration.ZERO, in which case timestamps can only be shifted
 forward to future. For infinite skew, return Duration.millis(Long.MAX_VALUE).
public TypeDescriptor<InputT> getInputTypeDescriptor()
TypeDescriptor capturing what is known statically about the input type of
 this DoFn instance's most-derived class.
 See getOutputTypeDescriptor() for more discussion.
public TypeDescriptor<OutputT> getOutputTypeDescriptor()
TypeDescriptor capturing what is known statically about the output type of
 this DoFn instance's most-derived class.
 In the normal case of a concrete DoFn subclass with no generic type parameters of
 its own (including anonymous inner classes), this will be a complete non-generic type, which is
 good for choosing a default output Coder<O> for the output PCollection<O>.
@Deprecated public final void prepareForProcessing()
DoFn.Setup or DoFn.StartBundle instead. This method will be removed in a
     future release.DoFn construction to prepare for processing. This method should be called
 by runners before any processing methods.public void populateDisplayData(DisplayData.Builder builder)
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect
 display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace,
 but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace
 of the subcomponent.
 
By default, does not register any display data. Implementors may override this method to provide their own display data.
populateDisplayData in interface HasDisplayDatabuilder - The builder to populate with display data.HasDisplayData