public class WithTimestamps<T> extends PTransform<PCollection<T>,PCollection<T>>
PTransform
for assigning timestamps to all the elements of a PCollection
.
Timestamps are used to assign Windows
to elements within the Window.into(org.apache.beam.sdk.transforms.windowing.WindowFn)
PTransform
. Assigning
timestamps is useful when the input data set comes from a Source
without implicit
timestamps (such as TextIO
).
name
Modifier and Type | Method and Description |
---|---|
PCollection<T> |
expand(PCollection<T> input)
Override this method to specify how this
PTransform should be expanded on the given
InputT . |
Duration |
getAllowedTimestampSkew()
Deprecated.
This method permits a to elements to be emitted behind the watermark. These
elements are considered late, and if behind the
allowed lateness of a downstream PCollection may be silently dropped. See
https://issues.apache.org/jira/browse/BEAM-644 for details on a replacement. |
static <T> WithTimestamps<T> |
of(SerializableFunction<T,Instant> fn)
For a
SerializableFunction fn from T to Instant , outputs a
PTransform that takes an input PCollection<T> and outputs a
PCollection<T> containing every element v in the input where
each element is output with a timestamp obtained as the result of fn.apply(v) . |
WithTimestamps<T> |
withAllowedTimestampSkew(Duration allowedTimestampSkew)
Deprecated.
This method permits a to elements to be emitted behind the watermark. These
elements are considered late, and if behind the
allowed lateness of a downstream PCollection may be silently dropped. See
https://issues.apache.org/jira/browse/BEAM-644 for details on a replacement. |
compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, populateDisplayData, toString, validate
public static <T> WithTimestamps<T> of(SerializableFunction<T,Instant> fn)
SerializableFunction
fn
from T
to Instant
, outputs a
PTransform
that takes an input PCollection<T>
and outputs a
PCollection<T>
containing every element v
in the input where
each element is output with a timestamp obtained as the result of fn.apply(v)
.
If the input PCollection
elements have timestamps, the output timestamp for each
element must not be before the input element's timestamp minus the value of getAllowedTimestampSkew()
. If an output timestamp is before this time, the transform will
throw an IllegalArgumentException
when executed. Use withAllowedTimestampSkew(Duration)
to update the allowed skew.
CAUTION: Use of withAllowedTimestampSkew(Duration)
permits elements to be emitted
behind the watermark. These elements are considered late, and if behind the allowed lateness
of a downstream PCollection
may
be silently dropped. See https://issues.apache.org/jira/browse/BEAM-644 for details on a
replacement.
Each output element will be in the same windows as the input element. If a new window based
on the new output timestamp is desired, apply a new instance of Window.into(WindowFn)
.
This transform will fail at execution time with a NullPointerException
if for any
input element the result of fn.apply(v)
is null
.
Example of use in Java 8:
PCollection<Record> timestampedRecords = records.apply(
WithTimestamps.of((Record rec) -> rec.getInstant());
@Deprecated public WithTimestamps<T> withAllowedTimestampSkew(Duration allowedTimestampSkew)
allowed lateness
of a downstream PCollection
may be silently dropped. See
https://issues.apache.org/jira/browse/BEAM-644 for details on a replacement.The default value is Duration.ZERO
, allowing timestamps to only be shifted into the
future. For infinite skew, use new Duration(Long.MAX_VALUE)
.
@Deprecated public Duration getAllowedTimestampSkew()
allowed lateness
of a downstream PCollection
may be silently dropped. See
https://issues.apache.org/jira/browse/BEAM-644 for details on a replacement.DoFn.getAllowedTimestampSkew()
public PCollection<T> expand(PCollection<T> input)
PTransform
PTransform
should be expanded on the given
InputT
.
NOTE: This method should not be called directly. Instead apply the PTransform
should
be applied to the InputT
using the apply
method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
expand
in class PTransform<PCollection<T>,PCollection<T>>