public class Watch
extends java.lang.Object
The output is returned as an unbounded PCollection
of KV<InputT, OutputT>
,
where each OutputT
is associated with the InputT
that produced it, and is
assigned with the timestamp that the poll function returned when this output was detected for the
first time.
Hypothetical usage example for watching new files in a collection of directories, where for each directory we assume that new files will not appear if the directory contains a file named ".complete":
PCollection<String> directories = ...; // E.g. Create.of(single directory)
PCollection<KV<String, String>> matches = filepatterns.apply(Watch.<String, String>growthOf(
new PollFn<String, String>() {
public PollResult<String> apply(TimestampedValue<String> input) {
String directory = input.getValue();
List<TimestampedValue<String>> outputs = new ArrayList<>();
... List the directory and get creation times of all files ...
boolean isComplete = ... does a file ".complete" exist in the directory ...
return isComplete ? PollResult.complete(outputs) : PollResult.incomplete(outputs);
}
})
// Poll each directory every 5 seconds
.withPollInterval(Duration.standardSeconds(5))
// Stop watching each directory 12 hours after it's seen even if it's incomplete
.withTerminationPerInput(afterTotalOf(Duration.standardHours(12)));
By default, the watermark for a particular input is computed from a poll result as "earliest
timestamp of new elements in this poll result". It can also be set explicitly via Watch.Growth.PollResult.withWatermark(org.joda.time.Instant)
if the Watch.Growth.PollFn
can provide a more optimistic
estimate.
Note: This transform works only in runners supporting Splittable DoFn: see capability matrix.
Modifier and Type | Class and Description |
---|---|
static class |
Watch.Growth<InputT,OutputT,KeyT>
|
Constructor and Description |
---|
Watch() |
Modifier and Type | Method and Description |
---|---|
static <InputT,OutputT,KeyT> |
growthOf(Contextful<Watch.Growth.PollFn<InputT,OutputT>> pollFn,
SerializableFunction<OutputT,KeyT> outputKeyFn)
Watches the growth of the given poll function, using the given "key function" to deduplicate
outputs.
|
static <InputT,OutputT> |
growthOf(Watch.Growth.PollFn<InputT,OutputT> pollFn)
Watches the growth of the given poll function.
|
static <InputT,OutputT> |
growthOf(Watch.Growth.PollFn<InputT,OutputT> pollFn,
Requirements requirements)
Watches the growth of the given poll function.
|
public static <InputT,OutputT> Watch.Growth<InputT,OutputT,OutputT> growthOf(Watch.Growth.PollFn<InputT,OutputT> pollFn, Requirements requirements)
public static <InputT,OutputT> Watch.Growth<InputT,OutputT,OutputT> growthOf(Watch.Growth.PollFn<InputT,OutputT> pollFn)
public static <InputT,OutputT,KeyT> Watch.Growth<InputT,OutputT,KeyT> growthOf(Contextful<Watch.Growth.PollFn<InputT,OutputT>> pollFn, SerializableFunction<OutputT,KeyT> outputKeyFn)
By default, this is the identity function, i.e. the output is used as its own key.