public class GroupIntoBatches<K,InputT> extends PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,java.lang.Iterable<InputT>>>>
A PTransform that batches inputs to a desired batch size. Batches will contain only elements of a single key.
Elements are buffered until there are batchSize elements, at which point they are emitted to the output PCollection. A maxBufferingDuration can be set to emit output early and avoid waiting for a full batch forever.
Windows are preserved (batches contain elements from the same window). Batches may contain elements from more than one bundle.
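The per-key buffering described above can be modeled in plain Java, without the Beam SDK. This is only an illustrative sketch: the class name BatchingSketch and its batch method are hypothetical and are not part of Beam, and the final flush of incomplete buffers stands in for the end of a window, which is an assumption of this model rather than a statement about the runner's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of per-key batching; not Beam's implementation. */
final class BatchingSketch {

  /** Groups (key, value) pairs into per-key batches of at most batchSize elements. */
  static List<Map.Entry<String, List<String>>> batch(
      List<Map.Entry<String, String>> input, int batchSize) {
    // One buffer per key, so a batch never mixes elements of different keys.
    Map<String, List<String>> buffers = new LinkedHashMap<>();
    List<Map.Entry<String, List<String>>> output = new ArrayList<>();

    for (Map.Entry<String, String> kv : input) {
      List<String> buffer = buffers.computeIfAbsent(kv.getKey(), k -> new ArrayList<>());
      buffer.add(kv.getValue());
      if (buffer.size() == batchSize) {
        // The buffer is full: emit a copy and start a new batch for this key.
        output.add(Map.entry(kv.getKey(), new ArrayList<>(buffer)));
        buffer.clear();
      }
    }

    // Assumption of this model: leftovers are flushed once the input ends,
    // standing in for the end of the window.
    for (Map.Entry<String, List<String>> e : buffers.entrySet()) {
      if (!e.getValue().isEmpty()) {
        output.add(Map.entry(e.getKey(), new ArrayList<>(e.getValue())));
      }
    }
    return output;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, String>> input = List.of(
        Map.entry("a", "1"), Map.entry("a", "2"), Map.entry("b", "3"), Map.entry("a", "4"));
    System.out.println(batch(input, 2));
  }
}
```

In this model, key "a" yields one full batch and one incomplete batch, and key "b" yields a single incomplete batch; no batch contains values from both keys.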
Example 1 (batch call a webservice and get return codes):
PCollection<KV<String, String>> input = ...;
long batchSize = 100L;
// The final ParDo emits KV<String, String>, so the result holds return codes, not batches.
PCollection<KV<String, String>> returnCodes = input
    .apply(GroupIntoBatches.<String, String>ofSize(batchSize))
    .setCoder(KvCoder.of(StringUtf8Coder.of(), IterableCoder.of(StringUtf8Coder.of())))
    .apply(ParDo.of(new DoFn<KV<String, Iterable<String>>, KV<String, String>>() {
      @ProcessElement
      public void processElement(
          @Element KV<String, Iterable<String>> element,
          OutputReceiver<KV<String, String>> r) {
        // One web service call per batch instead of one per element.
        r.output(KV.of(element.getKey(), callWebService(element.getValue())));
      }
    }));
Example 2 (batch unbounded input in a global window):
PCollection<KV<String, String>> unboundedInput = ...;
long batchSize = 100L;
Duration maxBufferingDuration = Duration.standardSeconds(10);
PCollection<KV<String, Iterable<String>>> batched = unboundedInput
    .apply(Window.<KV<String, String>>into(new GlobalWindows())
        .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
        .discardingFiredPanes())
    .apply(GroupIntoBatches.<String, String>ofSize(batchSize)
        .withMaxBufferingDuration(maxBufferingDuration));
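The interaction between batchSize and maxBufferingDuration in Example 2 can be illustrated with a small single-key model in plain Java. This is a sketch under stated assumptions, not Beam's implementation: the class TimedBufferSketch and its methods add and onTimer are hypothetical, time is passed in explicitly as milliseconds rather than read from a clock, and the buffering deadline is assumed to be measured from the first element of the current incomplete batch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-key model of size-or-timeout batching; not Beam's implementation. */
final class TimedBufferSketch {
  private final int batchSize;
  private final long maxBufferingMillis;
  private final List<String> buffer = new ArrayList<>();
  private long firstElementMillis;

  TimedBufferSketch(int batchSize, long maxBufferingMillis) {
    this.batchSize = batchSize;
    this.maxBufferingMillis = maxBufferingMillis;
  }

  /** Buffers a value at the given time; returns a full batch, or an empty list. */
  List<String> add(String value, long nowMillis) {
    if (buffer.isEmpty()) {
      // Assumption: the deadline is measured from the first buffered element.
      firstElementMillis = nowMillis;
    }
    buffer.add(value);
    return buffer.size() >= batchSize ? drain() : new ArrayList<>();
  }

  /** Simulates a timer firing; returns the incomplete batch if its time limit has passed. */
  List<String> onTimer(long nowMillis) {
    boolean expired =
        !buffer.isEmpty() && nowMillis - firstElementMillis >= maxBufferingMillis;
    return expired ? drain() : new ArrayList<>();
  }

  /** Copies the buffered values out and resets the buffer. */
  private List<String> drain() {
    List<String> batch = new ArrayList<>(buffer);
    buffer.clear();
    return batch;
  }

  public static void main(String[] args) {
    TimedBufferSketch sketch = new TimedBufferSketch(3, 10_000L);
    sketch.add("x", 0L);
    sketch.add("y", 4_000L);
    // Two elements are buffered and the limit of 10 seconds has been reached.
    System.out.println(sketch.onTimer(10_000L));
  }
}
```

With a batch size of 3 and a limit of 10 seconds, two buffered elements are held until the limit is reached and are then emitted as an incomplete batch, which is the "emit output early" behavior the description refers to.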
Modifier and Type | Class and Description |
---|---|
class | GroupIntoBatches.WithShardedKey |
Fields inherited from class PTransform: name
Modifier and Type | Method and Description |
---|---|
PCollection<KV<K,java.lang.Iterable<InputT>>> | expand(PCollection<KV<K,InputT>> input): Override this method to specify how this PTransform should be expanded on the given InputT. |
long | getBatchSize(): Returns the size of the batch. |
static <K,InputT> GroupIntoBatches<K,InputT> | ofSize(long batchSize) |
GroupIntoBatches<K,InputT> | withMaxBufferingDuration(Duration duration): Sets a time limit (in processing time) on how long an incomplete batch of elements is allowed to be buffered. |
GroupIntoBatches.WithShardedKey | withShardedKey(): Outputs batched elements associated with sharded input keys. |
Methods inherited from class PTransform: compose, compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, populateDisplayData, toString, validate
public static <K,InputT> GroupIntoBatches<K,InputT> ofSize(long batchSize)
public long getBatchSize()
public GroupIntoBatches<K,InputT> withMaxBufferingDuration(Duration duration)
@Experimental public GroupIntoBatches.WithShardedKey withShardedKey()
public PCollection<KV<K,java.lang.Iterable<InputT>>> expand(PCollection<KV<K,InputT>> input)
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
expand in class PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,java.lang.Iterable<InputT>>>>