public class GroupIntoBatches<K,InputT> extends PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,java.lang.Iterable<InputT>>>>
A PTransform that batches inputs to a desired batch size. Batches will contain only elements of a single key.
Elements are buffered until there are batchSize elements buffered, at which point they are output to the output PCollection.
Windows are preserved (batches contain elements from the same window). Batches may contain elements from more than one bundle.
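The buffering rule above can be modelled in plain Java. The following is a minimal sketch, not Beam's implementation: the class BatchingSketch and its add and flush methods are hypothetical names, windows are ignored, and the flush step is an assumption, since this page does not state what happens to a final partial batch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Plain-Java model of per-key batching. Hypothetical; does not use the Beam API. */
public class BatchingSketch<K, V> {
    private final long batchSize;
    // One buffer per key, so a batch never mixes elements of different keys.
    private final Map<K, List<V>> buffers = new LinkedHashMap<>();

    public BatchingSketch(long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
    }

    /** Buffers one element and returns a batch once batchSize elements are held for the key. */
    public Optional<List<V>> add(K key, V value) {
        List<V> buffer = buffers.computeIfAbsent(key, k -> new ArrayList<>());
        buffer.add(value);
        if (buffer.size() >= batchSize) {
            buffers.remove(key); // the next element for this key starts a fresh batch
            return Optional.of(buffer);
        }
        return Optional.empty();
    }

    /** Assumption: returns and clears whatever is still buffered (partial batches). */
    public Map<K, List<V>> flush() {
        Map<K, List<V>> rest = new LinkedHashMap<>(buffers);
        buffers.clear();
        return rest;
    }

    public static void main(String[] args) {
        BatchingSketch<String, Integer> sketch = new BatchingSketch<>(2);
        System.out.println(sketch.add("a", 1));  // Optional.empty
        System.out.println(sketch.add("b", 10)); // Optional.empty
        System.out.println(sketch.add("a", 2));  // Optional[[1, 2]]
        System.out.println(sketch.flush());      // {b=[10]}
    }
}
```

In the sketch, the element for key "b" arrives between the two elements for key "a" and still ends up in a separate buffer, which mirrors the statement that a batch holds elements of a single key only.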
Example (batch-calling a web service and collecting the return codes):
Pipeline pipeline = Pipeline.create(...);
... // KV collection
long batchSize = 100L;
pipeline.apply(GroupIntoBatches.<String, String>ofSize(batchSize))
    .setCoder(KvCoder.of(StringUtf8Coder.of(), IterableCoder.of(StringUtf8Coder.of())))
    .apply(ParDo.of(new DoFn<KV<String, Iterable<String>>, KV<String, String>>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        // callWebService is a user-supplied helper; it receives the whole batch for one key.
        c.output(KV.of(c.element().getKey(), callWebService(c.element().getValue())));
      }
    }));
pipeline.run();
Fields inherited from class PTransform: name
Modifier and Type | Method and Description |
---|---|
PCollection<KV<K,java.lang.Iterable<InputT>>> | expand(PCollection<KV<K,InputT>> input) Override this method to specify how this PTransform should be expanded on the given InputT. |
static <K,InputT> GroupIntoBatches<K,InputT> | ofSize(long batchSize) |
Methods inherited from class PTransform: getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, populateDisplayData, toString, validate
public static <K,InputT> GroupIntoBatches<K,InputT> ofSize(long batchSize)
public PCollection<KV<K,java.lang.Iterable<InputT>>> expand(PCollection<KV<K,InputT>> input)
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
Specified by: expand in class PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,java.lang.Iterable<InputT>>>>