public class GroupIntoBatches<K,InputT> extends PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,java.lang.Iterable<InputT>>>>
PTransform
that batches inputs to a desired batch size. Batches will contain only
elements of a single key.
Elements are buffered until there are batchSize
elements
buffered, at which point they are output to the output PCollection
.
Windows are preserved (batches contain elements from the same window). Batches may contain elements from more than one bundle
Example (batch call a webservice and get return codes)
Pipeline pipeline = Pipeline.create(...);
... // KV collection
long batchSize = 100L;
pipeline.apply(GroupIntoBatches.<String, String>ofSize(batchSize))
.setCoder(KvCoder.of(StringUtf8Coder.of(), IterableCoder.of(StringUtf8Coder.of())))
.apply(ParDo.of(new DoFn<KV<String, Iterable<String>>, KV<String, String>>() {
{@literal @}ProcessElement
public void processElement(ProcessContext c){
c.output(KV.of(c.element().getKey(), callWebService(c.element().getValue())));
}
}));
pipeline.run();
name
Modifier and Type | Method and Description |
---|---|
PCollection<KV<K,java.lang.Iterable<InputT>>> |
expand(PCollection<KV<K,InputT>> input)
Applies this
PTransform on the given InputT , and returns its
Output . |
static <K,InputT> GroupIntoBatches<K,InputT> |
ofSize(long batchSize) |
getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, populateDisplayData, toString, validate
public static <K,InputT> GroupIntoBatches<K,InputT> ofSize(long batchSize)
public PCollection<KV<K,java.lang.Iterable<InputT>>> expand(PCollection<KV<K,InputT>> input)
PTransform
PTransform
on the given InputT
, and returns its
Output
.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
expand
in class PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,java.lang.Iterable<InputT>>>>