public class GroupIntoBatches<K,InputT> extends PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,java.lang.Iterable<InputT>>>>
PTransform that batches inputs to a desired batch size. Batches will contain only
elements of a single key.
Elements are buffered until there are batchSize elements buffered, at which point they
are output to the output PCollection.
Windows are preserved (batches contain elements from the same window). Batches may contain elements from more than one bundle.
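The buffering rules above can be illustrated without a pipeline. The following is a minimal plain-Java sketch, not Beam's implementation: the class `BatchingSketch`, its methods, and the use of a `String` window id are all hypothetical names introduced for illustration. It also assumes that a partial batch is emitted when its window closes, which the text above does not state.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of per-key, per-window batching; not Beam code. */
class BatchingSketch<K, V> {
  private final long batchSize;
  // Buffers indexed by window id, then by key. A String stands in for a Beam window.
  private final Map<String, Map<K, List<V>>> buffers = new LinkedHashMap<>();
  // Batches emitted so far, in emission order.
  private final List<Map.Entry<K, List<V>>> output = new ArrayList<>();

  BatchingSketch(long batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.batchSize = batchSize;
  }

  /** Buffers one element and emits a batch once batchSize elements are buffered. */
  void add(String window, K key, V value) {
    List<V> buffer = buffers
        .computeIfAbsent(window, w -> new LinkedHashMap<>())
        .computeIfAbsent(key, k -> new ArrayList<>());
    buffer.add(value);
    if (buffer.size() >= batchSize) {
      List<V> batch = new ArrayList<>(buffer);
      output.add(new SimpleImmutableEntry<>(key, batch));
      buffer.clear();
    }
  }

  /** Assumption: leftover partial batches are emitted when the window closes. */
  void closeWindow(String window) {
    Map<K, List<V>> perKey = buffers.remove(window);
    if (perKey == null) {
      return;
    }
    for (Map.Entry<K, List<V>> e : perKey.entrySet()) {
      if (!e.getValue().isEmpty()) {
        List<V> batch = e.getValue();
        output.add(new SimpleImmutableEntry<>(e.getKey(), batch));
      }
    }
  }

  List<Map.Entry<K, List<V>>> output() {
    return output;
  }
}
```

With a batch size of 2, adding two values for key "a" in the same window produces one batch for "a", while a value for "a" in a different window stays in its own buffer.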
Example (batch call a webservice and get return codes):
PCollection<KV<String, String>> input = ...;
long batchSize = 100L;
// Group values per key into batches of up to batchSize elements.
PCollection<KV<String, Iterable<String>>> batched = input
    .apply(GroupIntoBatches.<String, String>ofSize(batchSize))
    .setCoder(KvCoder.of(StringUtf8Coder.of(), IterableCoder.of(StringUtf8Coder.of())));
// Call the web service once per batch; the output value is the return code.
PCollection<KV<String, String>> results = batched
    .apply(ParDo.of(new DoFn<KV<String, Iterable<String>>, KV<String, String>>() {
      @ProcessElement
      public void processElement(
          @Element KV<String, Iterable<String>> element,
          OutputReceiver<KV<String, String>> r) {
        r.output(KV.of(element.getKey(), callWebService(element.getValue())));
      }
    }));
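The example calls `callWebService` without defining it. A hypothetical stand-in might look like the sketch below; the class name `WebServiceStub`, the signature, and the returned codes are assumptions made for illustration, and no real request is sent.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for the example above; not part of Beam. */
class WebServiceStub {
  /** Handles a whole batch in one call and returns a return code as a String. */
  static String callWebService(Iterable<String> batch) {
    // Collect the batch into a single payload so one request covers all elements.
    List<String> payload = new ArrayList<>();
    for (String value : batch) {
      payload.add(value);
    }
    // A real implementation would send `payload` to the service here.
    // Placeholder codes: "204" for an empty batch, "200" otherwise.
    return payload.isEmpty() ? "204" : "200";
  }
}
```

Taking the whole `Iterable` as the argument is what lets the service be called once per batch rather than once per element.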
| Modifier and Type | Method and Description |
|---|---|
| PCollection<KV<K,java.lang.Iterable<InputT>>> | expand(PCollection<KV<K,InputT>> input) Override this method to specify how this PTransform should be expanded on the given InputT. |
| long | getBatchSize() Returns the size of the batch. |
| static <K,InputT> GroupIntoBatches<K,InputT> | ofSize(long batchSize) |
Methods inherited from class PTransform: compose, compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, populateDisplayData, toString, validate
public static <K,InputT> GroupIntoBatches<K,InputT> ofSize(long batchSize)
public long getBatchSize()
public PCollection<KV<K,java.lang.Iterable<InputT>>> expand(PCollection<KV<K,InputT>> input)
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
Specified by: expand in class PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,java.lang.Iterable<InputT>>>>