Class BatchElements<T>
- Type Parameters:
T- the type of input elements
- All Implemented Interfaces:
Serializable,HasDisplayData
PTransform that batches elements for amortized processing.
This transform is designed to precede operations whose processing cost is of the form:
time = fixed_cost + num_elements * per_element_cost
When the per-element cost is significantly smaller than the fixed cost, batching multiple elements together can amortize that fixed cost and improve overall throughput.
The transform consumes a PCollection<T> and produces a PCollection<List<T>>,
where each output element is a batch of input elements.
This transform dynamically determines an optimal batch size between the configured minimum and
maximum values by profiling the execution time of downstream (fused) operations. To enforce a
fixed batch size, set minBatchSize == maxBatchSize.
Elements are batched per window. Each emitted batch belongs to the same window as its elements and is assigned a timestamp at the end of that window.
Example
// With default configuration
pipeline
.apply("Create", Create.of(range(200)))
.apply("Batch", BatchElements.withDefaults())
.apply(...);
// With custom configuration
BatchElements.BatchConfig config =
BatchElements.BatchConfig.builder()
.withMinBatchSize(1)
.withMaxBatchSize(15)
.withTargetBatchDurationSecs(10.0)
.withTargetBatchOverhead(0.05)
.withVariance(0.0)
.build();
pipeline
.apply("Create", Create.of(range(200)))
.apply("Batch", BatchElements.withConfig(config))
.apply(
"Sizes",
MapElements.via(
new SimpleFunction<List<Integer>, Integer>() {
@Override
public Integer apply(List<Integer> input) {
return input.size();
}
}));
- See Also:
-
Nested Class Summary
Nested Classes -
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints -
Method Summary
Modifier and TypeMethodDescriptionPCollection<List<T>> expand(PCollection<T> input) Override this method to specify how thisPTransformshould be expanded on the givenInputT.static <T> BatchElements<T> withConfig(BatchElements.BatchConfig config) static <T> BatchElements<T> Batch Elements with default configuration.Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate
-
Method Details
-
withDefaults
Batch Elements with default configuration. -
withConfig
-
expand
Description copied from class:PTransformOverride this method to specify how thisPTransformshould be expanded on the givenInputT.NOTE: This method should not be called directly. Instead apply the
PTransformshould be applied to theInputTusing theapplymethod.Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expandin classPTransform<PCollection<T>,PCollection<List<T>>>
-