Type Parameters:
K - type of input and output keys
InputT - type of input values
OutputT - type of output values
public static class Combine.GroupedValues<K,InputT,OutputT> extends PTransform<PCollection<? extends KV<K,? extends java.lang.Iterable<InputT>>>,PCollection<KV<K,OutputT>>>
GroupedValues<K, InputT, OutputT> takes a PCollection<KV<K, Iterable<InputT>>>, such as the result of GroupByKey, applies a specified CombineFn<InputT, AccumT, OutputT> to each of the input KV<K, Iterable<InputT>> elements to produce a combined output KV<K, OutputT> element, and returns a PCollection<KV<K, OutputT>> containing all the combined output elements. It is common for InputT == OutputT, but not required. Common combining functions include sums, mins, maxes, and averages of numbers, conjunctions and disjunctions of booleans, statistical aggregations, etc.
Example of use:
PCollection<KV<String, Integer>> pc = ...;
PCollection<KV<String, Iterable<Integer>>> groupedByKey = pc.apply(
    GroupByKey.<String, Integer>create());
PCollection<KV<String, Integer>> sumByKey = groupedByKey.apply(
    Combine.<String, Integer, Integer>groupedValues(
        Sum.ofIntegers()));
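As a rough mental model of what this transform computes, the following JDK-only sketch applies a combining function to each key's grouped values. The class GroupedValuesModel and its combineGrouped and sum helpers are hypothetical names introduced for illustration; they are not part of the Beam API, and a real pipeline would distribute this work across workers.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, JDK-only model of per-key combining; this is not Beam code.
class GroupedValuesModel {

  // Applies the combiner to each key's Iterable of values, yielding one output per key.
  static <K, InputT, OutputT> Map<K, OutputT> combineGrouped(
      Map<K, ? extends Iterable<InputT>> grouped,
      Function<Iterable<InputT>, OutputT> combiner) {
    Map<K, OutputT> out = new LinkedHashMap<>();
    for (Map.Entry<K, ? extends Iterable<InputT>> e : grouped.entrySet()) {
      out.put(e.getKey(), combiner.apply(e.getValue()));
    }
    return out;
  }

  // Sums an Iterable of integers, standing in for a summing combine function.
  static Integer sum(Iterable<Integer> values) {
    int total = 0;
    for (int v : values) {
      total += v;
    }
    return total;
  }

  public static void main(String[] args) {
    // Plays the role of the grouped input, KV<K, Iterable<InputT>>.
    Map<String, List<Integer>> groupedByKey = new LinkedHashMap<>();
    groupedByKey.put("a", Arrays.asList(1, 2, 3));
    groupedByKey.put("b", Arrays.asList(10));

    Map<String, Integer> sumByKey = combineGrouped(groupedByKey, GroupedValuesModel::sum);
    System.out.println(sumByKey); // prints {a=6, b=10}
  }
}
```

Here InputT and OutputT are both Integer, matching the common InputT == OutputT case described above.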
See also Combine.perKey(org.apache.beam.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>)/Combine.PerKey, which captures the common pattern of "combining by key" in a single easy-to-use PTransform.
Combining for different keys can happen in parallel. Moreover, combining of the Iterable<InputT> values associated with a single key can happen in parallel, with different subsets of the values being combined separately, and their intermediate results combined further, in an arbitrary tree reduction pattern, until a single result value is produced for each key.
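The tree reduction described above only works if combining subsets separately and then merging the partial results gives the same answer as combining everything at once. The sketch below illustrates that property for a mean, where the accumulator (a sum and a count) differs from both the input and output types. It uses only the JDK; the class MeanCombineSketch and its accumulate helper are hypothetical, and the other method names merely mirror the CombineFn lifecycle (createAccumulator, addInput, mergeAccumulators, extractOutput) without extending any Beam class.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, JDK-only illustration of partial combining; this is not Beam code.
class MeanCombineSketch {

  // Plays the role of AccumT: a running sum and count.
  static final class Accum {
    long sum;
    long count;
  }

  static Accum createAccumulator() {
    return new Accum();
  }

  // Folds one input value into the accumulator.
  static Accum addInput(Accum acc, int input) {
    acc.sum += input;
    acc.count++;
    return acc;
  }

  // Merges partial accumulators built from disjoint subsets of the values.
  static Accum mergeAccumulators(Iterable<Accum> accums) {
    Accum merged = createAccumulator();
    for (Accum a : accums) {
      merged.sum += a.sum;
      merged.count += a.count;
    }
    return merged;
  }

  // Converts the final accumulator into the output value (the mean).
  static double extractOutput(Accum acc) {
    return acc.count == 0 ? Double.NaN : (double) acc.sum / acc.count;
  }

  // Helper (an assumption for this sketch): accumulate a whole subset of values.
  static Accum accumulate(List<Integer> values) {
    Accum acc = createAccumulator();
    for (int v : values) {
      addInput(acc, v);
    }
    return acc;
  }

  public static void main(String[] args) {
    List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6);

    // Combine all values for the key in one pass.
    double direct = extractOutput(accumulate(values));

    // Combine two uneven subsets separately, then merge the partial results.
    Accum left = accumulate(values.subList(0, 2));
    Accum right = accumulate(values.subList(2, 6));
    double tree = extractOutput(mergeAccumulators(Arrays.asList(left, right)));

    System.out.println(direct + " " + tree); // prints 3.5 3.5
  }
}
```

Averaging the two subset means directly (1.5 and 4.5) would give 3.0 rather than 3.5, which is why the intermediate state carries the sum and the count instead of the mean itself.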
By default, the Coder of the keys of the output PCollection<KV<K, OutputT>> is that of the keys of the input PCollection<KV<K, Iterable<InputT>>>, and the Coder of the values of the output PCollection<KV<K, OutputT>> is inferred from the concrete type of the CombineFn<InputT, AccumT, OutputT>'s output type OutputT.
Each output element has the same timestamp and is in the same window as its corresponding input element, and the output PCollection has the same WindowFn associated with it as the input.
See also Combine.globally(org.apache.beam.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>)/Combine.Globally, which combines all the values in a PCollection into a single value in a PCollection.
Fields inherited from class PTransform: name
Modifier and Type | Method and Description |
---|---|
PCollection<KV<K,OutputT>> | expand(PCollection<? extends KV<K,? extends java.lang.Iterable<InputT>>> input) Override this method to specify how this PTransform should be expanded on the given InputT. |
org.apache.beam.sdk.util.AppliedCombineFn<? super K,? super InputT,?,OutputT> | getAppliedFn(CoderRegistry registry, Coder<? extends KV<K,? extends java.lang.Iterable<InputT>>> inputCoder, WindowingStrategy<?,?> windowingStrategy) Returns the Combine.CombineFn bound to its coders. |
CombineFnBase.GlobalCombineFn<? super InputT,?,OutputT> | getFn() Returns the CombineFnBase.GlobalCombineFn used by this Combine operation. |
java.util.List<PCollectionView<?>> | getSideInputs() |
void | populateDisplayData(DisplayData.Builder builder) Register display data for the given transform or component. |
Combine.GroupedValues<K,InputT,OutputT> | withSideInputs(java.lang.Iterable<? extends PCollectionView<?>> sideInputs) |
Combine.GroupedValues<K,InputT,OutputT> | withSideInputs(PCollectionView<?>... sideInputs) |
Methods inherited from class PTransform: compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, toString, validate
public Combine.GroupedValues<K,InputT,OutputT> withSideInputs(PCollectionView<?>... sideInputs)
public Combine.GroupedValues<K,InputT,OutputT> withSideInputs(java.lang.Iterable<? extends PCollectionView<?>> sideInputs)
public CombineFnBase.GlobalCombineFn<? super InputT,?,OutputT> getFn()
Returns the CombineFnBase.GlobalCombineFn used by this Combine operation.
public java.util.List<PCollectionView<?>> getSideInputs()
public PCollection<KV<K,OutputT>> expand(PCollection<? extends KV<K,? extends java.lang.Iterable<InputT>>> input)
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
Specified by: expand in class PTransform<PCollection<? extends KV<K,? extends java.lang.Iterable<InputT>>>,PCollection<KV<K,OutputT>>>
public org.apache.beam.sdk.util.AppliedCombineFn<? super K,? super InputT,?,OutputT> getAppliedFn(CoderRegistry registry, Coder<? extends KV<K,? extends java.lang.Iterable<InputT>>> inputCoder, WindowingStrategy<?,?> windowingStrategy)
Returns the Combine.CombineFn bound to its coders.
For internal use.
public void populateDisplayData(DisplayData.Builder builder)
Description copied from class: PTransform
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
Specified by: populateDisplayData in interface HasDisplayData
Overrides: populateDisplayData in class PTransform<PCollection<? extends KV<K,? extends java.lang.Iterable<InputT>>>,PCollection<KV<K,OutputT>>>
Parameters: builder - The builder to populate with display data.
See Also: HasDisplayData