Type Parameters:
K - the type of the keys of the input and output PCollections
InputT - the type of the values of the input PCollection
OutputT - the type of the values of the output PCollection

public static class Combine.PerKey<K,InputT,OutputT> extends PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,OutputT>>>

PerKey<K, InputT, OutputT> takes a
PCollection<KV<K, InputT>>, groups it by key, applies a
combining function to the InputT values associated with each
key to produce a combined OutputT value, and returns a
PCollection<KV<K, OutputT>> representing a map from each
distinct key of the input PCollection to the corresponding
combined value. InputT and OutputT are often the same.
This is a concise shorthand for an application of
GroupByKey followed by an application of
Combine.GroupedValues. See those
operations for more details on how keys are compared for equality
and on the default Coder for the output.
Example of use:
PCollection<KV<String, Double>> salesRecords = ...;
PCollection<KV<String, Double>> totalSalesPerPerson =
    salesRecords.apply(Combine.<String, Double, Double>perKey(
        Sum.ofDoubles()));
Each output element is in the window by which its corresponding input
was grouped, and has the timestamp of the end of that window. The output
PCollection has the same
WindowFn
as the input.
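
To make the "GroupByKey followed by Combine.GroupedValues" reading concrete, here is a minimal plain-Java sketch of the semantics. It is a model only, not Beam code: the class PerKeyCombineSketch and its methods groupByKey, combineGroupedValues, and perKey are hypothetical names, in-memory lists and maps stand in for PCollections, and windowing and timestamps are ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Plain-Java model of per-key combining. Hypothetical names; this is not Beam code. */
final class PerKeyCombineSketch {

  /** Step 1, modelling GroupByKey: collect the values of each distinct key. */
  static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> input) {
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (Map.Entry<K, V> kv : input) {
      grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
    }
    return grouped;
  }

  /** Step 2, modelling Combine.GroupedValues: reduce each key's values to one value. */
  static <K, V> Map<K, V> combineGroupedValues(
      Map<K, List<V>> grouped, V identity, BinaryOperator<V> combiner) {
    Map<K, V> combined = new LinkedHashMap<>();
    for (Map.Entry<K, List<V>> entry : grouped.entrySet()) {
      V acc = identity;
      for (V value : entry.getValue()) {
        acc = combiner.apply(acc, value);
      }
      combined.put(entry.getKey(), acc);
    }
    return combined;
  }

  /** Both steps together: one combined value per distinct input key. */
  static <K, V> Map<K, V> perKey(
      List<Map.Entry<K, V>> input, V identity, BinaryOperator<V> combiner) {
    return combineGroupedValues(groupByKey(input), identity, combiner);
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Double>> salesRecords = List.of(
        Map.entry("alice", 10.0), Map.entry("bob", 5.0), Map.entry("alice", 2.5));
    // prints {alice=12.5, bob=5.0}
    System.out.println(perKey(salesRecords, 0.0, Double::sum));
  }
}
```

In this sketch InputT and OutputT coincide (both are V), which matches the common case noted above. A real CombineFn may use a separate accumulator type and produce a different output type.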

| Modifier and Type | Method | Description |
|---|---|---|
| PCollection<KV<K,OutputT>> | expand(PCollection<KV<K,InputT>> input) | Override this method to specify how this PTransform should be expanded on the given InputT. |
| java.util.Map<TupleTag<?>,PValue> | getAdditionalInputs() | Returns the side inputs of this Combine, tagged with the tag of the PCollectionView. |
| CombineFnBase.GlobalCombineFn<? super InputT,?,OutputT> | getFn() | Returns the CombineFnBase.GlobalCombineFn used by this Combine operation. |
| protected java.lang.String | getKindString() | Returns the name to use by default for this PTransform (not including the names of any enclosing PTransforms). |
| java.util.List<PCollectionView<?>> | getSideInputs() | Returns the side inputs used by this Combine operation. |
| void | populateDisplayData(DisplayData.Builder builder) | Register display data for the given transform or component. |
| Combine.PerKeyWithHotKeyFanout<K,InputT,OutputT> | withHotKeyFanout(int hotKeyFanout) | Like withHotKeyFanout(SerializableFunction), but returning the given constant value for every key. |
| Combine.PerKeyWithHotKeyFanout<K,InputT,OutputT> | withHotKeyFanout(SerializableFunction<? super K,java.lang.Integer> hotKeyFanout) | If a single key has disproportionately many values, it may become a bottleneck, especially in streaming mode. |
| Combine.PerKey<K,InputT,OutputT> | withSideInputs(java.lang.Iterable<? extends PCollectionView<?>> sideInputs) | Returns a PTransform identical to this, but with the specified side inputs to use in CombineWithContext.CombineFnWithContext. |
| Combine.PerKey<K,InputT,OutputT> | withSideInputs(PCollectionView<?>... sideInputs) | Returns a PTransform identical to this, but with the specified side inputs to use in CombineWithContext.CombineFnWithContext. |

Methods inherited from class PTransform:
getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getName, toString, validate

protected java.lang.String getKindString()
Returns the name to use by default for this PTransform
(not including the names of any enclosing PTransforms).
By default, returns the base name of this PTransform's class.
The caller is responsible for ensuring that names of applied
PTransforms are unique, e.g., by adding a uniquifying
suffix when needed.
Overrides: getKindString in class PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,OutputT>>>

public Combine.PerKey<K,InputT,OutputT> withSideInputs(PCollectionView<?>... sideInputs)
Returns a PTransform identical to this, but with the specified side inputs to use
in CombineWithContext.CombineFnWithContext.

public Combine.PerKey<K,InputT,OutputT> withSideInputs(java.lang.Iterable<? extends PCollectionView<?>> sideInputs)
Returns a PTransform identical to this, but with the specified side inputs to use
in CombineWithContext.CombineFnWithContext.

public Combine.PerKeyWithHotKeyFanout<K,InputT,OutputT> withHotKeyFanout(SerializableFunction<? super K,java.lang.Integer> hotKeyFanout)
If a single key has disproportionately many values, it may become a
bottleneck, especially in streaming mode.
Parameters: hotKeyFanout - a function from keys to an integer N, where the key
will be spread among N intermediate nodes for partial combining.
If N is less than or equal to 1, this key will not be sent through an
intermediate node.

public Combine.PerKeyWithHotKeyFanout<K,InputT,OutputT> withHotKeyFanout(int hotKeyFanout)
Like withHotKeyFanout(SerializableFunction), but returning the given
constant value for every key.

public CombineFnBase.GlobalCombineFn<? super InputT,?,OutputT> getFn()
Returns the CombineFnBase.GlobalCombineFn used by this Combine operation.

public java.util.List<PCollectionView<?>> getSideInputs()
Returns the side inputs used by this Combine operation.

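
The partial combining described for withHotKeyFanout can be modelled in plain Java as a two-stage reduction over the values of a single key. The following is a rough sketch under stated assumptions, not Beam's implementation: the class HotKeyFanoutSketch and its methods are hypothetical names, list buckets stand in for the N intermediate nodes, and values are assigned to buckets round-robin purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

/** Plain-Java model of hot-key fanout for one key's values. Hypothetical names; not Beam code. */
final class HotKeyFanoutSketch {

  /** Combines all values in a single pass, as when no intermediate node is used. */
  static <V> V combineAll(List<V> values, V identity, BinaryOperator<V> combiner) {
    V acc = identity;
    for (V value : values) {
      acc = combiner.apply(acc, value);
    }
    return acc;
  }

  /**
   * Spreads the values among {@code fanout} buckets, combines each bucket
   * separately, then combines the partial results. A fanout of 1 or less
   * skips the intermediate stage.
   */
  static <V> V combineWithFanout(
      List<V> values, int fanout, V identity, BinaryOperator<V> combiner) {
    if (fanout <= 1) {
      return combineAll(values, identity, combiner);
    }
    List<List<V>> buckets = new ArrayList<>();
    for (int i = 0; i < fanout; i++) {
      buckets.add(new ArrayList<>());
    }
    for (int i = 0; i < values.size(); i++) {
      // Assumption: round-robin assignment, chosen only to keep the sketch deterministic.
      buckets.get(i % fanout).add(values.get(i));
    }
    List<V> partials = new ArrayList<>();
    for (List<V> bucket : buckets) {
      partials.add(combineAll(bucket, identity, combiner));
    }
    return combineAll(partials, identity, combiner);
  }

  public static void main(String[] args) {
    List<Integer> values = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    System.out.println(combineWithFanout(values, 4, 0, Integer::sum)); // prints 55
    System.out.println(combineWithFanout(values, 1, 0, Integer::sum)); // prints 55
  }
}
```

Both calls agree because integer addition is associative and commutative. A combining function without those properties could give different results once the values are regrouped into buckets.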
public java.util.Map<TupleTag<?>,PValue> getAdditionalInputs()
Returns the side inputs of this Combine, tagged with the tag of the
PCollectionView. The values of the returned map will be equal to the result of
getSideInputs().
Overrides: getAdditionalInputs in class PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,OutputT>>>

public PCollection<KV<K,OutputT>> expand(PCollection<KV<K,InputT>> input)
Override this method to specify how this PTransform should be expanded
on the given InputT.
NOTE: This method should not be called directly. Instead, the
PTransform should be applied to the InputT using the apply
method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
Overrides: expand in class PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,OutputT>>>

public void populateDisplayData(DisplayData.Builder builder)
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect
display data via DisplayData.from(HasDisplayData). Implementations may call
super.populateDisplayData(builder) in order to register display data in the current
namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use
the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
Specified by: populateDisplayData in interface HasDisplayData
Overrides: populateDisplayData in class PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,OutputT>>>
Parameters: builder - The builder to populate with display data.
See Also: HasDisplayData