Type Parameters:
K - the type of the keys of the input and output PCollections
InputT - the type of the values of the input PCollection
OutputT - the type of the values of the output PCollection
public static class Combine.PerKey<K,InputT,OutputT> extends PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,OutputT>>>
PerKey<K, InputT, OutputT> takes a PCollection<KV<K, InputT>>, groups it by key, applies a combining function to the InputT values associated with each key to produce a combined OutputT value, and returns a PCollection<KV<K, OutputT>> representing a map from each distinct key of the input PCollection to the corresponding combined value. InputT and OutputT are often the same.
This is a concise shorthand for an application of GroupByKey followed by an application of Combine.GroupedValues. See those operations for more details on how keys are compared for equality and on the default Coder for the output.
Example of use:
    PCollection<KV<String, Double>> salesRecords = ...;
    PCollection<KV<String, Double>> totalSalesPerPerson =
        salesRecords.apply(Combine.<String, Double, Double>perKey(
            Sum.ofDoubles()));
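The group-then-combine behavior described above can be modeled outside of Beam. The following is a minimal plain-Java sketch, not Beam code: the class `PerKeySketch` and its methods are hypothetical names introduced for illustration, keyed elements are represented as `Map.Entry` objects, and the input and output value types are assumed to be the same.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java model of per-key combining; hypothetical names, not part of Beam.
class PerKeySketch {

  // Step 1: collect the values of each key (the role GroupByKey plays).
  static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> input) {
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (Map.Entry<K, V> kv : input) {
      grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
    }
    return grouped;
  }

  // Step 2: reduce each key's values to one value (the role Combine.GroupedValues plays).
  static <K, V> Map<K, V> combineGroupedValues(
      Map<K, List<V>> grouped, BinaryOperator<V> combiner) {
    Map<K, V> combined = new LinkedHashMap<>();
    for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
      // Every group holds at least one value, so the reduction is never empty.
      combined.put(group.getKey(), group.getValue().stream().reduce(combiner).get());
    }
    return combined;
  }

  // Both steps together: one combined value per distinct key.
  static <K, V> Map<K, V> combinePerKey(
      List<Map.Entry<K, V>> input, BinaryOperator<V> combiner) {
    return combineGroupedValues(groupByKey(input), combiner);
  }
}
```

Calling `PerKeySketch.combinePerKey(sales, Double::sum)` on the entries `("ann", 10.0)`, `("bob", 5.0)`, `("ann", 2.5)` yields one total per distinct key, which mirrors what the sales example computes per person.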
Each output element is in the window by which its corresponding input was grouped, and has the timestamp of the end of that window. The output PCollection has the same WindowFn as the input.
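The window behavior can be illustrated with a small plain-Java model. This sketch makes several assumptions that are not in the text above: windows are fixed-size and aligned to zero, the `Sale` element type and the `key@windowEnd` output labels are hypothetical, and each output is stamped with the exclusive end bound of its window. The exact instant a runner assigns is not modeled here.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of per-key, per-window summing; hypothetical names, not part of Beam.
class WindowedPerKeySketch {

  // Hypothetical element: a keyed amount with an event timestamp in milliseconds.
  static final class Sale {
    final String key;
    final double amount;
    final long timestampMillis;

    Sale(String key, double amount, long timestampMillis) {
      this.key = key;
      this.amount = amount;
      this.timestampMillis = timestampMillis;
    }
  }

  // Sums amounts per (key, fixed window). Each result is labeled "key@windowEndMillis",
  // so values of the same key that fall in different windows are never merged.
  static Map<String, Double> sumPerKeyAndWindow(List<Sale> input, long windowSizeMillis) {
    Map<String, Double> sums = new TreeMap<>();
    for (Sale sale : input) {
      long windowStart =
          Math.floorDiv(sale.timestampMillis, windowSizeMillis) * windowSizeMillis;
      long windowEnd = windowStart + windowSizeMillis;
      sums.merge(sale.key + "@" + windowEnd, sale.amount, Double::sum);
    }
    return sums;
  }
}
```

With one-second windows, sales for the same key at 100 ms and 900 ms are combined into a single output labeled with window end 1000, while a sale at 1500 ms produces a separate output labeled 2000.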
Fields inherited from class PTransform: name
Modifier and Type | Method and Description |
---|---|
PCollection<KV<K,OutputT>> | expand(PCollection<KV<K,InputT>> input) Override this method to specify how this PTransform should be expanded on the given InputT. |
java.util.Map<TupleTag<?>,PValue> | getAdditionalInputs() Returns the side inputs of this Combine, tagged with the tag of the PCollectionView. |
CombineFnBase.GlobalCombineFn<? super InputT,?,OutputT> | getFn() Returns the CombineFnBase.GlobalCombineFn used by this Combine operation. |
protected java.lang.String | getKindString() Returns the name to use by default for this PTransform (not including the names of any enclosing PTransforms). |
java.util.List<PCollectionView<?>> | getSideInputs() Returns the side inputs used by this Combine operation. |
void | populateDisplayData(DisplayData.Builder builder) Register display data for the given transform or component. |
Combine.PerKeyWithHotKeyFanout<K,InputT,OutputT> | withHotKeyFanout(int hotKeyFanout) Like withHotKeyFanout(SerializableFunction), but returning the given constant value for every key. |
Combine.PerKeyWithHotKeyFanout<K,InputT,OutputT> | withHotKeyFanout(SerializableFunction<? super K,java.lang.Integer> hotKeyFanout) If a single key has disproportionately many values, it may become a bottleneck, especially in streaming mode. |
Combine.PerKey<K,InputT,OutputT> | withSideInputs(java.lang.Iterable<? extends PCollectionView<?>> sideInputs) Returns a PTransform identical to this, but with the specified side inputs to use in CombineWithContext.CombineFnWithContext. |
Combine.PerKey<K,InputT,OutputT> | withSideInputs(PCollectionView<?>... sideInputs) Returns a PTransform identical to this, but with the specified side inputs to use in CombineWithContext.CombineFnWithContext. |
Methods inherited from class PTransform: getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getName, toString, validate
protected java.lang.String getKindString()
Description copied from class: PTransform
Returns the name to use by default for this PTransform (not including the names of any enclosing PTransforms).
By default, returns the base name of this PTransform's class.
The caller is responsible for ensuring that names of applied PTransforms are unique, e.g., by adding a uniquifying suffix when needed.
Overrides: getKindString in class PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,OutputT>>>
public Combine.PerKey<K,InputT,OutputT> withSideInputs(PCollectionView<?>... sideInputs)
Returns a PTransform identical to this, but with the specified side inputs to use in CombineWithContext.CombineFnWithContext.
public Combine.PerKey<K,InputT,OutputT> withSideInputs(java.lang.Iterable<? extends PCollectionView<?>> sideInputs)
Returns a PTransform identical to this, but with the specified side inputs to use in CombineWithContext.CombineFnWithContext.
public Combine.PerKeyWithHotKeyFanout<K,InputT,OutputT> withHotKeyFanout(SerializableFunction<? super K,java.lang.Integer> hotKeyFanout)
If a single key has disproportionately many values, it may become a bottleneck, especially in streaming mode.
Parameters: hotKeyFanout - a function from keys to an integer N, where the key will be spread among N intermediate nodes for partial combining. If N is less than or equal to 1, this key will not be sent through an intermediate node.
public Combine.PerKeyWithHotKeyFanout<K,InputT,OutputT> withHotKeyFanout(int hotKeyFanout)
Like withHotKeyFanout(SerializableFunction), but returning the given constant value for every key.
public CombineFnBase.GlobalCombineFn<? super InputT,?,OutputT> getFn()
Returns the CombineFnBase.GlobalCombineFn used by this Combine operation.
public java.util.List<PCollectionView<?>> getSideInputs()
Returns the side inputs used by this Combine operation.
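The hot-key fanout described for withHotKeyFanout can be pictured as a two-stage combine: values of a key are first spread over N intermediate shards and partially combined there, and the partial results are then combined into the final value. The following plain-Java sketch models that idea only. It is not Beam's implementation: `HotKeyFanoutSketch` is a hypothetical name, shards are assigned round-robin (an assumption made here for determinism), and the combiner is assumed to be associative and commutative so that partial combining does not change the result.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Plain-Java model of hot-key fanout; hypothetical names, not part of Beam.
class HotKeyFanoutSketch {

  static <K, V> Map<K, V> combineWithFanout(
      List<Map.Entry<K, V>> input,
      BinaryOperator<V> combiner,
      Function<? super K, Integer> hotKeyFanout) {

    // Stage 1: partially combine the values of each key within each shard.
    Map<K, Map<Integer, V>> partials = new HashMap<>();
    int position = 0;
    for (Map.Entry<K, V> kv : input) {
      int n = hotKeyFanout.apply(kv.getKey());
      // N <= 1 means the key bypasses the intermediate stage: a single shard.
      int shard = (n <= 1) ? 0 : position % n;
      position++;
      partials
          .computeIfAbsent(kv.getKey(), k -> new HashMap<>())
          .merge(shard, kv.getValue(), combiner);
    }

    // Stage 2: combine the per-shard partial results into one value per key.
    Map<K, V> result = new HashMap<>();
    for (Map.Entry<K, Map<Integer, V>> perKey : partials.entrySet()) {
      result.put(perKey.getKey(), perKey.getValue().values().stream().reduce(combiner).get());
    }
    return result;
  }
}
```

Because the combiner is associative and commutative, the result matches a direct per-key combine; the fanout only changes how much work any single node does for a heavily loaded key.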
public java.util.Map<TupleTag<?>,PValue> getAdditionalInputs()
Returns the side inputs of this Combine, tagged with the tag of the PCollectionView. The values of the returned map will be equal to the result of getSideInputs().
Overrides: getAdditionalInputs in class PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,OutputT>>>
public PCollection<KV<K,OutputT>> expand(PCollection<KV<K,InputT>> input)
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
Specified by: expand in class PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,OutputT>>>
public void populateDisplayData(DisplayData.Builder builder)
Description copied from class: PTransform
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
Specified by: populateDisplayData in interface HasDisplayData
Overrides: populateDisplayData in class PTransform<PCollection<KV<K,InputT>>,PCollection<KV<K,OutputT>>>
Parameters: builder - The builder to populate with display data.
See Also: HasDisplayData