Class Combine.PerKey<K,InputT,OutputT>
- Type Parameters:
K - the type of the keys of the input and output PCollections
InputT - the type of the values of the input PCollection
OutputT - the type of the values of the output PCollection
- All Implemented Interfaces:
Serializable, HasDisplayData
- Enclosing class:
Combine
PerKey<K, InputT, OutputT> takes a PCollection<KV<K, InputT>>, groups it by
key, applies a combining function to the InputT values associated with each key to
produce a combined OutputT value, and returns a PCollection<KV<K, OutputT>>
representing a map from each distinct key of the input PCollection to the corresponding
combined value. InputT and OutputT are often the same.
This is a concise shorthand for an application of GroupByKey followed by an
application of Combine.GroupedValues. See those operations for more
details on how keys are compared for equality and on the default Coder for the output.
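The two-step reading above (group by key, then combine each group) can be sketched in plain Java. This models the semantics only and is not Beam code: the class PerKeySketch and its helpers kv, groupByKey, combineGroupedValues and perKey are hypothetical names, the input is an in-memory list instead of a PCollection, and InputT and OutputT are assumed to be the same type so that a BinaryOperator can stand in for the combining function.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java model of per-key combining; all names here are hypothetical.
final class PerKeySketch {

  // Small helper standing in for a key/value pair.
  static <K, V> Map.Entry<K, V> kv(K key, V value) {
    return new AbstractMap.SimpleImmutableEntry<>(key, value);
  }

  // Step 1 (models the grouping step): collect the values that share a key.
  static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> input) {
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (Map.Entry<K, V> element : input) {
      grouped.computeIfAbsent(element.getKey(), k -> new ArrayList<>())
          .add(element.getValue());
    }
    return grouped;
  }

  // Step 2 (models combining grouped values): reduce each group to one value.
  static <K, V> Map<K, V> combineGroupedValues(
      Map<K, List<V>> grouped, BinaryOperator<V> combiner) {
    Map<K, V> combined = new LinkedHashMap<>();
    for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
      List<V> values = group.getValue(); // never empty: built by groupByKey
      V result = values.get(0);
      for (int i = 1; i < values.size(); i++) {
        result = combiner.apply(result, values.get(i));
      }
      combined.put(group.getKey(), result);
    }
    return combined;
  }

  // The shorthand: both steps applied back to back.
  static <K, V> Map<K, V> perKey(
      List<Map.Entry<K, V>> input, BinaryOperator<V> combiner) {
    return combineGroupedValues(groupByKey(input), combiner);
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Double>> salesRecords =
        Arrays.asList(kv("alice", 10.0), kv("bob", 5.0), kv("alice", 2.5));
    // One output entry per distinct key, holding the summed value.
    System.out.println(perKey(salesRecords, Double::sum));
  }
}
```

The result is a map with one entry per distinct input key, which mirrors the "map from each distinct key to the corresponding combined value" described above.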
Example of use:
PCollection<KV<String, Double>> salesRecords = ...;
PCollection<KV<String, Double>> totalSalesPerPerson =
    salesRecords.apply(Combine.<String, Double, Double>perKey(
        Sum.ofDoubles()));
Each output element is in the window by which its corresponding input was grouped, and has
the timestamp of the end of that window. The output PCollection has the same WindowFn as the input.
-
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints -
Method Summary
Modifier and Type, Method - Description
expand(PCollection<KV<K, InputT>> input) - Override this method to specify how this PTransform should be expanded on the given InputT.
getAdditionalInputs() - Returns the side inputs of this Combine, tagged with the tag of the PCollectionView.
CombineFnBase.GlobalCombineFn<? super InputT, ?, OutputT> getFn() - Returns the CombineFnBase.GlobalCombineFn used by this Combine operation.
protected String getKindString() - Returns the name to use by default for this PTransform (not including the names of any enclosing PTransforms).
List<PCollectionView<?>> getSideInputs() - Returns the side inputs used by this Combine operation.
void populateDisplayData(DisplayData.Builder builder) - Register display data for the given transform or component.
boolean shouldSkipReplacement() - Returns whether a runner should skip replacing this transform.
withHotKeyFanout(int hotKeyFanout) - Like withHotKeyFanout(SerializableFunction), but returning the given constant value for every key.
withHotKeyFanout(SerializableFunction<? super K, Integer> hotKeyFanout) - If a single key has disproportionately many values, it may become a bottleneck, especially in streaming mode.
withSideInputs(Iterable<? extends PCollectionView<?>> sideInputs) - Returns a PTransform identical to this, but with the specified side inputs to use in CombineWithContext.CombineFnWithContext.
withSideInputs(PCollectionView<?>... sideInputs) - Returns a PTransform identical to this, but with the specified side inputs to use in CombineWithContext.CombineFnWithContext.
Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate, validate
-
Method Details
-
getKindString
Description copied from class: PTransform
Returns the name to use by default for this PTransform (not including the names of any enclosing PTransforms).
By default, returns the base name of this PTransform's class.
The caller is responsible for ensuring that names of applied PTransforms are unique, e.g., by adding a uniquifying suffix when needed.
- Overrides:
getKindString in class PTransform<PCollection<KV<K,InputT>>, PCollection<KV<K, OutputT>>>
-
withSideInputs
Returns a PTransform identical to this, but with the specified side inputs to use in CombineWithContext.CombineFnWithContext.
-
withSideInputs
public Combine.PerKey<K,InputT,OutputT> withSideInputs(Iterable<? extends PCollectionView<?>> sideInputs)
Returns a PTransform identical to this, but with the specified side inputs to use in CombineWithContext.CombineFnWithContext.
-
withHotKeyFanout
public Combine.PerKeyWithHotKeyFanout<K,InputT,OutputT> withHotKeyFanout(SerializableFunction<? super K, Integer> hotKeyFanout)
If a single key has disproportionately many values, it may become a bottleneck, especially in streaming mode. This returns a new per-key combining transform that inserts an intermediate node to combine "hot" keys partially before performing the full combine.
- Parameters:
hotKeyFanout - a function from keys to an integer N, where the key will be spread among N intermediate nodes for partial combining. If N is less than or equal to 1, this key will not be sent through an intermediate node.
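The idea of partial combining for a hot key can be illustrated with a plain-Java sketch. It is a model under stated assumptions, not Beam's implementation: HotKeyFanoutSketch and combineWithFanout are hypothetical names, values are assigned to the N intermediate buckets round-robin (how Beam actually distributes them is not described here), and the combiner is assumed to be associative and commutative so that the two-stage result matches a direct combine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Plain-Java model of two-stage combining for one hot key; names are hypothetical.
final class HotKeyFanoutSketch {

  // Combines all values of a single key, optionally through `fanout` partial results.
  static <V> V combineWithFanout(
      List<V> valuesForKey, int fanout, BinaryOperator<V> combiner) {
    if (valuesForKey.isEmpty()) {
      throw new IllegalArgumentException("a key must have at least one value");
    }
    if (fanout <= 1) {
      // N <= 1: no intermediate stage, combine everything directly.
      V result = valuesForKey.get(0);
      for (int i = 1; i < valuesForKey.size(); i++) {
        result = combiner.apply(result, valuesForKey.get(i));
      }
      return result;
    }

    // Stage 1: spread values over `fanout` buckets (round-robin is an
    // assumption of this sketch) and keep one partial result per bucket.
    List<V> partials = new ArrayList<>();
    for (int b = 0; b < fanout; b++) {
      partials.add(null); // null marks a bucket that has received no value yet
    }
    for (int i = 0; i < valuesForKey.size(); i++) {
      int bucket = i % fanout;
      V previous = partials.get(bucket);
      V value = valuesForKey.get(i);
      partials.set(bucket, previous == null ? value : combiner.apply(previous, value));
    }

    // Stage 2: the full combine now sees at most `fanout` inputs.
    V result = null;
    for (V partial : partials) {
      if (partial != null) {
        result = (result == null) ? partial : combiner.apply(result, partial);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> hotKeyValues = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    // Both calls print 55: the fanout changes how work is split, not the result.
    System.out.println(combineWithFanout(hotKeyValues, 3, Integer::sum));
    System.out.println(combineWithFanout(hotKeyValues, 1, Integer::sum));
  }
}
```

In this model the final stage handles at most N partial results for the key instead of every raw value, which is the bottleneck relief that the description above refers to.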
-
withHotKeyFanout
Like withHotKeyFanout(SerializableFunction), but returning the given constant value for every key.
-
getFn
Returns the CombineFnBase.GlobalCombineFn used by this Combine operation.
-
getSideInputs
Returns the side inputs used by this Combine operation. -
shouldSkipReplacement
public boolean shouldSkipReplacement()
Returns whether a runner should skip replacing this transform. For runner use only.
-
getAdditionalInputs
Returns the side inputs of this Combine, tagged with the tag of the PCollectionView. The values of the returned map will be equal to the result of getSideInputs().
- Overrides:
getAdditionalInputs in class PTransform<PCollection<KV<K,InputT>>, PCollection<KV<K, OutputT>>>
-
expand
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expand in class PTransform<PCollection<KV<K,InputT>>, PCollection<KV<K, OutputT>>>
-
populateDisplayData
Description copied from class: PTransform
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayData in interface HasDisplayData
- Overrides:
populateDisplayData in class PTransform<PCollection<KV<K,InputT>>, PCollection<KV<K, OutputT>>>
- Parameters:
builder - The builder to populate with display data.