Type Parameters: T - the type of the values being combined
public static class ApproximateUnique.ApproximateUniqueCombineFn<T> extends Combine.CombineFn<T,ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique,java.lang.Long>
CombineFn that computes an estimate of the number of distinct values that were combined.
Hashes input elements, computes the top sampleSize hash values, and uses those to extrapolate the size of the entire set of hash values by assuming the rest of the hash values are as densely distributed as the top sampleSize.
Used to implement ApproximateUnique.globally(...) and ApproximateUnique.perKey(...).
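The estimation strategy described above can be illustrated with a small, self-contained sketch. It is not Beam's implementation: the class name TopHashEstimator, the SplitMix64-style hash function, and the simple proportional extrapolation formula are all assumptions made for illustration. The real class takes a Coder<T>, which suggests it hashes each element's encoded form rather than a raw long.

```java
import java.util.TreeSet;

/** Hypothetical, simplified sketch of "keep the top hashes, then extrapolate". */
class TopHashEstimator {
    // Width of the signed 64-bit hash space, computed as a double to avoid overflow.
    private static final double HASH_SPACE_SIZE = Long.MAX_VALUE - (double) Long.MIN_VALUE;

    private final int sampleSize;
    // Sorted set of the largest unique hash values seen so far.
    private final TreeSet<Long> largest = new TreeSet<>();

    TopHashEstimator(int sampleSize) {
        if (sampleSize < 2) {
            throw new IllegalArgumentException("sampleSize must be at least 2");
        }
        this.sampleSize = sampleSize;
    }

    /** Stand-in 64-bit hash (SplitMix64-style mixing); an assumption, not Beam's hash. */
    static long hash(long x) {
        long z = x * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Records a hash value, keeping only the sampleSize largest unique values. */
    void add(long hashValue) {
        if (largest.size() < sampleSize) {
            largest.add(hashValue); // duplicates are ignored by the set
        } else if (hashValue > largest.first() && largest.add(hashValue)) {
            largest.pollFirst(); // evict the smallest to stay at sampleSize
        }
    }

    /** Extrapolates the number of distinct hashes from the density of the top sample. */
    long estimate() {
        if (largest.size() < sampleSize) {
            return largest.size(); // fewer distinct values than the sample: exact count
        }
        // The sample occupies the range [smallest kept hash, Long.MAX_VALUE].
        double sampleSpaceSize = Long.MAX_VALUE - (double) largest.first();
        // Assume the rest of the hash space is as densely populated as that range.
        return Math.round(sampleSize * HASH_SPACE_SIZE / sampleSpaceSize);
    }
}
```

In this sketch a larger sampleSize tightens the estimate at the cost of more memory per accumulator, and inputs with fewer than sampleSize distinct values are counted exactly.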
Modifier and Type | Class and Description |
---|---|
static class | ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique: A heap utility class to efficiently track the largest added elements. |
Constructor and Description |
---|
ApproximateUniqueCombineFn(long sampleSize, Coder<T> coder) |
Modifier and Type | Method and Description |
---|---|
ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique | addInput(ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique heap, T input): Adds the given input value to the given accumulator, returning the new accumulator value. |
ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique | createAccumulator(): Returns a new, mutable accumulator value, representing the accumulation of zero input values. |
java.lang.Long | extractOutput(ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique heap): Returns the output value that is the result of combining all the input values represented by the given accumulator. |
java.lang.reflect.TypeVariable<?> | getAccumTVariable(): Returns the TypeVariable of AccumT. |
Coder<ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique> | getAccumulatorCoder(CoderRegistry registry, Coder<T> inputCoder): Returns the Coder to use for accumulator AccumT values, or null if it is not able to be inferred. |
Coder<OutputT> | getDefaultOutputCoder(CoderRegistry registry, Coder<InputT> inputCoder): Returns the Coder to use by default for output OutputT values, or null if it is not able to be inferred. |
java.lang.String | getIncompatibleGlobalWindowErrorMessage(): Returns the error message for not supported default values in Combine.globally(). |
java.lang.reflect.TypeVariable<?> | getInputTVariable(): Returns the TypeVariable of InputT. |
java.lang.reflect.TypeVariable<?> | getOutputTVariable(): Returns the TypeVariable of OutputT. |
ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique | mergeAccumulators(java.lang.Iterable<ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique> heaps): Returns an accumulator representing the accumulation of all the input values accumulated in the merging accumulators. |
void | populateDisplayData(DisplayData.Builder builder): Register display data for the given transform or component. |
Methods inherited from class Combine.CombineFn: apply, compact, defaultValue, getOutputType
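public ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique createAccumulator()
Description copied from class: Combine.CombineFn
Returns a new, mutable accumulator value, representing the accumulation of zero input values.
Specified by: createAccumulator in class Combine.CombineFn<T,ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique,java.lang.Long>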
public ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique addInput(ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique heap, T input)
Description copied from class: Combine.CombineFn
Adds the given input value to the given accumulator, returning the new accumulator value.
For efficiency, the input accumulator may be modified and returned.
Specified by: addInput in class Combine.CombineFn<T,ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique,java.lang.Long>
public ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique mergeAccumulators(java.lang.Iterable<ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique> heaps)
Description copied from class: Combine.CombineFn
Returns an accumulator representing the accumulation of all the input values accumulated in the merging accumulators.
May modify any of the argument accumulators. May return a fresh accumulator, or may return one of the (modified) argument accumulators.
Specified by: mergeAccumulators in class Combine.CombineFn<T,ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique,java.lang.Long>
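The contract that a merge may mutate and return one of its arguments can be sketched as follows. The names BoundedTopSet and MergeSketch are hypothetical stand-ins for illustration only; this is a sketch of the general pattern under the assumption of a bounded "largest unique values" accumulator, not a copy of Beam's LargestUnique or its merge logic.

```java
import java.util.Iterator;
import java.util.TreeSet;

/** Hypothetical accumulator that keeps at most `capacity` of the largest unique values. */
class BoundedTopSet {
    final int capacity;
    final TreeSet<Long> values = new TreeSet<>();

    BoundedTopSet(int capacity) {
        this.capacity = capacity;
    }

    /** Adds a value; if the set grows past capacity, the smallest value is evicted. */
    void add(long v) {
        if (values.add(v) && values.size() > capacity) {
            values.pollFirst();
        }
    }
}

class MergeSketch {
    /**
     * Folds every accumulator into the first one and returns it.
     * Assumes the iterable holds at least one accumulator.
     */
    static BoundedTopSet mergeAccumulators(Iterable<BoundedTopSet> heaps) {
        Iterator<BoundedTopSet> it = heaps.iterator();
        BoundedTopSet result = it.next(); // reused and mutated rather than copied
        while (it.hasNext()) {
            for (long v : it.next().values) {
                result.add(v);
            }
        }
        return result;
    }
}
```

Because the first argument is modified in place, callers of a merge written this way should not rely on the input accumulators keeping their pre-merge contents.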
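public java.lang.Long extractOutput(ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique heap)
Description copied from class: Combine.CombineFn
Returns the output value that is the result of combining all the input values represented by the given accumulator.
Specified by: extractOutput in class Combine.CombineFn<T,ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique,java.lang.Long>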
public Coder<ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique> getAccumulatorCoder(CoderRegistry registry, Coder<T> inputCoder)
Description copied from interface: CombineFnBase.GlobalCombineFn
Returns the Coder to use for accumulator AccumT values, or null if it is not able to be inferred.
By default, uses the knowledge of the Coder being used for InputT values and the enclosing Pipeline's CoderRegistry to try to infer the Coder for AccumT values.
This is the Coder used to send data through a communication-intensive shuffle step, so a compact and efficient representation may have significant performance benefits.
Specified by: getAccumulatorCoder in interface CombineFnBase.GlobalCombineFn<T,ApproximateUnique.ApproximateUniqueCombineFn.LargestUnique,java.lang.Long>
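To make the point about compact accumulator encodings concrete, here is a minimal sketch of a fixed-width byte layout for a bounded set of 64-bit hash values. The class HashSampleCodec and its layout (a 4-byte count followed by 8 bytes per hash) are assumptions chosen for illustration; they are not the encoding Beam's accumulator Coder actually uses.

```java
import java.nio.ByteBuffer;
import java.util.TreeSet;

/** Hypothetical codec: 4-byte element count, then each hash as 8 big-endian bytes. */
class HashSampleCodec {
    /** Encodes the set into exactly 4 + 8 * size bytes. */
    static byte[] encode(TreeSet<Long> hashes) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + Long.BYTES * hashes.size());
        buf.putInt(hashes.size());
        for (long h : hashes) {
            buf.putLong(h);
        }
        return buf.array();
    }

    /** Decodes bytes produced by encode back into a sorted set. */
    static TreeSet<Long> decode(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        int count = buf.getInt();
        TreeSet<Long> hashes = new TreeSet<>();
        for (int i = 0; i < count; i++) {
            hashes.add(buf.getLong());
        }
        return hashes;
    }
}
```

With a layout like this, the encoded size is bounded by the sample size rather than by the number of input elements, which is what keeps the shuffle payload small.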
public Coder<OutputT> getDefaultOutputCoder(CoderRegistry registry, Coder<InputT> inputCoder) throws CannotProvideCoderException
Description copied from interface: CombineFnBase.GlobalCombineFn
Returns the Coder to use by default for output OutputT values, or null if it is not able to be inferred.
By default, uses the knowledge of the Coder being used for input InputT values and the enclosing Pipeline's CoderRegistry to try to infer the Coder for OutputT values.
Specified by: getDefaultOutputCoder in interface CombineFnBase.GlobalCombineFn<InputT,AccumT,OutputT>
Throws: CannotProvideCoderException
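public java.lang.String getIncompatibleGlobalWindowErrorMessage()
Description copied from interface: CombineFnBase.GlobalCombineFn
Returns the error message for not supported default values in Combine.globally().
Specified by: getIncompatibleGlobalWindowErrorMessage in interface CombineFnBase.GlobalCombineFn<InputT,AccumT,OutputT>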
public java.lang.reflect.TypeVariable<?> getInputTVariable()
Returns the TypeVariable of InputT.
public java.lang.reflect.TypeVariable<?> getAccumTVariable()
Returns the TypeVariable of AccumT.
public java.lang.reflect.TypeVariable<?> getOutputTVariable()
Returns the TypeVariable of OutputT.
public void populateDisplayData(DisplayData.Builder builder)
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
Specified by: populateDisplayData in interface HasDisplayData
Parameters: builder - The builder to populate with display data.
See Also: HasDisplayData