T
- the type of the values being combinedpublic static class ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> extends Combine.AccumulatingCombineFn<T,org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT>,java.util.List<T>>
ApproximateQuantilesCombineFn
combiner gives an idea
of the distribution of a collection of values using approximate
N
-tiles. The output of this combiner is a List
of size numQuantiles
, containing the input values'
minimum value, numQuantiles-2
intermediate values, and
maximum value, in sorted order, so for traditional
N
-tiles, one should use
ApproximateQuantilesCombineFn#create(N+1)
.
If there are fewer values to combine than
numQuantiles
, then the result List
will contain all the
values being combined, in sorted order.
Values are ordered using either a specified
Comparator
or the values' natural ordering.
To evaluate the quantiles we use the "New Algorithm" described here:
[MRL98] Manku, Rajagopalan & Lindsay, "Approximate Medians and other Quantiles in One Pass and with Limited Memory", Proc. 1998 ACM SIGMOD, Vol 27, No 2, p 426-435, June 1998. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.6513&rep=rep1&type=pdf
The default error bound is 1 / N
, though in practice
the accuracy tends to be much better.
See create(int, Comparator, long, double)
for
more information about the meaning of epsilon
, and
withEpsilon(double)
for a convenient way to adjust it.
Combine.AccumulatingCombineFn.Accumulator<InputT,AccumT,OutputT>
Modifier and Type | Field and Description |
---|---|
static long |
DEFAULT_MAX_NUM_ELEMENTS
The cost (in time and space) to compute quantiles to a given
accuracy is a function of the total number of elements in the
data set.
|
Modifier and Type | Method and Description |
---|---|
static <T extends java.lang.Comparable<T>> |
create(int numQuantiles)
Like
create(int, Comparator) , but sorts values using their natural ordering. |
static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> |
create(int numQuantiles,
ComparatorT compareFn)
Returns an approximate quantiles combiner with the given
compareFn and desired number of quantiles. |
static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> |
create(int numQuantiles,
ComparatorT compareFn,
long maxNumElements,
double epsilon)
Creates an approximate quantiles combiner with the given
compareFn and desired number of quantiles. |
org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT> |
createAccumulator()
Returns a new, mutable accumulator value, representing the accumulation of zero input values.
|
java.lang.reflect.TypeVariable<?> |
getAccumTVariable()
Returns the
TypeVariable of AccumT . |
Coder<org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT>> |
getAccumulatorCoder(CoderRegistry registry,
Coder<T> elementCoder)
Returns the
Coder to use for accumulator AccumT
values, or null if it is not able to be inferred. |
Coder<OutputT> |
getDefaultOutputCoder(CoderRegistry registry,
Coder<InputT> inputCoder)
Returns the
Coder to use by default for output
OutputT values, or null if it is not able to be inferred. |
java.lang.String |
getIncompatibleGlobalWindowErrorMessage()
Returns the error message for not supported default values in Combine.globally().
|
java.lang.reflect.TypeVariable<?> |
getInputTVariable()
Returns the
TypeVariable of InputT . |
java.lang.reflect.TypeVariable<?> |
getOutputTVariable()
Returns the
TypeVariable of OutputT . |
void |
populateDisplayData(DisplayData.Builder builder)
Register display data for the given transform or component.
|
ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT> |
withEpsilon(double epsilon)
Returns an
ApproximateQuantilesCombineFn that's like
this one except that it uses the specified epsilon
value. |
ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT> |
withMaxInputSize(long maxNumElements)
Returns an
ApproximateQuantilesCombineFn that's like
this one except that it uses the specified maxNumElements
value. |
addInput, extractOutput, mergeAccumulators
apply, compact, defaultValue, getOutputType
public static final long DEFAULT_MAX_NUM_ELEMENTS
public static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT> create(int numQuantiles, ComparatorT compareFn)
compareFn
and desired number of quantiles. A total of
numQuantiles
elements will appear in the output list,
including the minimum and maximum.
The Comparator
must be Serializable
.
The default error bound is 1 / numQuantiles
, which
holds as long as the number of elements is less than
DEFAULT_MAX_NUM_ELEMENTS
.
public static <T extends java.lang.Comparable<T>> ApproximateQuantiles.ApproximateQuantilesCombineFn<T,Top.Largest<T>> create(int numQuantiles)
create(int, Comparator)
, but sorts values using their natural ordering.public ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT> withEpsilon(double epsilon)
ApproximateQuantilesCombineFn
that's like
this one except that it uses the specified epsilon
value. Does not modify this combiner.
See create(int, Comparator, long,
double)
for more information about the meaning of
epsilon
.
public ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT> withMaxInputSize(long maxNumElements)
ApproximateQuantilesCombineFn
that's like
this one except that it uses the specified maxNumElements
value. Does not modify this combiner.
See create(int, Comparator, long, double)
for more
information about the meaning of maxNumElements
.
public static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT> create(int numQuantiles, ComparatorT compareFn, long maxNumElements, double epsilon)
compareFn
and desired number of quantiles. A total of
numQuantiles
elements will appear in the output list,
including the minimum and maximum.
The Comparator
must be Serializable
.
The default error bound is epsilon
, which holds as long
as the number of elements is less than maxNumElements
.
Specifically, if one considers the input as a sorted list x_1, ..., x_N,
then the distance between the each exact quantile x_c and its
approximation x_c' is bounded by |c - c'| < epsilon * N
.
Note that these errors are worst-case scenarios; in practice the accuracy
tends to be much better.
public org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT> createAccumulator()
Combine.CombineFn
createAccumulator
in class Combine.CombineFn<T,org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable>,java.util.List<T>>
public Coder<org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT>> getAccumulatorCoder(CoderRegistry registry, Coder<T> elementCoder)
CombineFnBase.GlobalCombineFn
Coder
to use for accumulator AccumT
values, or null if it is not able to be inferred.
By default, uses the knowledge of the Coder
being used
for InputT
values and the enclosing Pipeline
's
CoderRegistry
to try to infer the Coder for AccumT
values.
This is the Coder used to send data through a communication-intensive shuffle step, so a compact and efficient representation may have significant performance benefits.
getAccumulatorCoder
in interface CombineFnBase.GlobalCombineFn<T,org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable>,java.util.List<T>>
public void populateDisplayData(DisplayData.Builder builder)
populateDisplayData(DisplayData.Builder)
is invoked by Pipeline runners to collect
display data via DisplayData.from(HasDisplayData)
. Implementations may call
super.populateDisplayData(builder)
in order to register display data in the current
namespace, but should otherwise use subcomponent.populateDisplayData(builder)
to use
the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
populateDisplayData
in interface HasDisplayData
builder
- The builder to populate with display data.HasDisplayData
public Coder<OutputT> getDefaultOutputCoder(CoderRegistry registry, Coder<InputT> inputCoder) throws CannotProvideCoderException
CombineFnBase.GlobalCombineFn
Coder
to use by default for output
OutputT
values, or null if it is not able to be inferred.
By default, uses the knowledge of the Coder
being
used for input InputT
values and the enclosing
Pipeline
's CoderRegistry
to try to infer the
Coder for OutputT
values.
getDefaultOutputCoder
in interface CombineFnBase.GlobalCombineFn<InputT,AccumT,OutputT>
CannotProvideCoderException
public java.lang.String getIncompatibleGlobalWindowErrorMessage()
CombineFnBase.GlobalCombineFn
getIncompatibleGlobalWindowErrorMessage
in interface CombineFnBase.GlobalCombineFn<InputT,AccumT,OutputT>
public java.lang.reflect.TypeVariable<?> getInputTVariable()
TypeVariable
of InputT
.public java.lang.reflect.TypeVariable<?> getAccumTVariable()
TypeVariable
of AccumT
.public java.lang.reflect.TypeVariable<?> getOutputTVariable()
TypeVariable
of OutputT
.