public class ApproximateQuantiles
extends java.lang.Object
PTransforms for getting an idea of a PCollection's
data distribution using approximate N-tiles (e.g. quartiles,
percentiles, etc.), either globally or per-key.

| Modifier and Type | Class and Description |
|---|---|
| static class | ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable><br>The ApproximateQuantilesCombineFn combiner gives an idea of the distribution of a collection of values using approximate N-tiles. |

| Modifier and Type | Method and Description |
|---|---|
| static <T extends java.lang.Comparable<T>> PTransform<PCollection<T>,PCollection<java.util.List<T>>> | globally(int numQuantiles)<br>Like globally(int, Comparator), but sorts using the elements' natural ordering. |
| static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> PTransform<PCollection<T>,PCollection<java.util.List<T>>> | globally(int numQuantiles, ComparatorT compareFn)<br>Returns a PTransform that takes a PCollection<T> and returns a PCollection<List<T>> whose single value is a List of the approximate N-tiles of the elements of the input PCollection. |
| static <K,V extends java.lang.Comparable<V>> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> | perKey(int numQuantiles)<br>Like perKey(int, Comparator), but sorts values using their natural ordering. |
| static <K,V,ComparatorT extends java.util.Comparator<V> & java.io.Serializable> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> | perKey(int numQuantiles, ComparatorT compareFn)<br>Returns a PTransform that takes a PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to a List of the approximate N-tiles of the values associated with that key in the input PCollection. |

public static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> PTransform<PCollection<T>,PCollection<java.util.List<T>>> globally(int numQuantiles, ComparatorT compareFn)
Returns a PTransform that takes a PCollection<T>
and returns a PCollection<List<T>> whose single value is a
List of the approximate N-tiles of the elements
of the input PCollection. This gives an idea of the
distribution of the input elements.
The computed List is of size numQuantiles,
and contains the input elements' minimum value,
numQuantiles-2 intermediate values, and maximum value, in
sorted order, using the given Comparator to order values.
To compute traditional N-tiles, one should use
ApproximateQuantiles.globally(N+1, compareFn).
If there are fewer input elements than numQuantiles,
then the result List will contain all the input elements,
in sorted order.
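To make the shape of this result concrete, here is a minimal sketch of the exact, sort-everything counterpart in plain Java, with no SDK types. It is not the SDK's approximation algorithm: the class name ExactQuantilesSketch, the requirement numQuantiles >= 2, and the rank formula i * (n - 1) / (numQuantiles - 1) are assumptions chosen so that the first and last entries are always the minimum and maximum. Quartiles (N = 4) are requested with N + 1 = 5 values, matching the note above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ExactQuantilesSketch {

    // Hypothetical exact reference: returns numQuantiles values, from the
    // minimum to the maximum, using evenly spaced ranks in the sorted input.
    static <T> List<T> exactQuantiles(List<T> values, int numQuantiles, Comparator<? super T> compareFn) {
        if (numQuantiles < 2) {
            throw new IllegalArgumentException("numQuantiles must be at least 2");
        }
        List<T> sorted = new ArrayList<>(values);
        sorted.sort(compareFn);
        int n = sorted.size();
        // Fewer elements than requested quantiles: return all of them, sorted.
        if (n <= numQuantiles) {
            return sorted;
        }
        List<T> result = new ArrayList<>(numQuantiles);
        for (int i = 0; i < numQuantiles; i++) {
            // Rank 0 is the minimum, rank n - 1 the maximum; long avoids overflow.
            int rank = (int) ((long) i * (n - 1) / (numQuantiles - 1));
            result.add(sorted.get(rank));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(9, 2, 7, 4, 1, 8, 3, 6, 5);
        // Quartiles: N = 4, so ask for N + 1 = 5 values.
        System.out.println(exactQuantiles(data, 5, Comparator.<Integer>naturalOrder()));
        // prints [1, 3, 5, 7, 9]
        System.out.println(exactQuantiles(List.of(30, 10, 20), 5, Comparator.<Integer>naturalOrder()));
        // prints [10, 20, 30]
    }
}
```

Because the SDK transform approximates rather than sorting every element in one place, its intermediate values may differ somewhat from what this exact version returns.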
The argument Comparator must be Serializable.
Example of use:
```java
PCollection<String> pc = ...;
PCollection<List<String>> quantiles =
    pc.apply(ApproximateQuantiles.globally(11, stringCompareFn));
```
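The example leaves stringCompareFn undefined. As a hedged sketch, one way to satisfy the Comparator<T> & Serializable bound is a small named class; StringCompareFn, requireComparatorBound, and roundTrip below are hypothetical names, and the length-then-lexicographic ordering is an arbitrary choice. The Java serialization round trip loosely mirrors why the Comparator presumably has to be Serializable (a distributed runner generally ships functions to workers in serialized form); the SDK's own mechanism is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Comparator;

public class StringCompareFnSketch {

    // Hypothetical stand-in for stringCompareFn: shorter strings first,
    // ties broken by String's natural (lexicographic) order.
    static final class StringCompareFn implements Comparator<String>, Serializable {
        private static final long serialVersionUID = 1L;

        @Override
        public int compare(String a, String b) {
            int byLength = Integer.compare(a.length(), b.length());
            return byLength != 0 ? byLength : a.compareTo(b);
        }
    }

    // Same bound shape as ComparatorT in the signature above: a variable
    // declared only as Comparator<String> would not compile here.
    static <T, C extends Comparator<T> & Serializable> C requireComparatorBound(C compareFn) {
        return compareFn;
    }

    // Writes the object with Java serialization and reads back a copy.
    @SuppressWarnings("unchecked")
    static <S extends Serializable> S roundTrip(S obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (S) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        StringCompareFn fn = requireComparatorBound(new StringCompareFn());
        StringCompareFn copy = roundTrip(fn);
        System.out.println(copy.compare("fig", "apple") < 0); // prints true: "fig" is shorter
    }
}
```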
Type Parameters:
T - the type of the elements in the input PCollection
Parameters:
numQuantiles - the number of elements in the resulting quantile values List
compareFn - the function to use to order the elements

public static <T extends java.lang.Comparable<T>> PTransform<PCollection<T>,PCollection<java.util.List<T>>> globally(int numQuantiles)
Like globally(int, Comparator), but sorts using the elements' natural ordering.
Type Parameters:
T - the type of the elements in the input PCollection
Parameters:
numQuantiles - the number of elements in the resulting quantile values List

public static <K,V,ComparatorT extends java.util.Comparator<V> & java.io.Serializable> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> perKey(int numQuantiles, ComparatorT compareFn)
Returns a PTransform that takes a
PCollection<KV<K, V>> and returns a
PCollection<KV<K, List<V>>> that contains an output
element mapping each distinct key in the input
PCollection to a List of the approximate
N-tiles of the values associated with that key in the
input PCollection. This gives an idea of the
distribution of the input values for each key.
Each of the computed Lists is of size numQuantiles,
and contains the input values' minimum value,
numQuantiles-2 intermediate values, and maximum value, in
sorted order, using the given Comparator to order values.
To compute traditional N-tiles, one should use
ApproximateQuantiles.perKey(N+1, compareFn).
If a key has fewer than numQuantiles values
associated with it, then that key's output List will
contain all the key's input values, in sorted order.
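The per-key behavior can be pictured the same way: group values by key, then compute quantiles within each group. The sketch below does this exactly in plain Java, using java.util.Map.Entry as a stand-in for the SDK's KV and a LinkedHashMap so keys come out in first-seen order. perKeyQuantiles is a hypothetical helper, and its rank formula is an assumption, not the SDK's algorithm.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerKeyQuantilesSketch {

    // Hypothetical helper: exact per-key quantiles over (key, value) pairs.
    static <K, V> Map<K, List<V>> perKeyQuantiles(
            List<Map.Entry<K, V>> input, int numQuantiles, Comparator<? super V> compareFn) {
        if (numQuantiles < 2) {
            throw new IllegalArgumentException("numQuantiles must be at least 2");
        }
        // Group values by key; LinkedHashMap keeps keys in first-seen order.
        Map<K, List<V>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> kv : input) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<K, List<V>> result = new LinkedHashMap<>();
        for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
            List<V> sorted = new ArrayList<>(group.getValue());
            sorted.sort(compareFn);
            int n = sorted.size();
            List<V> quantiles;
            if (n <= numQuantiles) {
                // Fewer values than numQuantiles: keep all of them, sorted.
                quantiles = sorted;
            } else {
                quantiles = new ArrayList<>(numQuantiles);
                for (int i = 0; i < numQuantiles; i++) {
                    // Evenly spaced ranks from the minimum (0) to the maximum (n - 1).
                    quantiles.add(sorted.get((int) ((long) i * (n - 1) / (numQuantiles - 1))));
                }
            }
            result.put(group.getKey(), quantiles);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("a", 5), Map.entry("a", 1), Map.entry("a", 3),
                Map.entry("a", 4), Map.entry("a", 2),
                Map.entry("b", 20), Map.entry("b", 10));
        System.out.println(perKeyQuantiles(input, 3, Comparator.<Integer>naturalOrder()));
        // prints {a=[1, 3, 5], b=[10, 20]}
    }
}
```

Key "b" has only two values, fewer than numQuantiles = 3, so its list holds both values in sorted order.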
The argument Comparator must be Serializable.
Example of use:
```java
PCollection<KV<Integer, String>> pc = ...;
PCollection<KV<Integer, List<String>>> quantilesPerKey =
    pc.apply(ApproximateQuantiles.perKey(11, stringCompareFn));
```
See Combine.PerKey for how this affects timestamps and windowing.
Type Parameters:
K - the type of the keys in the input and output PCollections
V - the type of the values in the input PCollection
Parameters:
numQuantiles - the number of elements in the resulting quantile values List
compareFn - the function to use to order the elements

public static <K,V extends java.lang.Comparable<V>> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> perKey(int numQuantiles)
Like perKey(int, Comparator), but sorts values using their natural ordering.
Type Parameters:
K - the type of the keys in the input and output PCollections
V - the type of the values in the input PCollection
Parameters:
numQuantiles - the number of elements in the resulting quantile values List
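Both natural-ordering overloads, globally(int) and perKey(int), rely on the element type's compareTo. For String that ordering compares UTF-16 code units and is therefore case-sensitive, which can change which values are reported as the minimum and maximum. The plain-Java sketch below (hypothetical helper sortedCopy, no SDK types) contrasts natural ordering with String.CASE_INSENSITIVE_ORDER.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NaturalOrderingSketch {

    // Hypothetical helper: returns a sorted copy. Its first and last entries are
    // what a quantile list would report as minimum and maximum under that order.
    static List<String> sortedCopy(List<String> values, Comparator<String> order) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = List.of("banana", "Cherry", "apple");
        // Natural ordering: 'C' (67) sorts before 'a' (97).
        System.out.println(sortedCopy(words, Comparator.<String>naturalOrder()));
        // prints [Cherry, apple, banana]
        System.out.println(sortedCopy(words, String.CASE_INSENSITIVE_ORDER));
        // prints [apple, banana, Cherry]
    }
}
```

A case-insensitive ordering would have to go through the Comparator overload instead. Since String.CASE_INSENSITIVE_ORDER is declared as Comparator<String>, it would likely need wrapping in a small class that also implements Serializable to satisfy the ComparatorT bound.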