public class ApproximateQuantiles
extends java.lang.Object
PTransforms for getting an idea of a PCollection's data distribution using
approximate N-tiles (e.g. quartiles, percentiles, etc.), either globally or per-key.

| Modifier and Type | Class | Description |
|---|---|---|
| static class | ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> | The ApproximateQuantilesCombineFn combiner gives an idea of the distribution of a collection of values using approximate N-tiles. |

| Modifier and Type | Method | Description |
|---|---|---|
| static <T extends java.lang.Comparable<T>> | globally(int numQuantiles) | Like globally(int, Comparator), but sorts using the elements' natural ordering. |
| static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> | globally(int numQuantiles, ComparatorT compareFn) | Returns a PTransform that takes a PCollection<T> and returns a PCollection<List<T>> whose single value is a List of the approximate N-tiles of the elements of the input PCollection. |
| static <K,V extends java.lang.Comparable<V>> | perKey(int numQuantiles) | Like perKey(int, Comparator), but sorts values using their natural ordering. |
| static <K,V,ComparatorT extends java.util.Comparator<V> & java.io.Serializable> | perKey(int numQuantiles, ComparatorT compareFn) | Returns a PTransform that takes a PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to a List of the approximate N-tiles of the values associated with that key in the input PCollection. |

public static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> PTransform<PCollection<T>,PCollection<java.util.List<T>>> globally(int numQuantiles, ComparatorT compareFn)
Returns a PTransform that takes a PCollection<T> and returns a PCollection<List<T>> whose single value is a List of the approximate N-tiles
of the elements of the input PCollection. This gives an idea of the distribution of the
input elements.
The computed List is of size numQuantiles and contains the input elements'
minimum value, numQuantiles-2 intermediate values, and maximum value, in sorted order,
using the given Comparator to order values. Because N-tiles have N-1 interior boundaries,
which together with the minimum and maximum make N+1 values, traditional N-tiles are
computed with ApproximateQuantiles.globally(N+1, compareFn); for example, quartiles use
globally(5, compareFn).
If there are fewer input elements than numQuantiles, then the result List
will contain all the input elements, in sorted order.
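To make the shape of the result concrete, the following sketch computes the same kind of List exactly, in memory. It is a hypothetical helper (ExactQuantiles and exactQuantiles are not part of Beam), and the evenly spaced rank rule i*(n-1)/(numQuantiles-1) is an assumption chosen for illustration; the approximate transform only estimates such order statistics and does not sort the whole input.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Beam: an exact, in-memory reference for the
// shape of the List that ApproximateQuantiles.globally approximates.
class ExactQuantiles {

  static <T> List<T> exactQuantiles(List<T> input, int numQuantiles, Comparator<? super T> cmp) {
    if (numQuantiles < 2) {
      // This sketch needs room for at least the minimum and the maximum.
      throw new IllegalArgumentException("numQuantiles must be at least 2");
    }
    List<T> sorted = new ArrayList<>(input);
    sorted.sort(cmp);
    int n = sorted.size();
    if (n <= numQuantiles) {
      // Not more elements than requested: return all of them, in sorted order.
      return sorted;
    }
    List<T> result = new ArrayList<>(numQuantiles);
    for (int i = 0; i < numQuantiles; i++) {
      // Evenly spaced ranks: i = 0 picks the minimum, i = numQuantiles - 1 the maximum.
      int index = (int) ((long) i * (n - 1) / (numQuantiles - 1));
      result.add(sorted.get(index));
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> data = new ArrayList<>();
    for (int v = 0; v <= 100; v++) {
      data.add(v);
    }
    // Quartiles (N = 4) need numQuantiles = N + 1 = 5.
    System.out.println(exactQuantiles(data, 5, Comparator.naturalOrder()));
  }
}
```

Running main prints [0, 25, 50, 75, 100]: the minimum, the three quartile boundaries, and the maximum.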
The argument Comparator must be Serializable.
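The examples pass a stringCompareFn without defining it. One way to meet the Serializable requirement is a named class that implements both Comparator<String> and Serializable, so its static type satisfies the ComparatorT bound. The class name StringLengthComparator and its ordering (by length, then lexicographically) are hypothetical choices for illustration.

```java
import java.io.Serializable;
import java.util.Comparator;

// Hypothetical stand-in for stringCompareFn: orders strings by length and
// breaks ties lexicographically. Implementing Serializable lets the comparator
// be serialized together with the transform that holds it.
class StringLengthComparator implements Comparator<String>, Serializable {
  private static final long serialVersionUID = 1L;

  @Override
  public int compare(String a, String b) {
    int byLength = Integer.compare(a.length(), b.length());
    return byLength != 0 ? byLength : a.compareTo(b);
  }
}
```

With such a class, the call would read pc.apply(ApproximateQuantiles.globally(11, new StringLengthComparator())).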
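Example of use:
```java
PCollection<String> pc = ...;
// stringCompareFn must be a Comparator<String> that also implements Serializable.
// 11 quantiles yield the deciles: the minimum, 9 intermediate values, and the maximum.
PCollection<List<String>> quantiles =
    pc.apply(ApproximateQuantiles.globally(11, stringCompareFn));
```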
Type Parameters:
T - the type of the elements in the input PCollection
Parameters:
numQuantiles - the number of elements in the resulting quantile values List
compareFn - the function to use to order the elements

public static <T extends java.lang.Comparable<T>> PTransform<PCollection<T>,PCollection<java.util.List<T>>> globally(int numQuantiles)
Like globally(int, Comparator), but sorts using the elements' natural ordering.
Type Parameters:
T - the type of the elements in the input PCollection
Parameters:
numQuantiles - the number of elements in the resulting quantile values List

public static <K,V,ComparatorT extends java.util.Comparator<V> & java.io.Serializable> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> perKey(int numQuantiles, ComparatorT compareFn)
Returns a PTransform that takes a PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the
input PCollection to a List of the approximate N-tiles of the values
associated with that key in the input PCollection. This gives an idea of the
distribution of the input values for each key.
Each of the computed Lists is of size numQuantiles and contains the input
values' minimum value, numQuantiles-2 intermediate values, and maximum value, in sorted
order, using the given Comparator to order values. To compute traditional N-tiles, one should use ApproximateQuantiles.perKey(N+1, compareFn).
If a key has fewer than numQuantiles values associated with it, then that key's
output List will contain all the key's input values, in sorted order.
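The per-key behavior amounts to grouping values by key and selecting quantiles within each group. The sketch below does this exactly and in memory; ExactQuantilesPerKey is a hypothetical helper (not part of Beam), java.util.AbstractMap.SimpleEntry stands in for Beam's KV, and the evenly spaced rank rule is the same illustrative assumption as in the global case.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Beam: exact per-key reference for the
// output that ApproximateQuantiles.perKey approximates.
class ExactQuantilesPerKey {

  // Exact quantiles of one key's values, using evenly spaced ranks.
  static <V> List<V> quantiles(List<V> values, int numQuantiles, Comparator<? super V> cmp) {
    if (numQuantiles < 2) {
      throw new IllegalArgumentException("numQuantiles must be at least 2");
    }
    List<V> sorted = new ArrayList<>(values);
    sorted.sort(cmp);
    int n = sorted.size();
    if (n <= numQuantiles) {
      return sorted; // not more values than requested: all of them, sorted
    }
    List<V> result = new ArrayList<>(numQuantiles);
    for (int i = 0; i < numQuantiles; i++) {
      result.add(sorted.get((int) ((long) i * (n - 1) / (numQuantiles - 1))));
    }
    return result;
  }

  // Groups (key, value) pairs by key, then computes quantiles for each key.
  static <K, V> Map<K, List<V>> perKey(
      List<Map.Entry<K, V>> input, int numQuantiles, Comparator<? super V> cmp) {
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (Map.Entry<K, V> kv : input) {
      grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
    }
    Map<K, List<V>> result = new LinkedHashMap<>();
    for (Map.Entry<K, List<V>> e : grouped.entrySet()) {
      result.put(e.getKey(), quantiles(e.getValue(), numQuantiles, cmp));
    }
    return result;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> input = new ArrayList<>();
    for (int v = 1; v <= 9; v++) {
      input.add(new AbstractMap.SimpleEntry<>("a", v));
    }
    input.add(new AbstractMap.SimpleEntry<>("b", 7));
    input.add(new AbstractMap.SimpleEntry<>("b", 3));
    System.out.println(perKey(input, 3, Comparator.naturalOrder()));
  }
}
```

For the sample input, main prints {a=[1, 5, 9], b=[3, 7]}: key a has nine values, so it gets the minimum, median, and maximum, while key b has fewer than three values and returns both, sorted.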
The argument Comparator must be Serializable.
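A lambda can also be made serializable with an intersection cast, (Comparator<String> & Serializable), which is the same shape as the ComparatorT bound. The sketch below stores such a lambda and checks it with a Java serialization round trip; the class and member names are hypothetical. Note that BY_LENGTH is declared as plain Comparator<String>, so it would not satisfy the ComparatorT bound at compile time; when passing a lambda to perKey, the intersection cast would need to appear where the comparator is passed (or a named class like the one for globally could be used instead).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Comparator;

// Hypothetical example: a lambda comparator that is also Serializable.
class SerializableLambdaComparator {

  // The intersection cast gives the lambda a target type that is both a
  // Comparator<String> and Serializable, so the compiler emits a serializable lambda.
  static final Comparator<String> BY_LENGTH =
      (Comparator<String> & Serializable) (a, b) -> Integer.compare(a.length(), b.length());

  // Writes an object with Java serialization and reads it back; this fails if
  // anything reachable from the object is not Serializable.
  @SuppressWarnings("unchecked")
  static <T> T roundTrip(T obj) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(obj);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("serialization round trip failed", e);
    }
  }

  public static void main(String[] args) {
    Comparator<String> copy = roundTrip(BY_LENGTH);
    System.out.println(copy.compare("ab", "abc") < 0); // prints true
  }
}
```

Without the & Serializable part of the cast, writeObject would throw java.io.NotSerializableException, which this sketch wraps in an IllegalStateException.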
Example of use:
```java
PCollection<KV<Integer, String>> pc = ...;
// numQuantiles comes first, then the Serializable Comparator<String>.
PCollection<KV<Integer, List<String>>> quantilesPerKey =
    pc.apply(ApproximateQuantiles.perKey(11, stringCompareFn));
```
See Combine.PerKey for how this affects timestamps and windowing.
Type Parameters:
K - the type of the keys in the input and output PCollections
V - the type of the values in the input PCollection
Parameters:
numQuantiles - the number of elements in the resulting quantile values List
compareFn - the function to use to order the elements

public static <K,V extends java.lang.Comparable<V>> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> perKey(int numQuantiles)
Like perKey(int, Comparator), but sorts values using their natural ordering.
Type Parameters:
K - the type of the keys in the input and output PCollections
V - the type of the values in the input PCollection
Parameters:
numQuantiles - the number of elements in the resulting quantile values List