Class ApproximateQuantiles
PTransforms for getting an idea of a PCollection's data distribution using
approximate N-tiles (e.g. quartiles, percentiles, etc.), either globally or per-key.
Nested Class Summary
Nested Classes
Modifier and Type: static class
Class: ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT extends Comparator<T> & Serializable>
Description: The ApproximateQuantilesCombineFn combiner gives an idea of the distribution of a collection of values using approximate N-tiles.
Method Summary
Modifier and Type: static <T extends Comparable<T>> PTransform<PCollection<T>, PCollection<List<T>>>
Method: globally(int numQuantiles)
Description: Like globally(int, Comparator), but sorts using the elements' natural ordering.
Modifier and Type: static <T,ComparatorT extends Comparator<T> & Serializable> PTransform<PCollection<T>, PCollection<List<T>>>
Method: globally(int numQuantiles, ComparatorT compareFn)
Description: Returns a PTransform that takes a PCollection<T> and returns a PCollection<List<T>> whose single value is a List of the approximate N-tiles of the elements of the input PCollection.
Modifier and Type: static <K,V extends Comparable<V>> PTransform<PCollection<KV<K, V>>, PCollection<KV<K, List<V>>>>
Method: perKey(int numQuantiles)
Description: Like perKey(int, Comparator), but sorts values using their natural ordering.
Modifier and Type: static <K,V,ComparatorT extends Comparator<V> & Serializable> PTransform<PCollection<KV<K, V>>, PCollection<KV<K, List<V>>>>
Method: perKey(int numQuantiles, ComparatorT compareFn)
Description: Returns a PTransform that takes a PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to a List of the approximate N-tiles of the values associated with that key in the input PCollection.
Method Details
globally
public static <T,ComparatorT extends Comparator<T> & Serializable> PTransform<PCollection<T>,PCollection<List<T>>> globally(int numQuantiles, ComparatorT compareFn)
Returns a PTransform that takes a PCollection<T> and returns a PCollection<List<T>> whose single value is a List of the approximate N-tiles of the elements of the input PCollection. This gives an idea of the distribution of the input elements.
The computed List is of size numQuantiles, and contains the input elements' minimum value, numQuantiles-2 intermediate values, and maximum value, in sorted order, using the given Comparator to order values. To compute traditional N-tiles, one should use ApproximateQuantiles.globally(N+1, compareFn). For example, quartiles (N = 4) need numQuantiles = 5: the minimum, three cut points, and the maximum.
If there are fewer input elements than numQuantiles, then the result List will contain all the input elements, in sorted order.
The argument Comparator must be Serializable.
Example of use:
    PCollection<String> pc = ...;
    PCollection<List<String>> quantiles =
        pc.apply(ApproximateQuantiles.globally(11, stringCompareFn));
Type Parameters:
T - the type of the elements in the input PCollection
Parameters:
numQuantiles - the number of elements in the resulting quantile values List
compareFn - the function to use to order the elements
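The shape of the result List described above can be illustrated without a pipeline. The following is a minimal plain-Java sketch, not the transform's implementation: it computes exact, evenly spaced order statistics over an in-memory list, whereas the transform returns approximations. The class name QuantileShapeSketch, the method exactQuantiles, the index-selection rule, and the numQuantiles >= 2 check are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: exact quantiles over an in-memory list.
final class QuantileShapeSketch {

  // Returns a sorted list of numQuantiles values: the minimum, numQuantiles - 2
  // evenly spaced intermediate values, and the maximum. If the input has fewer
  // than numQuantiles elements, returns all of them in sorted order.
  static <T> List<T> exactQuantiles(
      List<T> input, int numQuantiles, Comparator<? super T> compareFn) {
    if (numQuantiles < 2) {
      // Assumption of this sketch: at least a minimum and a maximum are requested.
      throw new IllegalArgumentException("numQuantiles must be at least 2");
    }
    List<T> sorted = new ArrayList<>(input);
    sorted.sort(compareFn);
    if (sorted.size() < numQuantiles) {
      return sorted;
    }
    List<T> result = new ArrayList<>(numQuantiles);
    for (int i = 0; i < numQuantiles; i++) {
      // Evenly spaced positions from index 0 (minimum) to size - 1 (maximum).
      long index = (long) i * (sorted.size() - 1) / (numQuantiles - 1);
      result.add(sorted.get((int) index));
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> values = new ArrayList<>();
    for (int v = 100; v >= 0; v--) {
      values.add(v);
    }
    // Quartiles (N = 4) are requested with numQuantiles = N + 1 = 5.
    System.out.println(exactQuantiles(values, 5, Comparator.<Integer>naturalOrder()));
    // prints [0, 25, 50, 75, 100]
  }
}
```

With the integers 0 through 100 as input, requesting 5 values yields the minimum, the three quartile cut points, and the maximum, which is why traditional N-tiles call for N+1.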
globally
public static <T extends Comparable<T>> PTransform<PCollection<T>,PCollection<List<T>>> globally(int numQuantiles)
Like globally(int, Comparator), but sorts using the elements' natural ordering.
Type Parameters:
T - the type of the elements in the input PCollection
Parameters:
numQuantiles - the number of elements in the resulting quantile values List
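When natural ordering is not suitable and one of the Comparator overloads is used instead, the comparator has to satisfy the bound ComparatorT extends Comparator<T> & Serializable. The following is a minimal plain-Java sketch of one way to meet that bound with a named class. The names SerializableComparatorSketch, ByLengthThenAlpha, and roundTrip are hypothetical, and the round trip through Java serialization is only a local sanity check, not something the transform requires callers to do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Comparator;

// Hypothetical illustration only: a comparator that is also Serializable.
final class SerializableComparatorSketch {

  // Orders strings by length, then lexicographically.
  static final class ByLengthThenAlpha implements Comparator<String>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(String a, String b) {
      int byLength = Integer.compare(a.length(), b.length());
      return byLength != 0 ? byLength : a.compareTo(b);
    }
  }

  // Serializes and deserializes the comparator. The type bound mirrors the
  // ComparatorT bound in the method signatures above.
  @SuppressWarnings("unchecked")
  static <C extends Comparator<String> & Serializable> Comparator<String> roundTrip(
      C comparator) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(comparator);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (Comparator<String>) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("comparator did not survive serialization", e);
    }
  }
}
```

A lambda can meet the same bound through a cast to an intersection type, for example (Comparator<String> & Serializable) (a, b) -> a.compareTo(b); a named class is used here so that the serialized form is easier to reason about.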
perKey
public static <K,V,ComparatorT extends Comparator<V> & Serializable> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,List<V>>>> perKey(int numQuantiles, ComparatorT compareFn)
Returns a PTransform that takes a PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to a List of the approximate N-tiles of the values associated with that key in the input PCollection. This gives an idea of the distribution of the input values for each key.
Each of the computed Lists is of size numQuantiles, and contains the input values' minimum value, numQuantiles-2 intermediate values, and maximum value, in sorted order, using the given Comparator to order values. To compute traditional N-tiles, one should use ApproximateQuantiles.perKey(N+1, compareFn).
If a key has fewer than numQuantiles values associated with it, then that key's output List will contain all the key's input values, in sorted order.
The argument Comparator must be Serializable.
Example of use:
    PCollection<KV<Integer, String>> pc = ...;
    PCollection<KV<Integer, List<String>>> quantilesPerKey =
        pc.apply(ApproximateQuantiles.perKey(11, stringCompareFn));
See Combine.PerKey for how this affects timestamps and windowing.
Type Parameters:
K - the type of the keys in the input and output PCollections
V - the type of the values in the input PCollection
Parameters:
numQuantiles - the number of elements in the resulting quantile values List
compareFn - the function to use to order the elements
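The per-key output described above, one List per distinct key, can be sketched in plain Java. This is an illustration under stated assumptions, not the transform's implementation: it groups in-memory entries by key and computes exact, evenly spaced order statistics, whereas the transform returns approximations and runs over a PCollection. The names PerKeyQuantileSketch and exactQuantilesPerKey, the use of Map.Entry in place of KV, and the index-selection rule are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: exact per-key quantiles over in-memory entries.
final class PerKeyQuantileSketch {

  // Maps each distinct key to a sorted list of at most numQuantiles of its values.
  static <K, V> Map<K, List<V>> exactQuantilesPerKey(
      List<Map.Entry<K, V>> input, int numQuantiles, Comparator<? super V> compareFn) {
    if (numQuantiles < 2) {
      // Assumption of this sketch: at least a minimum and a maximum are requested.
      throw new IllegalArgumentException("numQuantiles must be at least 2");
    }
    // Group values by key, keeping keys in first-seen order.
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (Map.Entry<K, V> kv : input) {
      grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
    }
    Map<K, List<V>> result = new LinkedHashMap<>();
    for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
      List<V> sorted = group.getValue();
      sorted.sort(compareFn);
      if (sorted.size() < numQuantiles) {
        // Fewer values than requested: the key keeps all of its values, sorted.
        result.put(group.getKey(), sorted);
        continue;
      }
      List<V> quantiles = new ArrayList<>(numQuantiles);
      for (int i = 0; i < numQuantiles; i++) {
        // Evenly spaced positions from the key's minimum to its maximum.
        long index = (long) i * (sorted.size() - 1) / (numQuantiles - 1);
        quantiles.add(sorted.get((int) index));
      }
      result.put(group.getKey(), quantiles);
    }
    return result;
  }
}
```

For a key with the values 1 through 5 and numQuantiles = 3, the sketch yields the minimum, the middle value, and the maximum; a key with a single value keeps just that value.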
perKey
public static <K,V extends Comparable<V>> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,List<V>>>> perKey(int numQuantiles)
Like perKey(int, Comparator), but sorts values using their natural ordering.
Type Parameters:
K - the type of the keys in the input and output PCollections
V - the type of the values in the input PCollection
Parameters:
numQuantiles - the number of elements in the resulting quantile values List