public class ApproximateQuantiles
extends java.lang.Object
PTransforms for getting an idea of a PCollection's data distribution using approximate N-tiles (e.g. quartiles, percentiles, etc.), either globally or per-key.

Modifier and Type | Class and Description |
---|---|
static class | ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> The ApproximateQuantilesCombineFn combiner gives an idea of the distribution of a collection of values using approximate N-tiles. |

Modifier and Type | Method and Description |
---|---|
static <T extends java.lang.Comparable<T>> | globally(int numQuantiles) Like globally(int, Comparator), but sorts using the elements' natural ordering. |
static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> | globally(int numQuantiles, ComparatorT compareFn) Returns a PTransform that takes a PCollection<T> and returns a PCollection<List<T>> whose single value is a List of the approximate N-tiles of the elements of the input PCollection. |
static <K,V extends java.lang.Comparable<V>> | perKey(int numQuantiles) Like perKey(int, Comparator), but sorts values using their natural ordering. |
static <K,V,ComparatorT extends java.util.Comparator<V> & java.io.Serializable> | perKey(int numQuantiles, ComparatorT compareFn) Returns a PTransform that takes a PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to a List of the approximate N-tiles of the values associated with that key in the input PCollection. |
public static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> PTransform<PCollection<T>,PCollection<java.util.List<T>>> globally(int numQuantiles, ComparatorT compareFn)
Returns a PTransform that takes a PCollection<T> and returns a PCollection<List<T>> whose single value is a List of the approximate N-tiles of the elements of the input PCollection. This gives an idea of the distribution of the input elements.
The computed List is of size numQuantiles, and contains the input elements' minimum value, numQuantiles-2 intermediate values, and maximum value, in sorted order, using the given Comparator to order values. To compute traditional N-tiles, one should use ApproximateQuantiles.globally(N+1, compareFn).
If there are fewer input elements than numQuantiles, then the result List will contain all the input elements, in sorted order.
The argument Comparator must be Serializable.
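The Serializable requirement means the comparator's static type has to satisfy both bounds of ComparatorT. One way to meet it, shown here as a minimal sketch, is a small named class that implements both interfaces. The names CaseInsensitiveCompareFn and roundTrip are hypothetical and chosen only for illustration; the round trip uses plain Java serialization to check that the comparator survives being written and read back, and is not how any particular runner ships it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Comparator;

final class SerializableComparatorSketch {

  /** Hypothetical comparator that is both a Comparator<String> and Serializable. */
  static final class CaseInsensitiveCompareFn implements Comparator<String>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(String a, String b) {
      return a.compareToIgnoreCase(b);
    }
  }

  /**
   * Hypothetical helper: serializes and deserializes a comparator in memory.
   * The type bound mirrors the ComparatorT bound in the method signatures above.
   */
  static <C extends Comparator<String> & Serializable> C roundTrip(C compareFn) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(compareFn);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        C copy = (C) in.readObject();
        return copy;
      }
    } catch (IOException | ClassNotFoundException e) {
      // Wrapped so callers of this sketch need no checked-exception handling.
      throw new IllegalStateException("comparator is not serializable", e);
    }
  }
}
```

An instance of such a class could then be passed as compareFn, for example ApproximateQuantiles.globally(11, new CaseInsensitiveCompareFn()).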
Example of use:
PCollection<String> pc = ...;
PCollection<List<String>> quantiles =
pc.apply(ApproximateQuantiles.globally(11, stringCompareFn));
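The shape of the result List described above (minimum, numQuantiles-2 intermediate values, maximum) can be illustrated with an exact, in-memory computation. This is only a sketch of the output contract, not the approximate algorithm the transform uses: the class name ExactQuantilesSketch, the evenly spaced index rule, and the numQuantiles >= 2 guard are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ExactQuantilesSketch {

  /**
   * Returns numQuantiles values from the input: the minimum, numQuantiles-2
   * evenly spaced intermediate values, and the maximum, in sorted order.
   * If the input has fewer elements than numQuantiles, all elements are
   * returned in sorted order.
   */
  static <T> List<T> quantiles(
      List<T> values, int numQuantiles, Comparator<? super T> compareFn) {
    if (numQuantiles < 2) {
      // Guard chosen for this sketch; it also avoids dividing by zero below.
      throw new IllegalArgumentException("numQuantiles must be at least 2");
    }
    List<T> sorted = new ArrayList<>(values);
    sorted.sort(compareFn);
    if (sorted.size() < numQuantiles) {
      return sorted;
    }
    List<T> result = new ArrayList<>(numQuantiles);
    int last = sorted.size() - 1;
    for (int i = 0; i < numQuantiles; i++) {
      // Assumed rule: round to the nearest evenly spaced position.
      int index = (int) Math.round((double) i * last / (numQuantiles - 1));
      result.add(sorted.get(index));
    }
    return result;
  }
}
```

Under these assumptions, quartiles (N = 4) of the integers 0 through 100 are requested with numQuantiles = 5, which yields five values: the minimum, three cut points, and the maximum. This is the N+1 rule stated above.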
T - the type of the elements in the input PCollection
numQuantiles - the number of elements in the resulting quantile values List
compareFn - the function to use to order the elements
public static <T extends java.lang.Comparable<T>> PTransform<PCollection<T>,PCollection<java.util.List<T>>> globally(int numQuantiles)
Like globally(int, Comparator), but sorts using the elements' natural ordering.
T - the type of the elements in the input PCollection
numQuantiles - the number of elements in the resulting quantile values List
public static <K,V,ComparatorT extends java.util.Comparator<V> & java.io.Serializable> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> perKey(int numQuantiles, ComparatorT compareFn)
Returns a PTransform that takes a PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to a List of the approximate N-tiles of the values associated with that key in the input PCollection. This gives an idea of the distribution of the input values for each key.
Each of the computed Lists is of size numQuantiles, and contains the input values' minimum value, numQuantiles-2 intermediate values, and maximum value, in sorted order, using the given Comparator to order values. To compute traditional N-tiles, one should use ApproximateQuantiles.perKey(N+1, compareFn).
If a key has fewer than numQuantiles values associated with it, then that key's output List will contain all the key's input values, in sorted order.
The argument Comparator must be Serializable.
Example of use:
PCollection<KV<Integer, String>> pc = ...;
PCollection<KV<Integer, List<String>>> quantilesPerKey =
pc.apply(ApproximateQuantiles.perKey(11, stringCompareFn));
See Combine.PerKey for how this affects timestamps and windowing.
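The per-key contract described above (one output List per distinct key, each built only from that key's values) can be sketched with plain collections. This is an exact, in-memory illustration and not the transform's approximate algorithm; the class name PerKeyQuantilesSketch, the use of Map.Entry in place of KV, and the evenly spaced index rule are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PerKeyQuantilesSketch {

  /** Groups values by key, then picks numQuantiles values for each key independently. */
  static <K, V> Map<K, List<V>> quantilesPerKey(
      List<Map.Entry<K, V>> input, int numQuantiles, Comparator<? super V> compareFn) {
    if (numQuantiles < 2) {
      // Guard chosen for this sketch; it also avoids dividing by zero in pick().
      throw new IllegalArgumentException("numQuantiles must be at least 2");
    }
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (Map.Entry<K, V> kv : input) {
      grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
    }
    Map<K, List<V>> result = new LinkedHashMap<>();
    for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
      result.put(group.getKey(), pick(group.getValue(), numQuantiles, compareFn));
    }
    return result;
  }

  /** Minimum, numQuantiles-2 intermediate values, and maximum of one key's values. */
  private static <V> List<V> pick(
      List<V> values, int numQuantiles, Comparator<? super V> compareFn) {
    List<V> sorted = new ArrayList<>(values);
    sorted.sort(compareFn);
    if (sorted.size() < numQuantiles) {
      return sorted; // fewer values than requested: return them all, sorted
    }
    List<V> picked = new ArrayList<>(numQuantiles);
    int last = sorted.size() - 1;
    for (int i = 0; i < numQuantiles; i++) {
      // Assumed rule: round to the nearest evenly spaced position.
      picked.add(sorted.get((int) Math.round((double) i * last / (numQuantiles - 1))));
    }
    return picked;
  }
}
```

A key with fewer values than numQuantiles simply gets all of its values back in sorted order, matching the rule stated for sparse keys.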
K - the type of the keys in the input and output PCollections
V - the type of the values in the input PCollection
numQuantiles - the number of elements in the resulting quantile values List
compareFn - the function to use to order the elements
public static <K,V extends java.lang.Comparable<V>> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> perKey(int numQuantiles)
Like perKey(int, Comparator), but sorts values using their natural ordering.
K - the type of the keys in the input and output PCollections
V - the type of the values in the input PCollection
numQuantiles - the number of elements in the resulting quantile values List