public class ApproximateQuantiles
extends java.lang.Object

PTransforms for getting an idea of a PCollection's data distribution using approximate N-tiles (e.g. quartiles, percentiles, etc.), either globally or per-key.

Modifier and Type | Class and Description |
---|---|
static class | ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> The ApproximateQuantilesCombineFn combiner gives an idea of the distribution of a collection of values using approximate N-tiles. |
Modifier and Type | Method and Description |
---|---|
static <T extends java.lang.Comparable<T>> | globally(int numQuantiles) Like globally(int, Comparator), but sorts using the elements' natural ordering. |
static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> | globally(int numQuantiles, ComparatorT compareFn) Returns a PTransform that takes a PCollection<T> and returns a PCollection<List<T>> whose single value is a List of the approximate N-tiles of the elements of the input PCollection. |
static <K,V extends java.lang.Comparable<V>> | perKey(int numQuantiles) Like perKey(int, Comparator), but sorts values using their natural ordering. |
static <K,V,ComparatorT extends java.util.Comparator<V> & java.io.Serializable> | perKey(int numQuantiles, ComparatorT compareFn) Returns a PTransform that takes a PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to a List of the approximate N-tiles of the values associated with that key in the input PCollection. |
public static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> PTransform<PCollection<T>,PCollection<java.util.List<T>>> globally(int numQuantiles, ComparatorT compareFn)

Returns a PTransform that takes a PCollection<T> and returns a PCollection<List<T>> whose single value is a List of the approximate N-tiles of the elements of the input PCollection. This gives an idea of the distribution of the input elements.
The computed List is of size numQuantiles, and contains the input elements' minimum value, numQuantiles-2 intermediate values, and maximum value, in sorted order, using the given Comparator to order values.

To compute traditional N-tiles, one should use ApproximateQuantiles.globally(N+1, compareFn), because N-tiles are delimited by N+1 values once the minimum and maximum are included.

If there are fewer input elements than numQuantiles, then the result List will contain all the input elements, in sorted order.
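The shape of the result list described above can be illustrated without a pipeline. The following is a minimal plain-Java sketch, not the transform's algorithm: it computes exact values by sorting everything in memory, and the evenly spaced rank selection is an assumption made for illustration. The class name QuantileShapeSketch and the helper exactQuantiles are hypothetical, and the check that numQuantiles is at least 2 belongs to the sketch only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class QuantileShapeSketch {
  /**
   * Exact, in-memory stand-in (hypothetical helper, not part of the SDK).
   * Returns at most numQuantiles elements in sorted order, starting with the
   * minimum and ending with the maximum.
   */
  public static <T> List<T> exactQuantiles(
      List<T> input, int numQuantiles, Comparator<? super T> compareFn) {
    if (numQuantiles < 2) {
      // Assumption of this sketch: a list holding a min and a max needs two slots.
      throw new IllegalArgumentException("numQuantiles must be at least 2");
    }
    List<T> sorted = new ArrayList<>(input);
    sorted.sort(compareFn);
    // Fewer elements than requested: return all of them, sorted.
    if (sorted.size() <= numQuantiles) {
      return sorted;
    }
    List<T> result = new ArrayList<>(numQuantiles);
    for (int i = 0; i < numQuantiles; i++) {
      // Evenly spaced ranks: i = 0 selects the minimum, i = numQuantiles - 1 the maximum.
      long index = Math.round((double) i * (sorted.size() - 1) / (numQuantiles - 1));
      result.add(sorted.get((int) index));
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> values = new ArrayList<>();
    for (int v = 100; v >= 0; v--) {
      values.add(v); // 100 down to 0, deliberately unsorted
    }
    // Quartiles are 4-tiles, so request 4 + 1 = 5 values.
    System.out.println(exactQuantiles(values, 5, Comparator.naturalOrder()));
    // Three elements but five requested: every element comes back, sorted.
    System.out.println(exactQuantiles(Arrays.asList(3, 1, 2), 5, Comparator.naturalOrder()));
  }
}
```

With the integers 0 through 100 and numQuantiles set to 5, the sketch yields the minimum, the three quartile boundaries, and the maximum. The real transform is approximate, so its intermediate values may differ from an exact computation like this one.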
The argument Comparator must be Serializable.
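One way to satisfy both bounds on the comparator type is a named class that implements Comparator and Serializable together. The sketch below is illustrative only: ByLengthThenAlphabetical and roundTrip are hypothetical names, and the serialization round trip is just a rough stand-in for a comparator being shipped elsewhere, not a description of what the SDK does internally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Comparator;

class SerializableComparatorSketch {
  /** Hypothetical comparator: orders strings by length, then alphabetically. */
  static class ByLengthThenAlphabetical implements Comparator<String>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(String a, String b) {
      int byLength = Integer.compare(a.length(), b.length());
      return byLength != 0 ? byLength : a.compareTo(b);
    }
  }

  /**
   * Hypothetical helper declared with the same type bound as compareFn.
   * It writes the comparator to bytes and reads it back.
   */
  static <T, ComparatorT extends Comparator<T> & Serializable> ComparatorT roundTrip(
      ComparatorT compareFn) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(compareFn);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        ComparatorT copy = (ComparatorT) in.readObject();
        return copy;
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("comparator did not survive serialization", e);
    }
  }

  public static void main(String[] args) {
    Comparator<String> copy = roundTrip(new ByLengthThenAlphabetical());
    System.out.println(copy.compare("pear", "fig") > 0); // "pear" is longer than "fig"
    System.out.println(copy.compare("fig", "yam") < 0);  // same length, so alphabetical
  }
}
```

A comparator whose static type is only Comparator<String> would not compile against a bound like this, even if the object happens to be serializable at run time, which is why the sketch uses a named class.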
Example of use:
PCollection<String> pc = ...;
PCollection<List<String>> quantiles =
pc.apply(ApproximateQuantiles.globally(11, stringCompareFn));
T - the type of the elements in the input PCollection
numQuantiles - the number of elements in the resulting quantile values List
compareFn - the function to use to order the elements

public static <T extends java.lang.Comparable<T>> PTransform<PCollection<T>,PCollection<java.util.List<T>>> globally(int numQuantiles)
Like globally(int, Comparator), but sorts using the elements' natural ordering.

T - the type of the elements in the input PCollection
numQuantiles - the number of elements in the resulting quantile values List
public static <K,V,ComparatorT extends java.util.Comparator<V> & java.io.Serializable> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> perKey(int numQuantiles, ComparatorT compareFn)

Returns a PTransform that takes a PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to a List of the approximate N-tiles of the values associated with that key in the input PCollection. This gives an idea of the distribution of the input values for each key.
Each of the computed Lists is of size numQuantiles, and contains the input values' minimum value, numQuantiles-2 intermediate values, and maximum value, in sorted order, using the given Comparator to order values.

To compute traditional N-tiles, one should use ApproximateQuantiles.perKey(N+1, compareFn).

If a key has fewer than numQuantiles values associated with it, then that key's output List will contain all the key's input values, in sorted order.

The argument Comparator must be Serializable.
Example of use:
PCollection<KV<Integer, String>> pc = ...;
PCollection<KV<Integer, List<String>>> quantilesPerKey =
pc.apply(ApproximateQuantiles.perKey(11, stringCompareFn));
See Combine.PerKey for how this affects timestamps and windowing.
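The per-key behavior described above (one output list per distinct key, with short lists for keys that have few values) can be sketched in plain Java. This is an exact, in-memory illustration under stated assumptions, not the transform's implementation: it ignores windowing and timestamps entirely, uses Map.Entry as a stand-in for KV, and the names PerKeyQuantilesSketch, kv, pick, and quantilesPerKey are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PerKeyQuantilesSketch {
  /** Stand-in for a KV<K, V> element (hypothetical helper). */
  static <K, V> Map.Entry<K, V> kv(K key, V value) {
    return new SimpleEntry<>(key, value);
  }

  /** Exact stand-in for one key's list: minimum, evenly spaced picks, maximum. */
  static <V> List<V> pick(List<V> values, int numQuantiles, Comparator<? super V> compareFn) {
    if (numQuantiles < 2) {
      throw new IllegalArgumentException("numQuantiles must be at least 2"); // sketch-only check
    }
    List<V> sorted = new ArrayList<>(values);
    sorted.sort(compareFn);
    if (sorted.size() <= numQuantiles) {
      return sorted; // fewer values than requested: return them all, sorted
    }
    List<V> result = new ArrayList<>(numQuantiles);
    for (int i = 0; i < numQuantiles; i++) {
      long index = Math.round((double) i * (sorted.size() - 1) / (numQuantiles - 1));
      result.add(sorted.get((int) index));
    }
    return result;
  }

  /** Groups values by key, then reduces each key's values independently. */
  static <K, V> Map<K, List<V>> quantilesPerKey(
      List<Map.Entry<K, V>> pairs, int numQuantiles, Comparator<? super V> compareFn) {
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (Map.Entry<K, V> pair : pairs) {
      grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
    }
    Map<K, List<V>> result = new LinkedHashMap<>();
    for (Map.Entry<K, List<V>> entry : grouped.entrySet()) {
      result.put(entry.getKey(), pick(entry.getValue(), numQuantiles, compareFn));
    }
    return result;
  }

  public static void main(String[] args) {
    List<Map.Entry<Integer, String>> pairs = Arrays.asList(
        kv(1, "d"), kv(2, "x"), kv(1, "a"), kv(1, "c"), kv(1, "b"), kv(1, "e"), kv(2, "y"));
    // Key 1 has five values and is reduced to three; key 2 has only two, so both are kept.
    System.out.println(quantilesPerKey(pairs, 3, Comparator.naturalOrder()));
  }
}
```

Each key is reduced on its own, so a key with many values and a key with few values produce lists of different lengths in the same output.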
K - the type of the keys in the input and output PCollections
V - the type of the values in the input PCollection
numQuantiles - the number of elements in the resulting quantile values List
compareFn - the function to use to order the elements

public static <K,V extends java.lang.Comparable<V>> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> perKey(int numQuantiles)
Like perKey(int, Comparator), but sorts values using their natural ordering.

K - the type of the keys in the input and output PCollections
V - the type of the values in the input PCollection
numQuantiles - the number of elements in the resulting quantile values List