@Experimental public final class TDigestQuantiles extends java.lang.Object
PTransforms for getting information about quantiles in a stream.
 This class uses the T-Digest structure introduced by Ted Dunning, and more precisely
 the MergingDigest implementation.
 
The paper and implementation are available on Ted Dunning's Github profile
Only one parameter can be tuned in order to control the tradeoff between
 the estimation accuracy and the memory use. 
 
Stream elements are compressed into a linked list of centroids.
 The compression factor cf is used to limit the number of elements represented by
 each centroid as well as the total number of centroids. 
 The relative error will always be a small fraction of 1% for values at extreme quantiles
 and always be less than 3/cf at middle quantiles. 
 
By default the compression factor is set to 100, which guarantees a relative error less than 3%.
There are 2 ways of using this class:
PTransforms that return a PCollection which contains
       a MergingDigest for querying the value at a given quantile or
       the approximate quantile position of an element.
   TDigestQuantiles.TDigestQuantilesFn CombineFn that is exposed in order
   to make advanced processing involving the MergingDigest.
 The simplest use is to call the globally() or perKey() method in
 order to retrieve the digest, and then to query the structure.
 
  PCollection<Double> pc = ...;
  PCollection<MergingDigest> countMinSketch = pc.apply(TDigestQuantiles
         .globally()); // .perKey()
 One can tune the compression factor cf in order to control accuracy and memory. 
 This tuning works exactly the same for globally() and perKey().
 
  double cf = 500;
  PCollection<Double> pc = ...;
  PCollection<MergingDigest> countMinSketch = pc.apply(TDigestQuantiles
         .globally() // .perKey()
         .withCompression(cf);
 This example shows how to query the resulting structure, for example to
 build  PCollection of KVs with each pair corresponding to
 a couple (quantile, value).
 
  PCollection<MergingDigest> pc = ...;
  PCollection<KV<Double, Double>> quantiles = pc.apply(ParDo.of(
         new DoFn<MergingDigest, KV<Double, Double>>() {
           @ProcessElement
           public void processElement(ProcessContext c) {
             double[] quantiles = {0.01, 0.25, 0.5, 0.75, 0.99}
             for (double q : quantiles) {
                c.output(KV.of(q, c.element().quantile(q));
             }
           }}));
 One can also retrieve the approximate quantile position of a given element in the stream
 using cdf(double) method instead of quantile(double).
 
The CombineFn does the same thing as the PTransforms but
 it can be used for doing stateful processing or in
 CombineFns.ComposedCombineFn.
 
This example is not really interesting but it shows how one can properly
 create a TDigestQuantiles.TDigestQuantilesFn.
 
  double cf = 250;
  PCollection<Double> input = ...;
  PCollection<MergingDigest> output = input.apply(Combine
         .globally(TDigestQuantilesFn.create(cf)));
 Warning: this class is experimental. 
 Its API is subject to change in future versions of Beam.
| Modifier and Type | Class and Description | 
|---|---|
| static class  | TDigestQuantiles.GlobalDigestImplementation of  globally(). | 
| static class  | TDigestQuantiles.PerKeyDigest<K>Implementation of  perKey(). | 
| static class  | TDigestQuantiles.TDigestQuantilesFnImplements the  Combine.CombineFnofTDigestQuantilestransforms. | 
| Constructor and Description | 
|---|
| TDigestQuantiles() | 
| Modifier and Type | Method and Description | 
|---|---|
| static TDigestQuantiles.GlobalDigest | globally()Compute the stream in order to build a T-Digest structure (MergingDigest)
 for keeping track of the stream distribution and returns a  PCollection<MergingDigest>. | 
| static <K> TDigestQuantiles.PerKeyDigest<K> | perKey()Like  globally(), but builds a digest for each key in the stream. | 
public static TDigestQuantiles.GlobalDigest globally()
PCollection<MergingDigest>.
 public static <K> TDigestQuantiles.PerKeyDigest<K> perKey()
globally(), but builds a digest for each key in the stream.K - the type of the keys