TDigestQuantiles (Apache Beam 2.5.0)

java.lang.Object
- org.apache.beam.sdk.extensions.sketching.TDigestQuantiles

```
@Experimental
public final class TDigestQuantiles
extends java.lang.Object
```
PTransforms for getting information about quantiles in a stream.
This class uses the T-Digest structure introduced by Ted Dunning, and more precisely the MergingDigest implementation.
References

The paper and implementation are available on Ted Dunning's Github profile
Parameters

Only one parameter can be tuned in order to control the tradeoff between the estimation accuracy and the memory use.

Stream elements are compressed into a linked list of centroids. The compression factor cf is used to limit the number of elements represented by each centroid as well as the total number of centroids.
The relative error will always be a small fraction of 1% for values at extreme quantiles and always be less than 3/cf at middle quantiles.

By default the compression factor is set to 100, which guarantees a relative error less than 3%.
Examples

There are 2 ways of using this class:
- Use the PTransforms that return a PCollection which contains a MergingDigest for querying the value at a given quantile or the approximate quantile position of an element.
- Use the TDigestQuantiles.TDigestQuantilesFn CombineFn that is exposed in order to make advanced processing involving the MergingDigest.
Example 1: Default use

The simplest use is to call the globally() or perKey() method in order to retrieve the digest, and then to query the structure.
```
  PCollection<Double> pc = ...;
  PCollection<MergingDigest> countMinSketch = pc.apply(TDigestQuantiles
         .globally()); // .perKey()
 
```
Example 2: tune accuracy parameters

One can tune the compression factor cf in order to control accuracy and memory.
This tuning works exactly the same for globally() and perKey().
```
  double cf = 500;
  PCollection<Double> pc = ...;
  PCollection<MergingDigest> countMinSketch = pc.apply(TDigestQuantiles
         .globally() // .perKey()
         .withCompression(cf);
 
```
Example 3 : Query the resulting structure

This example shows how to query the resulting structure, for example to build PCollection of KVs with each pair corresponding to a couple (quantile, value).
```
  PCollection<MergingDigest> pc = ...;
  PCollection<KV<Double, Double>> quantiles = pc.apply(ParDo.of(
         new DoFn<MergingDigest, KV<Double, Double>>() {
           @ProcessElement
           public void processElement(ProcessContext c) {
             double[] quantiles = {0.01, 0.25, 0.5, 0.75, 0.99}
             for (double q : quantiles) {
                c.output(KV.of(q, c.element().quantile(q));
             }
           }}));
 
```
One can also retrieve the approximate quantile position of a given element in the stream using cdf(double) method instead of quantile(double).
Example 4: Using the CombineFn

The CombineFn does the same thing as the PTransforms but it can be used for doing stateful processing or in CombineFns.ComposedCombineFn.
This example is not really interesting but it shows how one can properly create a TDigestQuantiles.TDigestQuantilesFn.
```
  double cf = 250;
  PCollection<Double> input = ...;
  PCollection<MergingDigest> output = input.apply(Combine
         .globally(TDigestQuantilesFn.create(cf)));
 
```
Warning: this class is experimental.
Its API is subject to change in future versions of Beam.

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`TDigestQuantiles.GlobalDigest` Implementation of `globally()`.
`static class`	`TDigestQuantiles.PerKeyDigest<K>` Implementation of `perKey()`.
`static class`	`TDigestQuantiles.TDigestQuantilesFn` Implements the `Combine.CombineFn` of `TDigestQuantiles` transforms.

Constructor Summary

Constructors
Constructor and Description

TDigestQuantiles()

Constructors
Constructor and Description
`TDigestQuantiles()`

Method Summary

All Methods Static Methods Concrete Methods
Modifier and Type	Method and Description
`static TDigestQuantiles.GlobalDigest`	`globally()` Compute the stream in order to build a T-Digest structure (MergingDigest) for keeping track of the stream distribution and returns a `PCollection<MergingDigest>`.
`static <K> TDigestQuantiles.PerKeyDigest<K>`	`perKey()` Like `globally()`, but builds a digest for each key in the stream.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - TDigestQuantiles
```
public TDigestQuantiles()
```
- Method Detail
  - globally
```
public static TDigestQuantiles.GlobalDigest globally()
```
    Compute the stream in order to build a T-Digest structure (MergingDigest) for keeping track of the stream distribution and returns a PCollection<MergingDigest>.
    The resulting structure can be queried in order to retrieve the approximate value at a given quantile or the approximate quantile position of a given element.
  - perKey
```
public static <K> TDigestQuantiles.PerKeyDigest<K> perKey()
```
    Like globally(), but builds a digest for each key in the stream.
    
    Type Parameters:
    
    K - the type of the keys

Class TDigestQuantiles

References

Parameters

Examples

Example 1: Default use

Example 2: tune accuracy parameters

Example 3 : Query the resulting structure

Example 4: Using the CombineFn

Nested Class Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Constructor Detail

TDigestQuantiles

Method Detail

globally

perKey