Class ApproximateDistinct.PerKeyDistinct<K,V>

java.lang.Object
org.apache.beam.sdk.transforms.PTransform<PCollection<KV<K,V>>,PCollection<KV<K,Long>>>
org.apache.beam.sdk.extensions.sketching.ApproximateDistinct.PerKeyDistinct<K,V>
Type Parameters:
K - type of the keys mapping the elements
V - type of the values being combined per key
All Implemented Interfaces:
Serializable, HasDisplayData
Enclosing class:
ApproximateDistinct

public abstract static class ApproximateDistinct.PerKeyDistinct<K,V> extends PTransform<PCollection<KV<K,V>>,PCollection<KV<K,Long>>>
Implementation of ApproximateDistinct.perKey().
  • Constructor Details

    • PerKeyDistinct

      public PerKeyDistinct()
  • Method Details

    • withPrecision

      public ApproximateDistinct.PerKeyDistinct<K,V> withPrecision(int p)
      Sets the precision p.

      Keep in mind that p cannot be lower than 4, because the estimation would be too inaccurate.

      See ApproximateDistinct.precisionForRelativeError(double) and ApproximateDistinct.relativeErrorForPrecision(int) for more information about the relationship between precision and relative error.

      Parameters:
      p - the precision value for the normal representation
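
      To give a rough feel for how precision trades off against relative error, the following is a minimal, standalone sketch. It assumes the textbook HyperLogLog approximation (standard error of about 1.04 / sqrt(2^p)); the constants used by ApproximateDistinct.relativeErrorForPrecision(int) may differ. The class name, method names, and the upper bound MAX_P are hypothetical and exist only for this illustration.

```java
public final class HllPrecisionSketch {

    // Illustrative upper bound so the search loop below always terminates.
    // This is an assumption of the sketch, not a limit documented for withPrecision.
    static final int MAX_P = 32;

    // Textbook HyperLogLog standard error: 1.04 / sqrt(m), with m = 2^p registers.
    // Assumption: Beam's own helper may use a different constant or correction.
    static double approxRelativeError(int p) {
        if (p < 4) {
            throw new IllegalArgumentException("precision must be at least 4, got " + p);
        }
        return 1.04 / Math.sqrt(Math.pow(2.0, p));
    }

    // Smallest precision p >= 4 whose approximate error does not exceed the target,
    // or MAX_P if no precision below that bound is sufficient.
    static int approxPrecisionFor(double targetError) {
        if (!(targetError > 0.0)) {
            throw new IllegalArgumentException("target error must be positive");
        }
        int p = 4;
        while (p < MAX_P && approxRelativeError(p) > targetError) {
            p++;
        }
        return p;
    }

    public static void main(String[] args) {
        // Print the approximate error for a few precisions.
        for (int p : new int[] {4, 12, 18}) {
            System.out.printf("p=%d -> approx. relative error %.5f%n", p, approxRelativeError(p));
        }
        // Find the smallest precision that reaches roughly 1% error.
        System.out.println("precision for ~1% error: " + approxPrecisionFor(0.01));
    }
}
```

      Under this approximation, each increment of p doubles the number of registers and shrinks the error by a factor of about sqrt(2), so a small p is cheap but coarse. For real pipelines, prefer the helper methods on ApproximateDistinct over this approximation.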
    • withSparsePrecision

      public ApproximateDistinct.PerKeyDistinct<K,V> withSparsePrecision(int sp)
      Sets the sparse representation's precision sp.

      Values above 32 are not yet supported by the AddThis version of HyperLogLog+.

      For more information about the sparse representation, read Google's paper on HyperLogLog+.

      Parameters:
      sp - the precision of HyperLogLog+'s sparse representation
    • expand

      public PCollection<KV<K,Long>> expand(PCollection<KV<K,V>> input)
      Description copied from class: PTransform
      Override this method to specify how this PTransform should be expanded on the given InputT.

      NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.

      Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).

      Specified by:
      expand in class PTransform<PCollection<KV<K,V>>,PCollection<KV<K,Long>>>