Class ApproximateDistinct.ApproximateDistinctFn<InputT>

java.lang.Object
org.apache.beam.sdk.transforms.Combine.CombineFn<InputT,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus>
org.apache.beam.sdk.extensions.sketching.ApproximateDistinct.ApproximateDistinctFn<InputT>
Type Parameters:
InputT - the type of the elements in the input PCollection
All Implemented Interfaces:
Serializable, CombineFnBase.GlobalCombineFn<InputT,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus>, HasDisplayData
Enclosing class:
ApproximateDistinct

public static class ApproximateDistinct.ApproximateDistinctFn<InputT> extends Combine.CombineFn<InputT,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus>
Implements the Combine.CombineFn used by the ApproximateDistinct transforms.
  • Method Details

    • create

      public static <InputT> ApproximateDistinct.ApproximateDistinctFn<InputT> create(Coder<InputT> coder)
      Returns an ApproximateDistinct.ApproximateDistinctFn combiner with the given input coder.
      Parameters:
      coder - the coder that encodes the elements' type
    • withPrecision

      public ApproximateDistinct.ApproximateDistinctFn<InputT> withPrecision(int p)
      Returns an ApproximateDistinct.ApproximateDistinctFn combiner with a new precision p.

      Note that p cannot be lower than 4, because the estimate would be too inaccurate.

      See ApproximateDistinct.precisionForRelativeError(double) and ApproximateDistinct.relativeErrorForPrecision(int) for more information about the relationship between precision and relative error.

      Parameters:
      p - the precision value for the normal representation
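
      To make the trade-off between precision and accuracy concrete, here is a minimal sketch. It assumes the rule of thumb from the HyperLogLog literature that the standard error is roughly 1.04 / sqrt(2^p). The class and method names are hypothetical, and the constant may differ from what ApproximateDistinct.relativeErrorForPrecision(int) actually returns, so treat that method as the authoritative source.

```java
// Hypothetical helper, not part of Beam: estimates the HyperLogLog error for a precision p.
final class PrecisionSketch {

    // Assumption: standard error ~= 1.04 / sqrt(m), where m = 2^p registers.
    static double approxRelativeError(int p) {
        if (p < 4) {
            // The documentation above states that p cannot be lower than 4.
            throw new IllegalArgumentException("p must be at least 4");
        }
        return 1.04 / Math.sqrt(Math.pow(2, p));
    }

    // Smallest p (searched up to an assumed cap of 32) whose estimated error meets the target.
    static int smallestPrecisionFor(double targetError) {
        for (int p = 4; p <= 32; p++) {
            if (approxRelativeError(p) <= targetError) {
                return p;
            }
        }
        throw new IllegalArgumentException("target error too small for p <= 32");
    }

    public static void main(String[] args) {
        for (int p = 4; p <= 16; p += 4) {
            System.out.printf("p=%d -> approx. relative error %.4f%n", p, approxRelativeError(p));
        }
        System.out.println("p for a 1% target: " + smallestPrecisionFor(0.01));
    }
}
```

      Under this approximation each increment of p doubles the number of registers and shrinks the error by a factor of about 1.41, which is why small changes in p have a large effect on memory use.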
    • withSparseRepresentation

      public ApproximateDistinct.ApproximateDistinctFn<InputT> withSparseRepresentation(int sp)
      Returns an ApproximateDistinct.ApproximateDistinctFn combiner with a new precision sp for the sparse representation.

      Values above 32 are not yet supported by the AddThis version of HyperLogLog+.

      For more information about the sparse representation, see Google's paper on HyperLogLog+.

      Parameters:
      sp - the precision of the HyperLogLog+ sparse representation
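
      The two constraints stated above (p of at least 4, sp of at most 32) can be checked before building the combiner. The sketch below is a hypothetical pre-flight check, not Beam or stream-lib code. It additionally assumes that sp == 0 means "no sparse representation" and that a non-zero sp should not be smaller than p. Both of those are assumptions that should be verified against the HyperLogLogPlus implementation in use.

```java
// Hypothetical validator for (p, sp) pairs; names and the p <= sp rule are assumptions.
final class SparsePrecisionCheck {

    static final int MIN_P = 4;   // from the withPrecision documentation
    static final int MAX_SP = 32; // from the withSparseRepresentation documentation

    static boolean isPlausible(int p, int sp) {
        if (p < MIN_P) {
            return false; // estimate would be too inaccurate
        }
        if (sp > MAX_SP) {
            return false; // not supported by the underlying library
        }
        // Assumption: sp == 0 disables the sparse representation;
        // otherwise the sparse precision should be at least the normal precision.
        return sp == 0 || sp >= p;
    }

    public static void main(String[] args) {
        System.out.println(isPlausible(12, 16));
        System.out.println(isPlausible(12, 33));
    }
}
```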
    • createAccumulator

      public com.clearspring.analytics.stream.cardinality.HyperLogLogPlus createAccumulator()
      Description copied from class: Combine.CombineFn
      Returns a new, mutable accumulator value, representing the accumulation of zero input values.
      Specified by:
      createAccumulator in class Combine.CombineFn<InputT,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus>
    • addInput

      public com.clearspring.analytics.stream.cardinality.HyperLogLogPlus addInput(com.clearspring.analytics.stream.cardinality.HyperLogLogPlus acc, InputT record)
      Description copied from class: Combine.CombineFn
      Adds the given input value to the given accumulator, returning the new accumulator value.
      Specified by:
      addInput in class Combine.CombineFn<InputT,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus>
      Parameters:
      acc - may be modified and returned for efficiency
      record - should not be mutated
    • extractOutput

      public com.clearspring.analytics.stream.cardinality.HyperLogLogPlus extractOutput(com.clearspring.analytics.stream.cardinality.HyperLogLogPlus accumulator)
      Outputs the whole structure so that it can easily be queried, reused, or stored.
      Specified by:
      extractOutput in class Combine.CombineFn<InputT,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus>
      Parameters:
      accumulator - can be modified for efficiency
    • mergeAccumulators

      public com.clearspring.analytics.stream.cardinality.HyperLogLogPlus mergeAccumulators(Iterable<com.clearspring.analytics.stream.cardinality.HyperLogLogPlus> accumulators)
      Description copied from class: Combine.CombineFn
      Returns an accumulator representing the accumulation of all the input values accumulated in the merging accumulators.
      Specified by:
      mergeAccumulators in class Combine.CombineFn<InputT,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus,com.clearspring.analytics.stream.cardinality.HyperLogLogPlus>
      Parameters:
      accumulators - only the first accumulator may be modified and returned for efficiency; the other accumulators should not be mutated, because they may be shared with other code and mutating them could lead to incorrect results or data corruption.
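
      The accumulator contract described by createAccumulator, addInput, mergeAccumulators and extractOutput can be illustrated without Beam. In the minimal sketch below a HashSet stands in for the HyperLogLogPlus accumulator, so the result is exact rather than approximate. The class, the method bodies and the "bundles" input are hypothetical and only mirror the method names on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Stand-in for the combine lifecycle; a Set replaces the HyperLogLogPlus sketch.
final class CombineLifecycleSketch {

    // A new, mutable accumulator representing zero inputs.
    static Set<String> createAccumulator() {
        return new HashSet<>();
    }

    // The accumulator may be modified and returned; the record is not mutated.
    static Set<String> addInput(Set<String> acc, String record) {
        acc.add(record);
        return acc;
    }

    // Only the first accumulator is mutated; the others are read but left untouched.
    static Set<String> mergeAccumulators(Iterable<Set<String>> accumulators) {
        Iterator<Set<String>> it = accumulators.iterator();
        if (!it.hasNext()) {
            return createAccumulator();
        }
        Set<String> merged = it.next();
        while (it.hasNext()) {
            merged.addAll(it.next());
        }
        return merged;
    }

    // Returns the whole structure, as extractOutput does on this page.
    static Set<String> extractOutput(Set<String> accumulator) {
        return accumulator;
    }

    // Simulates a runner: one accumulator per bundle, then a single merge.
    static int countDistinct(List<List<String>> bundles) {
        List<Set<String>> partials = new ArrayList<>();
        for (List<String> bundle : bundles) {
            Set<String> acc = createAccumulator();
            for (String record : bundle) {
                acc = addInput(acc, record);
            }
            partials.add(acc);
        }
        return extractOutput(mergeAccumulators(partials)).size();
    }

    public static void main(String[] args) {
        List<List<String>> bundles =
            Arrays.asList(Arrays.asList("a", "b", "a"), Arrays.asList("b", "c"));
        System.out.println(countDistinct(bundles));
    }
}
```

      Because extractOutput returns the accumulator itself rather than a number, downstream code decides when to read the cardinality, which is what allows the structure to be stored or merged again later.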
    • populateDisplayData

      public void populateDisplayData(DisplayData.Builder builder)
      Register display data for the given transform or component.

      populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

      By default, does not register any display data. Implementors may override this method to provide their own display data.

      Specified by:
      populateDisplayData in interface HasDisplayData
      Parameters:
      builder - The builder to populate with display data.
    • getAccumulatorCoder

      public Coder<com.clearspring.analytics.stream.cardinality.HyperLogLogPlus> getAccumulatorCoder(CoderRegistry registry, Coder<InputT> inputCoder) throws CannotProvideCoderException
      Description copied from interface: CombineFnBase.GlobalCombineFn
      Returns the Coder to use for accumulator AccumT values, or null if it is not able to be inferred.

      By default, uses the knowledge of the Coder being used for InputT values and the enclosing Pipeline's CoderRegistry to try to infer the Coder for AccumT values.

      This is the Coder used to send data through a communication-intensive shuffle step, so a compact and efficient representation may have significant performance benefits.

      Specified by:
      getAccumulatorCoder in interface CombineFnBase.GlobalCombineFn<InputT,AccumT,OutputT>
      Throws:
      CannotProvideCoderException
    • getDefaultOutputCoder

      public Coder<com.clearspring.analytics.stream.cardinality.HyperLogLogPlus> getDefaultOutputCoder(CoderRegistry registry, Coder<InputT> inputCoder) throws CannotProvideCoderException
      Description copied from interface: CombineFnBase.GlobalCombineFn
      Returns the Coder to use by default for output OutputT values, or null if it is not able to be inferred.

      By default, uses the knowledge of the Coder being used for input InputT values and the enclosing Pipeline's CoderRegistry to try to infer the Coder for OutputT values.

      Specified by:
      getDefaultOutputCoder in interface CombineFnBase.GlobalCombineFn<InputT,AccumT,OutputT>
      Throws:
      CannotProvideCoderException
    • getIncompatibleGlobalWindowErrorMessage

      public String getIncompatibleGlobalWindowErrorMessage()
      Description copied from interface: CombineFnBase.GlobalCombineFn
      Returns the error message for unsupported default values in Combine.globally().
      Specified by:
      getIncompatibleGlobalWindowErrorMessage in interface CombineFnBase.GlobalCombineFn<InputT,AccumT,OutputT>
    • getInputTVariable

      public TypeVariable<?> getInputTVariable()
      Returns the TypeVariable of InputT.
    • getAccumTVariable

      public TypeVariable<?> getAccumTVariable()
      Returns the TypeVariable of AccumT.
    • getOutputTVariable

      public TypeVariable<?> getOutputTVariable()
      Returns the TypeVariable of OutputT.