Class ApproximateCountDistinct
java.lang.Object
org.apache.beam.sdk.extensions.zetasketch.ApproximateCountDistinct
PTransforms for estimating the number of distinct elements in a PCollection, or the number of distinct values associated with each key in a PCollection of KVs.
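To make concrete what the two transforms estimate, the following plain-Java sketch computes the exact answers on in-memory collections. It does not use Beam at all; the class name ExactDistinctReference and both method names are hypothetical, and Map.Entry stands in for KV.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical reference class: exact counts that the transforms approximate.
public class ExactDistinctReference {

    /** Exact number of distinct elements (what the global transform estimates). */
    public static <T> long countDistinct(Collection<T> elements) {
        return new HashSet<>(elements).size();
    }

    /** Exact number of distinct values per key (what the per-key transform estimates). */
    public static <K, V> Map<K, Long> countDistinctPerKey(List<Map.Entry<K, V>> pairs) {
        Map<K, Set<V>> valuesByKey = new HashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            valuesByKey.computeIfAbsent(pair.getKey(), k -> new HashSet<>()).add(pair.getValue());
        }
        Map<K, Long> counts = new HashMap<>();
        for (Map.Entry<K, Set<V>> entry : valuesByKey.entrySet()) {
            counts.put(entry.getKey(), (long) entry.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            ints.add(i % 4); // only the values 0..3 occur
        }
        System.out.println(countDistinct(ints));

        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new AbstractMap.SimpleEntry<>("a", 1));
        pairs.add(new AbstractMap.SimpleEntry<>("a", 1));
        pairs.add(new AbstractMap.SimpleEntry<>("a", 2));
        pairs.add(new AbstractMap.SimpleEntry<>("b", 7));
        System.out.println(countDistinctPerKey(pairs));
    }
}
```

The exact version keeps every distinct value in memory, which is the cost that a sketch-based estimate avoids on large inputs.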
We make use of the HllCount implementation for this transform. Please use HllCount directly if you need access to the sketches.
If the element type is not one of Byte, Integer, Double, or String, use ApproximateCountDistinct.Globally.via(org.apache.beam.sdk.transforms.ProcessFunction<T, java.lang.Long>) or ApproximateCountDistinct.PerKey.via(org.apache.beam.sdk.transforms.ProcessFunction<org.apache.beam.sdk.values.KV<K, V>, org.apache.beam.sdk.values.KV<K, java.lang.Long>>) to supply a function that maps each element to a Long (or each KV<K, V> to a KV<K, Long>).
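The function handed to via must produce one Long per element, and equal elements should map to the same Long so that they are counted once. The sketch below shows one possible mapping in plain Java, without Beam: java.util.function.Function stands in for ProcessFunction, and the Foo class, its fields, and the choice of a 64-bit FNV-1a hash are all assumptions made for illustration, not part of the library.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical sketch of an element-to-Long mapping suitable for via(...).
public class ViaMappingSketch {

    /** Hypothetical element type that is not natively supported. */
    public static final class Foo {
        final String name;
        final int version;

        public Foo(String name, int version) {
            this.name = name;
            this.version = version;
        }
    }

    /** 64-bit FNV-1a hash over the UTF-8 bytes of a string. */
    public static long fnv1a64(String s) {
        long hash = 0xcbf29ce484222325L; // FNV offset basis
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L; // FNV prime
        }
        return hash;
    }

    /** Stand-in for the function passed to via: equal Foo contents give equal Longs. */
    public static final Function<Foo, Long> FOO_TO_LONG =
        foo -> fnv1a64(foo.name + "|" + foo.version);

    public static void main(String[] args) {
        Foo first = new Foo("a", 1);
        Foo sameContent = new Foo("a", 1);
        Foo different = new Foo("a", 2);
        System.out.println(FOO_TO_LONG.apply(first).equals(FOO_TO_LONG.apply(sameContent)));
        System.out.println(FOO_TO_LONG.apply(first).equals(FOO_TO_LONG.apply(different)));
    }
}
```

A 64-bit hash is used here rather than hashCode() because a 32-bit value collides more often on large inputs, and every collision makes two distinct elements count as one. Whether that matters depends on the data size and is a judgment call for the pipeline author.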
Examples
Example 1: Approximate count of a PCollection<Integer>, specifying the precision
p.apply("Int", Create.of(ints))
    .apply("IntHLL", ApproximateCountDistinct.globally().withPercision(PRECISION));
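As a rough guide to choosing PRECISION, the classic HyperLogLog analysis gives a relative standard error of about 1.04 / sqrt(2^precision). The sketch below evaluates that rule of thumb in plain Java. Two assumptions are made: that this classic bound is a reasonable approximation for the HllCount implementation, and that the precisions looped over (10 to 24) are accepted by it. Check the HllCount documentation before relying on either.

```java
// Hypothetical helper: rule-of-thumb error for a HyperLogLog sketch.
public class PrecisionErrorSketch {

    /** Approximate relative standard error for a sketch with 2^precision registers. */
    public static double approximateStandardError(int precision) {
        double registers = Math.pow(2, precision);
        return 1.04 / Math.sqrt(registers);
    }

    public static void main(String[] args) {
        // The range 10..24 is an assumption about supported precisions.
        for (int precision = 10; precision <= 24; precision += 2) {
            double error = approximateStandardError(precision);
            System.out.printf("precision=%d -> ~%.3f%% standard error%n",
                precision, 100.0 * error);
        }
    }
}
```

Each increment of the precision doubles the number of registers, so memory roughly doubles while the expected error shrinks by a factor of about 1.41.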
Example 2: Approximate Count of Key Value PCollection<KV<Integer,Foo>>
PCollection<KV<Integer, Long>> result =
p.apply("Long", Create.of(longs)).apply("LongHLL", ApproximateCountDistinct.perKey());
Example 3: Approximate Count of Key Value PCollection<KV<Integer,Foo>>, using via to map each value to a Long
PCollection<KV<Integer, Long>> approxResultInteger =
    p.apply("Int", Create.of(Foo))
        .apply("IntHLL", ApproximateCountDistinct.<Integer, KV<Integer, Integer>>perKey()
            .via(kv -> KV.of(kv.getKey(), (long) kv.getValue().hashCode())));
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | ApproximateCountDistinct.Globally<T> | PTransform for estimating the number of distinct elements in a PCollection.
static class | ApproximateCountDistinct.PerKey<K,V> |
Constructor Summary
Constructors
ApproximateCountDistinct()
Method Summary
Modifier and Type | Method | Description
protected static <T> HllCount.Init.Builder<T> | builderForType(TypeDescriptor<T> input) |
static <T> Combine.CombineFn<T, ?, byte[]> | getUdaf(TypeDescriptor<T> input) |
static <T> ApproximateCountDistinct.Globally<T> | globally() |
static <K, V> ApproximateCountDistinct.PerKey<K, V> | perKey() |
Constructor Details
ApproximateCountDistinct
public ApproximateCountDistinct()
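Method Details
globally
static <T> ApproximateCountDistinct.Globally<T> globally()
perKey
static <K, V> ApproximateCountDistinct.PerKey<K, V> perKey()
getUdaf
static <T> Combine.CombineFn<T, ?, byte[]> getUdaf(TypeDescriptor<T> input)
builderForType
protected static <T> HllCount.Init.Builder<T> builderForType(TypeDescriptor<T> input)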