apache_beam.transforms.stats module

This module contains all statistics-related transforms.

class apache_beam.transforms.stats.ApproximateUnique[source]

Bases: object

Hashes the input elements and uses the hashes to extrapolate the size of the entire set of hash values, assuming that the remaining hash values are as densely distributed as those in the sample.
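The idea can be illustrated with a minimal, standard-library sketch. It is not Beam's implementation: the names hash64 and estimate_unique, the use of SHA-256, the 64-bit hash space, and the linear density extrapolation are all assumptions made for illustration. The sketch keeps only the sample_size largest distinct hashes and assumes the rest of the hash space is populated as densely as the interval those hashes cover.

```python
import hashlib
import heapq

HASH_SPACE = 2 ** 64  # assumed size of the hash space for this sketch


def hash64(value):
    """Map a value to a deterministic 64-bit integer (illustrative choice)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def estimate_unique(values, sample_size=16):
    """Estimate the number of distinct values from a bounded hash sample."""
    heap = []        # min-heap of the largest distinct hashes seen so far
    members = set()  # the same hashes, for constant-time duplicate checks
    for value in values:
        h = hash64(value)
        if h in members:
            continue
        if len(heap) < sample_size:
            heapq.heappush(heap, h)
            members.add(h)
        elif h > heap[0]:
            # Replace the smallest sampled hash with the larger new one.
            evicted = heapq.heapreplace(heap, h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < sample_size:
        # The sample never filled up, so the count is exact.
        return len(heap)
    # sample_size hashes fall in [heap[0], HASH_SPACE); assume the whole
    # hash space is equally dense and scale up accordingly.
    covered = HASH_SPACE - heap[0]
    return round(sample_size * HASH_SPACE / covered)
```

Memory use is bounded by sample_size regardless of input size, and a larger sample gives a tighter estimate at the cost of more memory.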

static parse_input_params(size=None, error=None)[source]

Check if input params are valid and return sample size.

Parameters:
  • size – an int no smaller than 16, used as the sample size when estimating the number of unique values.
  • error – the maximum estimation error, a float between 0.01 and 0.50. If error is given, the sample size is calculated from it by the _get_sample_size_from_est_error function.
Returns:

sample size

Raises:

ValueError: If both size and error are given, or neither is given, or values are out of range.
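The validation rules above can be sketched in plain Python. This is a hypothetical stand-in, not the library's code: parse_sample_params and sample_size_from_error are illustrative names, and the error-to-size conversion ceil(4 / error^2) is an assumption, since this page does not state the formula used by _get_sample_size_from_est_error.

```python
import math


def sample_size_from_error(error):
    """Hypothetical conversion from max estimation error to sample size.

    Assumption: sample size = ceil(4 / error^2). The documented helper
    may use a different formula.
    """
    return math.ceil(4.0 / error ** 2)


def parse_sample_params(size=None, error=None):
    """Validate size/error as described above and return a sample size."""
    if size is not None and error is not None:
        raise ValueError("Give either size or error, not both.")
    if size is None and error is None:
        raise ValueError("One of size or error is required.")
    if size is not None:
        if not isinstance(size, int) or size < 16:
            raise ValueError("size must be an int no smaller than 16.")
        return size
    if not 0.01 <= error <= 0.5:
        raise ValueError("error must be between 0.01 and 0.50.")
    return sample_size_from_error(error)
```

Under the assumed formula, the loosest allowed error (0.50) maps to the smallest allowed sample size (16), which is consistent with the two documented ranges.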

class Globally(size=None, error=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

ApproximateUnique.Globally approximates the number of unique values in a PCollection.

expand(pcoll)[source]

class PerKey(size=None, error=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

ApproximateUnique.PerKey approximates the number of unique values per key.

expand(pcoll)[source]