apache_beam.transforms.stats module¶
This module contains all statistics-related transforms.
-
class
apache_beam.transforms.stats.
ApproximateUnique
[source]¶ Bases:
object
Hashes the input elements and uses a sample of the hash values to extrapolate the size of the entire set of hash values, assuming the remaining hash values are as densely distributed as those in the sample space.
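The extrapolation idea can be illustrated with a small k-minimum-values sketch in plain Python. This is a simplified stand-in, not Beam's estimator: the function names, the SHA-256 based hash, and the 64-bit hash space are all assumptions made for the example.

```python
import hashlib
import heapq

HASH_SPACE = 2 ** 64  # assumed size of the hash space for this sketch


def _hash64(value):
    """Map a value to a deterministic 64-bit integer (illustrative choice)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def estimate_unique(elements, sample_size=16):
    """Estimate the number of distinct elements from the smallest hashes."""
    heap = []     # negated hashes, so heap[0] is the largest hash kept
    kept = set()  # hashes currently in the sample, to skip duplicates
    for element in elements:
        h = _hash64(element)
        if h in kept:
            continue
        if len(heap) < sample_size:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heapreplace(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < sample_size:
        return len(heap)  # sample never filled up, so the count is exact
    # If the k smallest hashes span a fraction f of the hash space, the
    # whole space is assumed to be equally dense: roughly (k - 1) / f.
    kth_smallest = max(-heap[0], 1)
    return round((sample_size - 1) * HASH_SPACE / kth_smallest)
```

A larger `sample_size` lowers the expected error at the cost of more memory, which is the trade-off the `size` and `error` parameters express.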
-
static
parse_input_params
(size=None, error=None)[source]¶ Checks that the input parameters are valid and returns the sample size.
Parameters: - size – an int no smaller than 16, used as the sample size for estimating the number of unique values.
- error – the maximum estimation error, a float between 0.01 and 0.50. If error is given, the sample size is calculated from it by the _get_sample_size_from_est_error function.
Returns: sample size
Raises: ValueError: If both size and error are given, or neither is given, or values are out of range.
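The validation rules above can be sketched as a standalone function. The name `sample_size_from_params` is hypothetical, and the `4 / error**2` conversion is an assumption about what `_get_sample_size_from_est_error` computes; consult the source for the authoritative formula.

```python
import math

MIN_SIZE = 16
MIN_ERROR, MAX_ERROR = 0.01, 0.50


def sample_size_from_params(size=None, error=None):
    """Hypothetical stand-in for the checks described above."""
    # Exactly one of the two parameters must be supplied.
    if (size is None) == (error is None):
        raise ValueError("Specify exactly one of size or error.")
    if size is not None:
        if not isinstance(size, int) or size < MIN_SIZE:
            raise ValueError("size must be an int >= %d" % MIN_SIZE)
        return size
    if not MIN_ERROR <= error <= MAX_ERROR:
        raise ValueError(
            "error must be between %.2f and %.2f" % (MIN_ERROR, MAX_ERROR))
    # Assumed conversion from maximum error to sample size.
    return math.ceil(4.0 / error ** 2)
```

Under that assumed formula, the largest allowed error (0.50) maps to the smallest allowed sample size (16), which would explain why the two bounds go together.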
-
class
Globally
(size=None, error=None)[source]¶ Bases:
apache_beam.transforms.ptransform.PTransform
ApproximateUnique.Globally approximates the number of unique values in the entire PCollection.
-
class
PerKey
(size=None, error=None)[source]¶ Bases:
apache_beam.transforms.ptransform.PTransform
ApproximateUnique.PerKey approximates the number of unique values per key.
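For orientation, the quantities the two transforms approximate can be computed exactly in plain Python. This is only a reference baseline with hypothetical function names; the output shapes (a single count globally, one `(key, count)` pair per key) are assumptions made for the example.

```python
from collections import defaultdict


def exact_unique_globally(elements):
    """Exact distinct count of the whole collection, as a one-item list."""
    return [len(set(elements))]


def exact_unique_per_key(pairs):
    """Exact distinct count of values for each key, as (key, count) pairs."""
    seen = defaultdict(set)
    for key, value in pairs:
        seen[key].add(value)
    return sorted((key, len(values)) for key, values in seen.items())
```

Such a baseline is handy in tests for checking that an approximate result falls within the chosen error bound.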
-
class
apache_beam.transforms.stats.
ApproximateQuantiles
[source]¶ Bases:
object
PTransform for getting an idea of the data distribution using approximate N-tiles (e.g. quartiles, percentiles, etc.), either globally or per key.
-
class
Globally
(num_quantiles, key=None, reverse=False)[source]¶ Bases:
apache_beam.transforms.ptransform.PTransform
PTransform that takes a PCollection and returns a list whose single value is the list of approximate N-tiles of the whole input collection.
Parameters: - num_quantiles – the number of elements in the resulting list of quantile values.
- key – (optional) a function mapping each element to a comparable key, similar to the key argument of Python’s sorting methods.
- reverse – (optional) whether to order the quantiles from largest to smallest, rather than smallest to largest
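How the three parameters interact can be seen in an exact, in-memory version of the computation. This is a minimal sketch, not Beam's approximate algorithm: `exact_quantiles` is a hypothetical name, the index rounding is an arbitrary choice, and `reverse=True` is assumed to follow the convention of Python's `sorted()` (descending order).

```python
def exact_quantiles(values, num_quantiles, key=None, reverse=False):
    """Return num_quantiles evenly spaced elements of the sorted input."""
    if num_quantiles < 2:
        raise ValueError("num_quantiles must be at least 2")
    ordered = sorted(values, key=key, reverse=reverse)
    if not ordered:
        return []
    last = len(ordered) - 1
    # The first and last entries are the extremes of the ordering; the
    # entries in between are picked at evenly spaced positions.
    return [
        ordered[round(i * last / (num_quantiles - 1))]
        for i in range(num_quantiles)
    ]
```

With `num_quantiles=5` the result holds the minimum, the three quartile boundaries, and the maximum, so the list has five elements rather than four.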
-
class
PerKey
(num_quantiles, key=None, reverse=False)[source]¶ Bases:
apache_beam.transforms.ptransform.PTransform
PTransform that takes a PCollection of KV pairs and returns, for each key, a list whose single value is the list of approximate N-tiles of the input elements for that key.
Parameters: - num_quantiles – the number of elements in the resulting list of quantile values.
- key – (optional) a function mapping each element to a comparable key, similar to the key argument of Python’s sorting methods.
- reverse – (optional) whether to order the quantiles from largest to smallest, rather than smallest to largest
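The per-key variant can be pictured as grouping the values by key and then taking quantiles within each group. The sketch below does this exactly and in memory; `exact_quantiles_per_key` is a hypothetical name, the dict return type is chosen for readability, and `reverse=True` is again assumed to mean descending order.

```python
from collections import defaultdict


def exact_quantiles_per_key(pairs, num_quantiles, key=None, reverse=False):
    """Return {key: quantile list} for an iterable of (key, value) pairs."""
    if num_quantiles < 2:
        raise ValueError("num_quantiles must be at least 2")
    grouped = defaultdict(list)
    for k, value in pairs:
        grouped[k].append(value)
    result = {}
    for k, values in grouped.items():
        ordered = sorted(values, key=key, reverse=reverse)
        last = len(ordered) - 1
        # Same evenly spaced selection as in the global case, per group.
        result[k] = [
            ordered[round(i * last / (num_quantiles - 1))]
            for i in range(num_quantiles)
        ]
    return result
```

Note that `key` here orders the values within a group and is unrelated to the grouping key of each pair.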
-
class