apache_beam.utils.histogram module

class apache_beam.utils.histogram.Histogram(bucket_type)

Bases: object

A histogram that supports estimated percentile with linear interpolation.

This class is considered experimental and may break or receive backwards-incompatible changes in future versions of the Apache Beam SDK.

clear()
record(*args)
total_count()
p99()
p90()
p50()
get_percentile_info(elem_type, unit)
get_linear_interpolation(percentile)

Estimate a percentile using linear interpolation.

It first finds the bucket that contains the target percentile, then interpolates a value within that bucket, assuming the elements in the bucket are uniformly distributed.

Parameters:
  • percentile – The percentile to estimate. Should be a floating-point number greater than 0 and less than 1.
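
The interpolation described above can be sketched without the SDK. The helper below is a minimal illustration, not the Beam implementation: `estimate_percentile` and its arguments are hypothetical names, it assumes equal-width buckets, and it ignores values recorded outside the bucket range.

```python
def estimate_percentile(counts, start, width, percentile):
    """Estimate a percentile from equal-width bucket counts (illustrative only).

    counts[i] is the number of elements that fell into the bucket
    [start + i * width, start + (i + 1) * width).
    """
    if not 0 < percentile < 1:
        raise ValueError("percentile must be greater than 0 and less than 1")
    total = sum(counts)
    if total == 0:
        raise ValueError("no records")

    target = percentile * total  # rank of the element we are looking for
    seen = 0  # number of elements in the buckets before the current one
    for index, count in enumerate(counts):
        if count > 0 and seen + count >= target:
            # Assume the elements are spread uniformly across this bucket,
            # so the target sits this fraction of the way through it.
            fraction = (target - seen) / count
            return start + (index + fraction) * width
        seen += count
    raise AssertionError("unreachable: the last non-empty bucket always matches")


# Three buckets of width 10 starting at 0, holding 2, 6 and 2 elements.
print(estimate_percentile([2, 6, 2], start=0, width=10, percentile=0.5))
```

With these counts the median falls halfway through the second bucket, so the estimate is 15.0. A smaller bucket width narrows the range over which the uniformity assumption has to hold.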
class apache_beam.utils.histogram.BucketType

Bases: object

range_from()

Lower bound of a starting bucket.

range_to()

Upper bound of an ending bucket.

num_buckets()

The number of buckets.

bucket_index(value)

Get the bucket array index for the given value.

bucket_size(index)

Get the bucket size for the given bucket array index.

accumulated_bucket_size(end_index)

Get the accumulated bucket size from bucket index 0 up to, but not including, end_index.

Generally, this can be calculated as the sum of bucket_size(i) for 0 <= i < end_index. However, a child class could provide a more optimized calculation.
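
As a rough illustration of the generic calculation, the sketch below sums per-bucket sizes. `VariableWidthBuckets` is a hypothetical class written for this example and does not subclass or reproduce the SDK's `BucketType`.

```python
class VariableWidthBuckets:
    """Hypothetical bucket layout with an explicit width for each bucket."""

    def __init__(self, start, widths):
        self._start = start
        self._widths = list(widths)

    def num_buckets(self):
        return len(self._widths)

    def bucket_size(self, index):
        return self._widths[index]

    def accumulated_bucket_size(self, end_index):
        # Generic form: add up the sizes of buckets 0 .. end_index - 1.
        return sum(self.bucket_size(i) for i in range(end_index))


buckets = VariableWidthBuckets(start=0, widths=[1, 2, 4, 8])
print(buckets.accumulated_bucket_size(3))  # 1 + 2 + 4
```

The generic sum costs one step per bucket. A layout with a closed-form total can override the method to avoid the loop.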

class apache_beam.utils.histogram.LinearBucket(start, width, num_buckets)

Bases: apache_beam.utils.histogram.BucketType

Defines linear (equal-width) buckets for a histogram.

Parameters:
  • start – Lower bound of a starting bucket.
  • width – Bucket width. Smaller width implies a better resolution for percentile estimation.
  • num_buckets – The number of buckets. The upper bound of the last bucket is start + width * num_buckets.
range_from()
range_to()
num_buckets()
bucket_index(value)
bucket_size(index)
accumulated_bucket_size(end_index)
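
To make the relationship between start, width and num_buckets concrete, here is a minimal standalone sketch of an equal-width layout. `LinearBucketSketch` is a hypothetical stand-in written for this example. It mirrors the method names listed above, but the bodies are assumptions derived from the parameter descriptions and the SDK class may differ in detail.

```python
import math


class LinearBucketSketch:
    """Illustrative equal-width bucket layout (not the SDK class)."""

    def __init__(self, start, width, num_buckets):
        self._start = start
        self._width = width
        self._num_buckets = num_buckets

    def range_from(self):
        return self._start

    def range_to(self):
        # Upper bound of the last bucket: start + width * num_buckets.
        return self._start + self._width * self._num_buckets

    def num_buckets(self):
        return self._num_buckets

    def bucket_index(self, value):
        # Assumed mapping: how many whole widths the value lies above start.
        return math.floor((value - self._start) / self._width)

    def bucket_size(self, index):
        return self._width  # every bucket has the same width

    def accumulated_bucket_size(self, end_index):
        # Closed form of summing bucket_size(i) for 0 <= i < end_index.
        return self._width * end_index


buckets = LinearBucketSketch(start=0, width=10, num_buckets=5)
print(buckets.range_to(), buckets.bucket_index(25), buckets.accumulated_bucket_size(3))
```

Because every bucket has the same width, the accumulated size reduces to a single multiplication rather than a loop over buckets.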