apache_beam.transforms.combiners module

A library of basic combiner PTransform subclasses.

class apache_beam.transforms.combiners.Mean[source]

Bases: future.types.newobject.newobject

Combiners for computing arithmetic means of elements.

class Globally(label=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

combiners.Mean.Globally computes the arithmetic mean of the elements.

expand(pcoll)[source]
class PerKey(label=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

combiners.Mean.PerKey finds the means of the values for each key.

expand(pcoll)[source]
class apache_beam.transforms.combiners.Count[source]

Bases: future.types.newobject.newobject

Combiners for counting elements.

class Globally(label=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

combiners.Count.Globally counts the total number of elements.

expand(pcoll)[source]
class PerKey(label=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

combiners.Count.PerKey counts how many elements each unique key has.

expand(pcoll)[source]
class PerElement(label=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

combiners.Count.PerElement counts how many times each element occurs.

expand(pcoll)[source]
class apache_beam.transforms.combiners.Top[source]

Bases: future.types.newobject.newobject

Combiners for obtaining extremal elements.

class Of(n, compare=None, *args, **kwargs)[source]

Bases: apache_beam.transforms.ptransform.PTransform

Obtain a list of the compare-most N elements in a PCollection.

This transform will retrieve the n greatest elements in the PCollection to which it is applied, where “greatest” is determined by the comparator function supplied as the compare argument.

Initializer.

compare should be an implementation of “a < b” taking at least two arguments (a and b). Additional arguments and side inputs specified in the apply call become additional arguments to the comparator. Defaults to the natural ordering of the elements. The arguments ‘key’ and ‘reverse’ may instead be passed as keyword arguments, and have the same meaning as for Python’s sort functions.

Parameters:
  • pcoll – PCollection to process.
  • n – number of elements to extract from pcoll.
  • compare – as described above.
  • *args – as described above.
  • **kwargs – as described above.
default_label()[source]
expand(pcoll)[source]
class PerKey(n, compare=None, *args, **kwargs)[source]

Bases: apache_beam.transforms.ptransform.PTransform

Identifies the compare-most N elements associated with each key.

This transform will produce a PCollection mapping unique keys in the input PCollection to the n greatest elements with which they are associated, where “greatest” is determined by the comparator function supplied as the compare argument in the initializer.

Initializer.

compare should be an implementation of “a < b” taking at least two arguments (a and b). Additional arguments and side inputs specified in the apply call become additional arguments to the comparator. Defaults to the natural ordering of the elements.

The arguments ‘key’ and ‘reverse’ may instead be passed as keyword arguments, and have the same meaning as for Python’s sort functions.

Parameters:
  • n – number of elements to extract from input.
  • compare – as described above.
  • *args – as described above.
  • **kwargs – as described above.
default_label()[source]
expand(pcoll)[source]

Expands the transform.

Raises TypeCheckError: If the output type of the input PCollection is not compatible with Tuple[A, B].

Parameters:pcoll – PCollection to process
Returns:the PCollection containing the result.
static Largest(pcoll, n)[source]

Obtain a list of the greatest N elements in a PCollection.

static Smallest(pcoll, n)[source]

Obtain a list of the least N elements in a PCollection.

static LargestPerKey(pcoll, n)[source]

Identifies the N greatest elements associated with each key.

static SmallestPerKey(pcoll, n, reverse=True)[source]

Identifies the N least elements associated with each key.

class apache_beam.transforms.combiners.Sample[source]

Bases: future.types.newobject.newobject

Combiners for sampling n elements without replacement.

class FixedSizeGlobally(n)[source]

Bases: apache_beam.transforms.ptransform.PTransform

Sample n elements from the input PCollection without replacement.

expand(pcoll)[source]
display_data()[source]
default_label()[source]
class FixedSizePerKey(n)[source]

Bases: apache_beam.transforms.ptransform.PTransform

Sample n elements associated with each key without replacement.

expand(pcoll)[source]
display_data()[source]
default_label()[source]
class apache_beam.transforms.combiners.ToList(label='ToList')[source]

Bases: apache_beam.transforms.ptransform.PTransform

A global CombineFn that condenses a PCollection into a single list.

expand(pcoll)[source]
class apache_beam.transforms.combiners.ToDict(label='ToDict')[source]

Bases: apache_beam.transforms.ptransform.PTransform

A global CombineFn that condenses a PCollection into a single dict.

PCollections should consist of 2-tuples, notionally (key, value) pairs. If multiple values are associated with the same key, only one of the values will be present in the resulting dict.

expand(pcoll)[source]
class apache_beam.transforms.combiners.Latest[source]

Bases: future.types.newobject.newobject

Combiners for computing the latest element

class Globally(label=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

Compute the element with the latest timestamp from a PCollection.

static add_timestamp(element, timestamp=TimestampParam)[source]
expand(pcoll)[source]
class PerKey(label=None)[source]

Bases: apache_beam.transforms.ptransform.PTransform

Compute elements with the latest timestamp for each key from a keyed PCollection

static add_timestamp(element, timestamp=TimestampParam)[source]
expand(pcoll)[source]