Class Top

java.lang.Object
org.apache.beam.sdk.transforms.Top

public class Top extends Object
PTransforms for finding the largest (or smallest) set of elements in a PCollection, or the largest (or smallest) set of values associated with each key in a PCollection of KVs.
  • Method Details

    • of

      public static <T, ComparatorT extends Comparator<T> & Serializable> Combine.Globally<T,List<T>> of(int count, ComparatorT compareFn)
      Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted using the given Comparator<T>. The Comparator<T> must also be Serializable.

      If count exceeds the number of elements in the input PCollection, the resulting List contains every element of the input PCollection, in sorted order.

      All the elements of the result's List must fit into the memory of a single machine.

      Example of use:

      
       PCollection<Student> students = ...;
       PCollection<List<Student>> top10Students =
           students.apply(Top.of(10, new CompareStudentsByAvgGrade()));
       

      By default, the Coder of the output PCollection is a ListCoder of the Coder of the elements of the input PCollection.

      If the input PCollection is windowed into GlobalWindows and is empty, an empty List<T> is output in the GlobalWindow. To use this transform on inputs with any other windowing, call either withoutDefaults or asSingletonView on the returned transform.

      See also smallest(int) and largest(int), which sort Comparable elements using their natural ordering.

      See also perKey(int, ComparatorT), smallestPerKey(int), and largestPerKey(int), which take a PCollection of KVs and return the top values associated with each key.
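
      The ordering contract described above can be illustrated without the Beam SDK. The following is a minimal sketch in plain Java, not Beam's implementation. Student, avgGrade, names, and topOf are hypothetical names introduced for illustration, and CompareStudentsByAvgGrade is assumed to compare by average grade, since the example above does not define it. The sketch shows a comparator that is both a Comparator<T> and Serializable, and a result holding the largest count elements in decreasing order.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TopOfSketch {
  /** Hypothetical element type; not part of Beam. */
  static final class Student implements Serializable {
    final String name;
    final double avgGrade;

    Student(String name, double avgGrade) {
      this.name = name;
      this.avgGrade = avgGrade;
    }
  }

  /** Assumed shape of the comparator: it satisfies Comparator<T> & Serializable. */
  static final class CompareStudentsByAvgGrade implements Comparator<Student>, Serializable {
    @Override
    public int compare(Student a, Student b) {
      return Double.compare(a.avgGrade, b.avgGrade);
    }
  }

  /** Local stand-in for the documented result: largest `count` elements, decreasing order. */
  static <T> List<T> topOf(List<T> input, int count, Comparator<T> compareFn) {
    List<T> sorted = new ArrayList<>(input);
    sorted.sort(compareFn.reversed());
    // If count exceeds the input size, every element is returned.
    return new ArrayList<>(sorted.subList(0, Math.min(count, sorted.size())));
  }

  /** Helper that extracts names so results are easy to print and compare. */
  static List<String> names(List<Student> students) {
    List<String> out = new ArrayList<>();
    for (Student s : students) {
      out.add(s.name);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Student> students = Arrays.asList(
        new Student("Ada", 3.9), new Student("Bob", 3.1), new Student("Cy", 3.5));
    // prints [Ada, Cy]
    System.out.println(names(topOf(students, 2, new CompareStudentsByAvgGrade())));
  }
}
```

      The comparator is written as a named class rather than a lambda so that the Serializable requirement is visible in its declaration.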

    • smallest

      public static <T extends Comparable<T>> Combine.Globally<T,List<T>> smallest(int count)
      Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the smallest count elements of the input PCollection<T>, in increasing order, sorted according to their natural order.

      If count exceeds the number of elements in the input PCollection, the resulting PCollection's List contains every element of the input PCollection, in sorted order.

      All the elements of the result's List must fit into the memory of a single machine.

      Example of use:

      
       PCollection<Integer> values = ...;
       PCollection<List<Integer>> smallest10Values = values.apply(Top.smallest(10));
       

      By default, the Coder of the output PCollection is a ListCoder of the Coder of the elements of the input PCollection.

      If the input PCollection is windowed into GlobalWindows and is empty, an empty List<T> is output in the GlobalWindow. To use this transform on inputs with any other windowing, call either withoutDefaults or asSingletonView on the returned transform.

      See also largest(int).

      See also of(int, ComparatorT), which sorts using a user-specified Comparator function.

      See also perKey(int, ComparatorT), smallestPerKey(int), and largestPerKey(int), which take a PCollection of KVs and return the top values associated with each key.
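
      The natural-order behaviour, including the case where count exceeds the input size, can be sketched in plain Java. This is an illustration of the documented result only, not Beam code; SmallestSketch and its smallest method are hypothetical local helpers operating on an in-memory List.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SmallestSketch {
  /** Local stand-in: smallest `count` elements, in increasing natural order. */
  static <T extends Comparable<T>> List<T> smallest(List<T> input, int count) {
    List<T> sorted = new ArrayList<>(input);
    Collections.sort(sorted);
    // Truncate to count, or keep everything when the input is shorter.
    return new ArrayList<>(sorted.subList(0, Math.min(count, sorted.size())));
  }

  public static void main(String[] args) {
    List<Integer> values = Arrays.asList(7, 2, 9, 4);
    System.out.println(smallest(values, 2));  // prints [2, 4]
    System.out.println(smallest(values, 10)); // prints [2, 4, 7, 9]
  }
}
```

      The same sketch with the sort order reversed gives the result described for largest(int).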

    • largest

      public static <T extends Comparable<T>> Combine.Globally<T,List<T>> largest(int count)
      Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted according to their natural order.

      If count exceeds the number of elements in the input PCollection, the resulting PCollection's List contains every element of the input PCollection, in sorted order.

      All the elements of the result's List must fit into the memory of a single machine.

      Example of use:

      
       PCollection<Integer> values = ...;
       PCollection<List<Integer>> largest10Values = values.apply(Top.largest(10));
       

      By default, the Coder of the output PCollection is a ListCoder of the Coder of the elements of the input PCollection.

      If the input PCollection is windowed into GlobalWindows and is empty, an empty List<T> is output in the GlobalWindow. To use this transform on inputs with any other windowing, call either withoutDefaults or asSingletonView on the returned transform.

      See also smallest(int).

      See also of(int, ComparatorT), which sorts using a user-specified Comparator function.

      See also perKey(int, ComparatorT), smallestPerKey(int), and largestPerKey(int), which take a PCollection of KVs and return the top values associated with each key.

    • largestFn

      public static <T extends Comparable<T>> Top.TopCombineFn<T,Top.Natural<T>> largestFn(int count)
      Returns a Top.TopCombineFn that aggregates the largest count values.
    • largestLongsFn

      public static Top.TopCombineFn<Long,Top.Natural<Long>> largestLongsFn(int count)
      Returns a Top.TopCombineFn that aggregates the largest count long values.
    • largestIntsFn

      public static Top.TopCombineFn<Integer,Top.Natural<Integer>> largestIntsFn(int count)
      Returns a Top.TopCombineFn that aggregates the largest count int values.
    • largestDoublesFn

      public static Top.TopCombineFn<Double,Top.Natural<Double>> largestDoublesFn(int count)
      Returns a Top.TopCombineFn that aggregates the largest count double values.
    • smallestFn

      public static <T extends Comparable<T>> Top.TopCombineFn<T,Top.Reversed<T>> smallestFn(int count)
      Returns a Top.TopCombineFn that aggregates the smallest count values.
    • smallestLongsFn

      public static Top.TopCombineFn<Long,Top.Reversed<Long>> smallestLongsFn(int count)
      Returns a Top.TopCombineFn that aggregates the smallest count long values.
    • smallestIntsFn

      public static Top.TopCombineFn<Integer,Top.Reversed<Integer>> smallestIntsFn(int count)
      Returns a Top.TopCombineFn that aggregates the smallest count int values.
    • smallestDoublesFn

      public static Top.TopCombineFn<Double,Top.Reversed<Double>> smallestDoublesFn(int count)
      Returns a Top.TopCombineFn that aggregates the smallest count double values.
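
      One way to picture what a combine function of this kind has to do is a bounded accumulator that never holds more than count values and can be merged with other accumulators. The sketch below is plain Java written under that assumption; it is not Top.TopCombineFn and makes no claim about how Beam implements it. TopAccumulatorSketch and its addInput, merge, and extractOutput methods are hypothetical names chosen to echo the usual combine steps.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopAccumulatorSketch<T> {
  private final int count;
  private final Comparator<T> compareFn;
  // Min-heap under compareFn: the head is the weakest value currently retained.
  private final PriorityQueue<T> heap;

  TopAccumulatorSketch(int count, Comparator<T> compareFn) {
    this.count = count;
    this.compareFn = compareFn;
    this.heap = new PriorityQueue<>(Math.max(1, count), compareFn);
  }

  /** Adds one value, evicting the weakest retained value if the bound is exceeded. */
  void addInput(T value) {
    if (count == 0) {
      return;
    }
    if (heap.size() < count) {
      heap.add(value);
    } else if (compareFn.compare(value, heap.peek()) > 0) {
      heap.poll();
      heap.add(value);
    }
  }

  /** Folds another partial result into this one; the bound still holds afterwards. */
  void merge(TopAccumulatorSketch<T> other) {
    for (T value : other.heap) {
      addInput(value);
    }
  }

  /** Returns the retained values in decreasing order under compareFn. */
  List<T> extractOutput() {
    List<T> out = new ArrayList<>(heap);
    out.sort(compareFn.reversed());
    return out;
  }

  public static void main(String[] args) {
    // Two partial accumulators, as if the input were split across workers.
    TopAccumulatorSketch<Integer> a =
        new TopAccumulatorSketch<>(3, Comparator.<Integer>naturalOrder());
    TopAccumulatorSketch<Integer> b =
        new TopAccumulatorSketch<>(3, Comparator.<Integer>naturalOrder());
    for (int v : new int[] {5, 1, 8}) a.addInput(v);
    for (int v : new int[] {9, 3, 7, 2}) b.addInput(v);
    a.merge(b);
    System.out.println(a.extractOutput()); // prints [9, 8, 7]

    // A reversed comparator turns "largest" into "smallest", in increasing order.
    TopAccumulatorSketch<Integer> s =
        new TopAccumulatorSketch<>(2, Comparator.<Integer>reverseOrder());
    for (int v : new int[] {5, 1, 8, 9, 3, 7, 2}) s.addInput(v);
    System.out.println(s.extractOutput()); // prints [1, 2]
  }
}
```

      Keeping the weakest retained value at the head of the heap means each new input costs one comparison in the common case where it does not make the cut.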
    • perKey

      public static <K, V, ComparatorT extends Comparator<V> & Serializable> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,List<V>>>> perKey(int count, ComparatorT compareFn)
      Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted using the given Comparator<V>. The Comparator<V> must also be Serializable.

      If there are fewer than count values associated with a particular key, then all those values will be in the result mapping for that key, albeit in sorted order.

      All the values associated with a single key must fit into the memory of a single machine, but there can be many more KVs in the resulting PCollection than can fit into the memory of a single machine.

      Example of use:

      
       PCollection<KV<School, Student>> studentsBySchool = ...;
       PCollection<KV<School, List<Student>>> top10StudentsBySchool =
           studentsBySchool.apply(
               Top.perKey(10, new CompareStudentsByAvgGrade()));
       

      By default, the Coder of the keys of the output PCollection is the same as that of the keys of the input PCollection, and the Coder of the values of the output PCollection is a ListCoder of the Coder of the values of the input PCollection.

      See also smallestPerKey(int) and largestPerKey(int), which sort Comparable<V> values using their natural ordering.

      See also of(int, ComparatorT), smallest(int), and largest(int), which take a PCollection and return the top elements.
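
      The per-key result shape can be sketched in plain Java as a map from each distinct key to its own truncated, sorted list. This is an in-memory illustration only, not Beam's implementation; TopPerKeySketch, kv, and topPerKey are hypothetical names, and Map.Entry stands in for KV.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TopPerKeySketch {
  /** Hypothetical helper that builds a key/value pair, standing in for KV. */
  static <K, V> Map.Entry<K, V> kv(K key, V value) {
    return new AbstractMap.SimpleEntry<>(key, value);
  }

  /** Local stand-in: for each key, the largest `count` values in decreasing order. */
  static <K, V> Map<K, List<V>> topPerKey(
      List<Map.Entry<K, V>> input, int count, Comparator<V> compareFn) {
    // Group values by key, preserving first-seen key order for readable output.
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (Map.Entry<K, V> pair : input) {
      grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
    }
    // Sort and truncate each key's values independently of the other keys.
    for (Map.Entry<K, List<V>> entry : grouped.entrySet()) {
      List<V> values = entry.getValue();
      values.sort(compareFn.reversed());
      entry.setValue(new ArrayList<>(values.subList(0, Math.min(count, values.size()))));
    }
    return grouped;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> input = new ArrayList<>();
    input.add(kv("a", 3));
    input.add(kv("b", 10));
    input.add(kv("a", 5));
    input.add(kv("a", 1));
    input.add(kv("b", 4));
    input.add(kv("c", 8));
    // prints {a=[5, 3], b=[10, 4], c=[8]}
    System.out.println(topPerKey(input, 2, Comparator.<Integer>naturalOrder()));
  }
}
```

      Key c has fewer than count values, so its list contains all of them, matching the rule stated above.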

    • smallestPerKey

      public static <K, V extends Comparable<V>> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,List<V>>>> smallestPerKey(int count)
      Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the smallest count values associated with that key in the input PCollection<KV<K, V>>, in increasing order, sorted according to their natural order.

      If there are fewer than count values associated with a particular key, then all those values will be in the result mapping for that key, albeit in sorted order.

      All the values associated with a single key must fit into the memory of a single machine, but there can be many more KVs in the resulting PCollection than can fit into the memory of a single machine.

      Example of use:

      
       PCollection<KV<String, Integer>> keyedValues = ...;
       PCollection<KV<String, List<Integer>>> smallest10ValuesPerKey =
           keyedValues.apply(Top.smallestPerKey(10));
       

      By default, the Coder of the keys of the output PCollection is the same as that of the keys of the input PCollection, and the Coder of the values of the output PCollection is a ListCoder of the Coder of the values of the input PCollection.

      See also largestPerKey(int).

      See also perKey(int, ComparatorT), which sorts values using a user-specified Comparator function.

      See also of(int, ComparatorT), smallest(int), and largest(int), which take a PCollection and return the top elements.

    • largestPerKey

      public static <K, V extends Comparable<V>> Combine.PerKey<K,V,List<V>> largestPerKey(int count)
      Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted according to their natural order.

      If there are fewer than count values associated with a particular key, then all those values will be in the result mapping for that key, albeit in sorted order.

      All the values associated with a single key must fit into the memory of a single machine, but there can be many more KVs in the resulting PCollection than can fit into the memory of a single machine.

      Example of use:

      
       PCollection<KV<String, Integer>> keyedValues = ...;
       PCollection<KV<String, List<Integer>>> largest10ValuesPerKey =
           keyedValues.apply(Top.largestPerKey(10));
       

      By default, the Coder of the keys of the output PCollection is the same as that of the keys of the input PCollection, and the Coder of the values of the output PCollection is a ListCoder of the Coder of the values of the input PCollection.

      See also smallestPerKey(int).

      See also perKey(int, ComparatorT), which sorts values using a user-specified Comparator function.

      See also of(int, ComparatorT), smallest(int), and largest(int), which take a PCollection and return the top elements.