public class Top
extends java.lang.Object
PTransforms for finding the largest (or smallest) set of elements in a PCollection, or the largest (or smallest) set of values associated with each key in a PCollection of KVs.

| Modifier and Type | Class and Description |
|---|---|
| static class | Top.Largest<T extends java.lang.Comparable<? super T>> Deprecated. Use Top.Natural instead. |
| static class | Top.Natural<T extends java.lang.Comparable<? super T>> A Serializable Comparator that uses the compared elements' natural ordering. |
| static class | Top.Reversed<T extends java.lang.Comparable<? super T>> A Serializable Comparator that uses the reverse of the compared elements' natural ordering. |
| static class | Top.Smallest<T extends java.lang.Comparable<? super T>> Deprecated. Use Top.Reversed instead. |
| static class | Top.TopCombineFn<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> CombineFn for Top transforms that combines a collection of Ts into a single count-long List<T>, using compareFn to choose the largest Ts. |

| Modifier and Type | Method and Description |
|---|---|
| static <T extends java.lang.Comparable<T>> | largest(int count) Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted according to their natural order. |
| static <K,V extends java.lang.Comparable<V>> | largestPerKey(int count) Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted according to their natural order. |
| static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> | of(int count, ComparatorT compareFn) Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted using the given Comparator<T>. |
| static <K,V,ComparatorT extends java.util.Comparator<V> & java.io.Serializable> | perKey(int count, ComparatorT compareFn) Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted using the given Comparator<V>. |
| static <T extends java.lang.Comparable<T>> | smallest(int count) Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the smallest count elements of the input PCollection<T>, in increasing order, sorted according to their natural order. |
| static <K,V extends java.lang.Comparable<V>> | smallestPerKey(int count) Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the smallest count values associated with that key in the input PCollection<KV<K, V>>, in increasing order, sorted according to their natural order. |

public static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> Combine.Globally<T,java.util.List<T>> of(int count, ComparatorT compareFn)
Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted using the given Comparator<T>. The Comparator<T> must also be Serializable.

If count is greater than the number of elements in the input PCollection, then all the elements of the input PCollection will be in the resulting List, albeit in sorted order.

All the elements of the result's List must fit into the memory of a single machine.

Example of use:

    PCollection<Student> students = ...;
    PCollection<List<Student>> top10Students =
        students.apply(Top.of(10, new CompareStudentsByAvgGrade()));

By default, the Coder of the output PCollection is a ListCoder of the Coder of the elements of the input PCollection.

If the input PCollection is windowed into GlobalWindows, an empty List<T> in the GlobalWindow will be output if the input PCollection is empty. To use this with inputs with other windowing, either withoutDefaults or asSingletonView must be called on the returned Combine.Globally.

See also smallest(int) and largest(int), which sort Comparable elements using their natural ordering.

See also perKey(int, ComparatorT), smallestPerKey(int), and largestPerKey(int), which take a PCollection of KVs and return the top values associated with each key.
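
The requirement that the comparator be both a Comparator<T> and Serializable can be illustrated with a small plain-Java sketch. Student, its fields, the body of CompareStudentsByAvgGrade, and TopOfSketch.topOf are all hypothetical names written for this illustration; topOf only models the documented result on an in-memory List and is not how the library computes it.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical element type, assumed for illustration only. */
final class Student {
  final String name;
  final double avgGrade;

  Student(String name, double avgGrade) {
    this.name = name;
    this.avgGrade = avgGrade;
  }
}

/** Hypothetical comparator: implements both Comparator and Serializable. */
final class CompareStudentsByAvgGrade implements Comparator<Student>, Serializable {
  private static final long serialVersionUID = 1L;

  @Override
  public int compare(Student a, Student b) {
    return Double.compare(a.avgGrade, b.avgGrade);
  }
}

final class TopOfSketch {
  /**
   * In-memory model of the documented output: the largest count elements
   * under compareFn, in decreasing order. If count exceeds the input size,
   * every element is returned.
   */
  static <T> List<T> topOf(List<T> input, int count, Comparator<T> compareFn) {
    List<T> sorted = new ArrayList<>(input);
    sorted.sort(compareFn.reversed()); // largest first
    return new ArrayList<>(sorted.subList(0, Math.min(count, sorted.size())));
  }

  public static void main(String[] args) {
    List<Student> students = Arrays.asList(
        new Student("Ada", 3.9), new Student("Bob", 3.1), new Student("Cy", 3.5));
    for (Student s : topOf(students, 2, new CompareStudentsByAvgGrade())) {
      System.out.println(s.name + " " + s.avgGrade);
    }
  }
}
```

A named class that implements both interfaces, as here, satisfies the ComparatorT bound in the signature of of(int, ComparatorT); a plain Comparator<Student> reference does not carry the Serializable half of that bound.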
public static <T extends java.lang.Comparable<T>> Combine.Globally<T,java.util.List<T>> smallest(int count)
Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the smallest count elements of the input PCollection<T>, in increasing order, sorted according to their natural order.

If count is greater than the number of elements in the input PCollection, then all the elements of the input PCollection will be in the resulting PCollection's List, albeit in sorted order.

All the elements of the result's List must fit into the memory of a single machine.

Example of use:

    PCollection<Integer> values = ...;
    PCollection<List<Integer>> smallest10Values = values.apply(Top.smallest(10));

By default, the Coder of the output PCollection is a ListCoder of the Coder of the elements of the input PCollection.

If the input PCollection is windowed into GlobalWindows, an empty List<T> in the GlobalWindow will be output if the input PCollection is empty. To use this with inputs with other windowing, either withoutDefaults or asSingletonView must be called on the returned Combine.Globally.

See also largest(int).

See also of(int, ComparatorT), which sorts using a user-specified Comparator function.

See also perKey(int, ComparatorT), smallestPerKey(int), and largestPerKey(int), which take a PCollection of KVs and return the top values associated with each key.
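
One way to read the smallest contract is as a "largest count" selection under the reversed natural ordering, which is the ordering that Top.Reversed describes. The plain-Java sketch below models that reading on an in-memory list; SmallestSketch and its methods are hypothetical names, and this is an illustration of the documented result rather than the library's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SmallestSketch {
  /** Largest count elements under cmp, listed from largest to smallest under cmp. */
  static <T> List<T> top(List<T> input, int count, Comparator<T> cmp) {
    List<T> sorted = new ArrayList<>(input);
    sorted.sort(cmp.reversed());
    return new ArrayList<>(sorted.subList(0, Math.min(count, sorted.size())));
  }

  /**
   * Smallest count elements in increasing natural order, expressed as a
   * "top" selection under the reversed natural ordering.
   */
  static <T extends Comparable<T>> List<T> smallest(List<T> input, int count) {
    return top(input, count, Comparator.<T>reverseOrder());
  }

  public static void main(String[] args) {
    System.out.println(smallest(Arrays.asList(5, 1, 4, 2, 3), 3));
  }
}
```

Under the reversed ordering the "largest" element is the naturally smallest one, so listing results in decreasing reversed order is the same as listing them in increasing natural order.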
public static <T extends java.lang.Comparable<T>> Combine.Globally<T,java.util.List<T>> largest(int count)
Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted according to their natural order.

If count is greater than the number of elements in the input PCollection, then all the elements of the input PCollection will be in the resulting PCollection's List, albeit in sorted order.

All the elements of the result's List must fit into the memory of a single machine.

Example of use:

    PCollection<Integer> values = ...;
    PCollection<List<Integer>> largest10Values = values.apply(Top.largest(10));

By default, the Coder of the output PCollection is a ListCoder of the Coder of the elements of the input PCollection.

If the input PCollection is windowed into GlobalWindows, an empty List<T> in the GlobalWindow will be output if the input PCollection is empty. To use this with inputs with other windowing, either withoutDefaults or asSingletonView must be called on the returned Combine.Globally.

See also smallest(int).

See also of(int, ComparatorT), which sorts using a user-specified Comparator function.

See also perKey(int, ComparatorT), smallestPerKey(int), and largestPerKey(int), which take a PCollection of KVs and return the top values associated with each key.
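
Because only the count-long result has to fit in memory, a top-count selection does not need to hold the whole input at once. A common technique for this is a size-bounded min-heap, sketched below in plain Java. This is a generic illustration under the assumption of a single-threaded, in-memory input; BoundedTopSketch is a hypothetical name, and the sketch is not presented as the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class BoundedTopSketch {
  /** Returns the largest count elements in decreasing natural order. */
  static <T extends Comparable<T>> List<T> largest(Iterable<T> input, int count) {
    if (count <= 0) {
      return new ArrayList<>();
    }
    // Min-heap: the head is the smallest of the elements kept so far.
    PriorityQueue<T> heap = new PriorityQueue<>();
    for (T x : input) {
      if (heap.size() < count) {
        heap.add(x);
      } else if (x.compareTo(heap.peek()) > 0) {
        // x beats the current smallest kept element, so replace it.
        heap.poll();
        heap.add(x);
      }
    }
    List<T> result = new ArrayList<>(heap);
    result.sort(Collections.<T>reverseOrder()); // decreasing order
    return result;
  }

  public static void main(String[] args) {
    System.out.println(largest(Arrays.asList(3, 9, 1, 7, 5), 3));
  }
}
```

The heap never holds more than count elements, which matches the memory constraint stated above: the result must fit on one machine, while the input may be much larger.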
public static <K,V,ComparatorT extends java.util.Comparator<V> & java.io.Serializable> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> perKey(int count, ComparatorT compareFn)
Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted using the given Comparator<V>. The Comparator<V> must also be Serializable.

If there are fewer than count values associated with a particular key, then all those values will be in the result mapping for that key, albeit in sorted order.

All the values associated with a single key must fit into the memory of a single machine, but there can be many more KVs in the resulting PCollection than can fit into the memory of a single machine.

Example of use:

    PCollection<KV<School, Student>> studentsBySchool = ...;
    PCollection<KV<School, List<Student>>> top10StudentsBySchool =
        studentsBySchool.apply(
            Top.perKey(10, new CompareStudentsByAvgGrade()));

By default, the Coder of the keys of the output PCollection is the same as that of the keys of the input PCollection, and the Coder of the values of the output PCollection is a ListCoder of the Coder of the values of the input PCollection.

See also smallestPerKey(int) and largestPerKey(int), which sort Comparable<V> values using their natural ordering.

See also of(int, ComparatorT), smallest(int), and largest(int), which take a PCollection and return the top elements.
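
The per-key contract can be modelled in plain Java by grouping values under their key and then trimming each group to its largest count values. In the sketch below, java.util.Map.Entry stands in for KV and a Map stands in for the output PCollection; both substitutions, and the name PerKeySketch, are assumptions made for illustration only.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PerKeySketch {
  /**
   * Maps each distinct key to its largest count values under compareFn,
   * in decreasing order. Keys with fewer than count values keep them all.
   */
  static <K, V> Map<K, List<V>> perKey(
      List<Map.Entry<K, V>> input, int count, Comparator<V> compareFn) {
    // Group values by key, preserving first-seen key order.
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (Map.Entry<K, V> kv : input) {
      grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
    }
    // Sort each group largest-first and keep at most count values.
    for (Map.Entry<K, List<V>> e : grouped.entrySet()) {
      List<V> values = e.getValue();
      values.sort(compareFn.reversed());
      e.setValue(new ArrayList<>(values.subList(0, Math.min(count, values.size()))));
    }
    return grouped;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> input = new ArrayList<>();
    input.add(new SimpleEntry<>("a", 1));
    input.add(new SimpleEntry<>("b", 5));
    input.add(new SimpleEntry<>("a", 3));
    input.add(new SimpleEntry<>("a", 2));
    System.out.println(perKey(input, 2, Comparator.<Integer>naturalOrder()));
  }
}
```

Each key's list is trimmed independently, which mirrors the memory note above: only the values for a single key need to fit on one machine, not the whole keyed collection.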
public static <K,V extends java.lang.Comparable<V>> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> smallestPerKey(int count)
Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the smallest count values associated with that key in the input PCollection<KV<K, V>>, in increasing order, sorted according to their natural order.

If there are fewer than count values associated with a particular key, then all those values will be in the result mapping for that key, albeit in sorted order.

All the values associated with a single key must fit into the memory of a single machine, but there can be many more KVs in the resulting PCollection than can fit into the memory of a single machine.

Example of use:

    PCollection<KV<String, Integer>> keyedValues = ...;
    PCollection<KV<String, List<Integer>>> smallest10ValuesPerKey =
        keyedValues.apply(Top.smallestPerKey(10));

By default, the Coder of the keys of the output PCollection is the same as that of the keys of the input PCollection, and the Coder of the values of the output PCollection is a ListCoder of the Coder of the values of the input PCollection.

See also largestPerKey(int).

See also perKey(int, ComparatorT), which sorts values using a user-specified Comparator function.

See also of(int, ComparatorT), smallest(int), and largest(int), which take a PCollection and return the top elements.
public static <K,V extends java.lang.Comparable<V>> Combine.PerKey<K,V,java.util.List<V>> largestPerKey(int count)
Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted according to their natural order.

If there are fewer than count values associated with a particular key, then all those values will be in the result mapping for that key, albeit in sorted order.

All the values associated with a single key must fit into the memory of a single machine, but there can be many more KVs in the resulting PCollection than can fit into the memory of a single machine.

Example of use:

    PCollection<KV<String, Integer>> keyedValues = ...;
    PCollection<KV<String, List<Integer>>> largest10ValuesPerKey =
        keyedValues.apply(Top.largestPerKey(10));

By default, the Coder of the keys of the output PCollection is the same as that of the keys of the input PCollection, and the Coder of the values of the output PCollection is a ListCoder of the Coder of the values of the input PCollection.

See also smallestPerKey(int).

See also perKey(int, ComparatorT), which sorts values using a user-specified Comparator function.

See also of(int, ComparatorT), smallest(int), and largest(int), which take a PCollection and return the top elements.