public class Top
extends java.lang.Object
PTransforms for finding the largest (or smallest) set of elements in a PCollection, or the largest (or smallest) set of values associated with each key in a PCollection of KVs.

Modifier and Type | Class and Description |
---|---|
static class | Top.Largest<T extends java.lang.Comparable<? super T>> A Serializable Comparator that uses the compared elements' natural ordering. |
static class | Top.Smallest<T extends java.lang.Comparable<? super T>> A Serializable Comparator that uses the reverse of the compared elements' natural ordering. |
static class | Top.TopCombineFn<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> CombineFn for Top transforms that combines a bunch of Ts into a single count-long List<T>, using compareFn to choose the largest Ts. |
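
To make the idea of combining many Ts into a single count-long list more concrete, here is a minimal plain-Java sketch of one way such a combiner could keep partial results and merge them. It is an assumption for illustration only, not the SDK's implementation: BoundedTopAccumulator and its add, merge, and extract methods are hypothetical names, and the sketch ignores coders and serialization.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical accumulator; none of these names come from the SDK.
final class BoundedTopAccumulator<T> {
  private final int count;
  private final Comparator<T> compareFn;
  // Min-heap under compareFn: the head is the smallest element currently kept.
  private final PriorityQueue<T> heap;

  BoundedTopAccumulator(int count, Comparator<T> compareFn) {
    this.count = count;
    this.compareFn = compareFn;
    // PriorityQueue requires an initial capacity of at least 1.
    this.heap = new PriorityQueue<>(Math.max(1, count), compareFn);
  }

  // Adds one input and evicts the smallest kept element once more than
  // `count` elements are held.
  void add(T value) {
    heap.add(value);
    if (heap.size() > count) {
      heap.poll();
    }
  }

  // Folds another partial result into this one.
  void merge(BoundedTopAccumulator<T> other) {
    for (T value : other.heap) {
      add(value);
    }
  }

  // Produces the final list: at most `count` elements, largest first.
  List<T> extract() {
    List<T> out = new ArrayList<>(heap);
    Collections.sort(out, Collections.reverseOrder(compareFn));
    return out;
  }

  public static void main(String[] args) {
    BoundedTopAccumulator<Integer> left =
        new BoundedTopAccumulator<>(2, Comparator.<Integer>naturalOrder());
    left.add(4);
    left.add(9);
    left.add(1);
    BoundedTopAccumulator<Integer> right =
        new BoundedTopAccumulator<>(2, Comparator.<Integer>naturalOrder());
    right.add(7);
    right.add(3);
    left.merge(right);
    System.out.println(left.extract()); // prints [9, 7]
  }
}
```

Because each accumulator never holds more than count elements, partial results stay small no matter how many inputs they have seen, and merging two of them still yields the overall top count.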
Modifier and Type | Method and Description |
---|---|
static <T extends java.lang.Comparable<T>> | largest(int count) Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted according to their natural order. |
static <K,V extends java.lang.Comparable<V>> | largestPerKey(int count) Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted according to their natural order. |
static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> | of(int count, ComparatorT compareFn) Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted using the given Comparator<T>. |
static <K,V,ComparatorT extends java.util.Comparator<V> & java.io.Serializable> | perKey(int count, ComparatorT compareFn) Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted using the given Comparator<V>. |
static <T extends java.lang.Comparable<T>> | smallest(int count) Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the smallest count elements of the input PCollection<T>, in increasing order, sorted according to their natural order. |
static <K,V extends java.lang.Comparable<V>> | smallestPerKey(int count) Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the smallest count values associated with that key in the input PCollection<KV<K, V>>, in increasing order, sorted according to their natural order. |
public static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> Combine.Globally<T,java.util.List<T>> of(int count, ComparatorT compareFn)
Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted using the given Comparator<T>. The Comparator<T> must also be Serializable.

If count is greater than the number of elements in the input PCollection, then all the elements of the input PCollection will be in the resulting List, albeit in sorted order.

All the elements of the result's List must fit into the memory of a single machine.

Example of use:
PCollection<Student> students = ...;
PCollection<List<Student>> top10Students =
students.apply(Top.of(10, new CompareStudentsByAvgGrade()));
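
The example above uses a CompareStudentsByAvgGrade comparator without defining it. A minimal sketch of what such a class might look like follows, assuming a hypothetical Student type with a name and an average grade (neither field appears in this documentation). The plain-Java largest helper only imitates the documented ordering on an in-memory list so the comparator can be exercised without a pipeline; it is not how the transform is implemented.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical element type; the real Student class is not shown in this documentation.
class Student implements Serializable {
  final String name;
  final double avgGrade;

  Student(String name, double avgGrade) {
    this.name = name;
    this.avgGrade = avgGrade;
  }
}

// A comparator that is also Serializable, as the description of of(int, ComparatorT) requires.
class CompareStudentsByAvgGrade implements Comparator<Student>, Serializable {
  @Override
  public int compare(Student a, Student b) {
    return Double.compare(a.avgGrade, b.avgGrade);
  }
}

final class TopOfSketch {
  // In-memory imitation of the documented result: the largest `count`
  // elements under `compareFn`, in decreasing order.
  static <T> List<T> largest(List<T> input, int count, Comparator<T> compareFn) {
    List<T> sorted = new ArrayList<>(input);
    Collections.sort(sorted, Collections.reverseOrder(compareFn));
    return new ArrayList<>(sorted.subList(0, Math.min(count, sorted.size())));
  }

  public static void main(String[] args) {
    List<Student> students = Arrays.asList(
        new Student("Ada", 3.9), new Student("Bob", 3.1), new Student("Cy", 3.5));
    for (Student s : largest(students, 2, new CompareStudentsByAvgGrade())) {
      System.out.println(s.name + " " + s.avgGrade);
    }
  }
}
```

Implementing both Comparator and Serializable in a named class is one straightforward way to satisfy the ComparatorT bound in the method signature.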
By default, the Coder of the output PCollection is a ListCoder of the Coder of the elements of the input PCollection.

If the input PCollection is windowed into GlobalWindows, an empty List<T> in the GlobalWindow will be output if the input PCollection is empty. To use this with inputs with other windowing, either withoutDefaults or asSingletonView must be called.

See also smallest(int) and largest(int), which sort Comparable elements using their natural ordering.

See also perKey(int, ComparatorT), smallestPerKey(int), and largestPerKey(int), which take a PCollection of KVs and return the top values associated with each key.

public static <T extends java.lang.Comparable<T>> Combine.Globally<T,java.util.List<T>> smallest(int count)
Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the smallest count elements of the input PCollection<T>, in increasing order, sorted according to their natural order.

If count is greater than the number of elements in the input PCollection, then all the elements of the input PCollection will be in the resulting PCollection's List, albeit in sorted order.

All the elements of the result's List must fit into the memory of a single machine.

Example of use:
PCollection<Integer> values = ...;
PCollection<List<Integer>> smallest10Values = values.apply(Top.smallest(10));
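
As a rough illustration of the behavior described above, the following plain-Java sketch computes the smallest count elements of an in-memory list in increasing natural order, and returns every element when count exceeds the list size. The SmallestSketch class and its smallest method are hypothetical names, and the sketch ignores distribution, windowing, and coders.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class SmallestSketch {
  // In-memory imitation of the documented result of smallest(count).
  static <T extends Comparable<T>> List<T> smallest(List<T> input, int count) {
    return input.stream()
        .sorted()      // natural (increasing) order
        .limit(count)  // keeps everything when count > input.size()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Integer> values = Arrays.asList(7, 2, 9, 4);
    System.out.println(smallest(values, 2));  // prints [2, 4]
    System.out.println(smallest(values, 10)); // prints [2, 4, 7, 9]
  }
}
```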
By default, the Coder of the output PCollection is a ListCoder of the Coder of the elements of the input PCollection.

If the input PCollection is windowed into GlobalWindows, an empty List<T> in the GlobalWindow will be output if the input PCollection is empty. To use this with inputs with other windowing, either withoutDefaults or asSingletonView must be called.

See also largest(int).

See also of(int, ComparatorT), which sorts using a user-specified Comparator function.

See also perKey(int, ComparatorT), smallestPerKey(int), and largestPerKey(int), which take a PCollection of KVs and return the top values associated with each key.

public static <T extends java.lang.Comparable<T>> Combine.Globally<T,java.util.List<T>> largest(int count)
Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted according to their natural order.

If count is greater than the number of elements in the input PCollection, then all the elements of the input PCollection will be in the resulting PCollection's List, albeit in sorted order.

All the elements of the result's List must fit into the memory of a single machine.

Example of use:
PCollection<Integer> values = ...;
PCollection<List<Integer>> largest10Values = values.apply(Top.largest(10));
By default, the Coder of the output PCollection is a ListCoder of the Coder of the elements of the input PCollection.

If the input PCollection is windowed into GlobalWindows, an empty List<T> in the GlobalWindow will be output if the input PCollection is empty. To use this with inputs with other windowing, either withoutDefaults or asSingletonView must be called.

See also smallest(int).

See also of(int, ComparatorT), which sorts using a user-specified Comparator function.

See also perKey(int, ComparatorT), smallestPerKey(int), and largestPerKey(int), which take a PCollection of KVs and return the top values associated with each key.

public static <K,V,ComparatorT extends java.util.Comparator<V> & java.io.Serializable> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> perKey(int count, ComparatorT compareFn)
Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted using the given Comparator<V>. The Comparator<V> must also be Serializable.

If there are fewer than count values associated with a particular key, then all those values will be in the result mapping for that key, albeit in sorted order.

All the values associated with a single key must fit into the memory of a single machine, but there can be many more KVs in the resulting PCollection than can fit into the memory of a single machine.

Example of use:
PCollection<KV<School, Student>> studentsBySchool = ...;
PCollection<KV<School, List<Student>>> top10StudentsBySchool =
studentsBySchool.apply(
Top.perKey(10, new CompareStudentsByAvgGrade()));
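
The per-key behavior described above can be imitated on in-memory data with the following plain-Java sketch. It is only an illustration under stated assumptions: Map.Entry stands in for KV, the PerKeySketch class and its perKey method are hypothetical, and nothing here reflects how the transform is executed across machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PerKeySketch {
  // Map.Entry stands in for KV here; this is an assumption made for the sketch.
  static <K, V> Map<K, List<V>> perKey(
      List<Map.Entry<K, V>> input, int count, Comparator<V> compareFn) {
    // Group values by key, preserving first-seen key order.
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (Map.Entry<K, V> kv : input) {
      grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
    }
    // For each key, keep the largest `count` values in decreasing order.
    Map<K, List<V>> result = new LinkedHashMap<>();
    for (Map.Entry<K, List<V>> e : grouped.entrySet()) {
      List<V> values = e.getValue();
      Collections.sort(values, Collections.reverseOrder(compareFn));
      int kept = Math.min(count, values.size());
      result.put(e.getKey(), new ArrayList<>(values.subList(0, kept)));
    }
    return result;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> input = new ArrayList<>();
    input.add(new SimpleEntry<>("a", 1));
    input.add(new SimpleEntry<>("b", 5));
    input.add(new SimpleEntry<>("a", 3));
    input.add(new SimpleEntry<>("a", 2));
    // Key "b" has fewer than 2 values, so all of its values are returned.
    System.out.println(perKey(input, 2, Comparator.<Integer>naturalOrder()));
    // prints {a=[3, 2], b=[5]}
  }
}
```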
By default, the Coder of the keys of the output PCollection is the same as that of the keys of the input PCollection, and the Coder of the values of the output PCollection is a ListCoder of the Coder of the values of the input PCollection.

See also smallestPerKey(int) and largestPerKey(int), which sort Comparable<V> values using their natural ordering.

See also of(int, ComparatorT), smallest(int), and largest(int), which take a PCollection and return the top elements.

public static <K,V extends java.lang.Comparable<V>> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> smallestPerKey(int count)
Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the smallest count values associated with that key in the input PCollection<KV<K, V>>, in increasing order, sorted according to their natural order.

If there are fewer than count values associated with a particular key, then all those values will be in the result mapping for that key, albeit in sorted order.

All the values associated with a single key must fit into the memory of a single machine, but there can be many more KVs in the resulting PCollection than can fit into the memory of a single machine.

Example of use:
PCollection<KV<String, Integer>> keyedValues = ...;
PCollection<KV<String, List<Integer>>> smallest10ValuesPerKey =
keyedValues.apply(Top.smallestPerKey(10));
By default, the Coder of the keys of the output PCollection is the same as that of the keys of the input PCollection, and the Coder of the values of the output PCollection is a ListCoder of the Coder of the values of the input PCollection.

See also largestPerKey(int).

See also perKey(int, ComparatorT), which sorts values using a user-specified Comparator function.

See also of(int, ComparatorT), smallest(int), and largest(int), which take a PCollection and return the top elements.

public static <K,V extends java.lang.Comparable<V>> Combine.PerKey<K,V,java.util.List<V>> largestPerKey(int count)
Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted according to their natural order.

If there are fewer than count values associated with a particular key, then all those values will be in the result mapping for that key, albeit in sorted order.

All the values associated with a single key must fit into the memory of a single machine, but there can be many more KVs in the resulting PCollection than can fit into the memory of a single machine.

Example of use:
PCollection<KV<String, Integer>> keyedValues = ...;
PCollection<KV<String, List<Integer>>> largest10ValuesPerKey =
keyedValues.apply(Top.largestPerKey(10));
By default, the Coder of the keys of the output PCollection is the same as that of the keys of the input PCollection, and the Coder of the values of the output PCollection is a ListCoder of the Coder of the values of the input PCollection.

See also smallestPerKey(int).

See also perKey(int, ComparatorT), which sorts values using a user-specified Comparator function.

See also of(int, ComparatorT), smallest(int), and largest(int), which take a PCollection and return the top elements.