T - the type of the elements of the input and output PCollections
public class Distinct<T> extends PTransform<PCollection<T>,PCollection<T>>
Distinct<T> takes a PCollection<T> and returns a PCollection<T> that has all distinct elements of the input. Thus, each element is unique within each window.
Two values of type T are compared for equality not by regular Java Object.equals(java.lang.Object), but instead by first encoding each of the elements using the PCollection's Coder, and then comparing the encoded bytes. This admits efficient parallel evaluation.
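The byte-wise comparison described above can be modelled without Beam. The following is a minimal sketch using only the Java standard library, not Beam's implementation: the `Word` class, the `encode` method (a stand-in for a Coder) and the `distinctByEncoding` helper are all hypothetical names introduced for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Stdlib-only model of "equality by encoded bytes"; not Beam code. */
final class EncodedBytesDistinct {

  /** Hypothetical element type that deliberately does not override equals(). */
  static final class Word {
    final String text;

    Word(String text) {
      this.text = text;
    }
  }

  /** Hypothetical stand-in for a Coder: a deterministic byte encoding of a Word. */
  static byte[] encode(Word word) {
    return word.text.getBytes(StandardCharsets.UTF_8);
  }

  /** Keeps the first element seen for each distinct encoding. */
  static <T> List<T> distinctByEncoding(List<T> input, Function<T, byte[]> encoder) {
    // ByteBuffer compares by content, whereas a raw byte[] compares by identity.
    Map<ByteBuffer, T> firstSeen = new LinkedHashMap<>();
    for (T element : input) {
      firstSeen.putIfAbsent(ByteBuffer.wrap(encoder.apply(element)), element);
    }
    return new ArrayList<>(firstSeen.values());
  }

  public static void main(String[] args) {
    List<Word> words = List.of(new Word("to"), new Word("be"), new Word("to"));
    List<Word> unique = distinctByEncoding(words, EncodedBytesDistinct::encode);
    for (Word word : unique) {
      System.out.println(word.text);
    }
  }
}
```

Here the two `Word("to")` instances are not equal under `Object.equals`, yet they collapse into one element because their encodings match. The sketch keeps insertion order only for readability; the transform itself makes no ordering guarantee.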
Optionally, a function may be provided that maps each element to a representative value. In this case, two elements will be considered duplicates if they have equal representative values, with equality being determined as above.
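As an illustration of the representative-value rule, here is a small stdlib-only sketch. It is a simplification under stated assumptions: `distinctBy` is a hypothetical helper, it compares representative values with `equals` rather than by encoded bytes, and keeping the first element of each group is a choice of the sketch, not something this page specifies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Stdlib-only model of deduplication by representative value; not Beam code. */
final class RepresentativeValueDistinct {

  /** Keeps one element per representative value (here: the first one seen). */
  static <T, IdT> List<T> distinctBy(List<T> input, Function<T, IdT> representativeFn) {
    Map<IdT, T> firstSeen = new LinkedHashMap<>();
    for (T element : input) {
      // Elements with equal representative values count as duplicates.
      firstSeen.putIfAbsent(representativeFn.apply(element), element);
    }
    return new ArrayList<>(firstSeen.values());
  }

  public static void main(String[] args) {
    List<String> words = List.of("Apple", "apple", "Pear");
    // Representative value: the lower-cased word, so case differences are ignored.
    List<String> unique = distinctBy(words, w -> w.toLowerCase(Locale.ROOT));
    System.out.println(unique);
  }
}
```

The output elements keep their original type T; the representative value of type IdT is used only to decide which elements are duplicates.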
By default, the Coder of the output PCollection is the same as the Coder of the input PCollection.
Each output element is in the same window as its corresponding input element, and has the timestamp of the end of that window. The output PCollection has the same WindowFn as the input.
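To make the per-window scope of uniqueness concrete, the sketch below deduplicates values separately inside each window. It assumes fixed-size windows keyed by their start time, which is only one possible WindowFn. The `Timestamped` class and the `distinctPerWindow` helper are hypothetical, and output timestamps are not modelled.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Stdlib-only model of per-window distinctness; not Beam code. */
final class WindowedDistinct {

  /** Hypothetical element carrying a value and an event timestamp in milliseconds. */
  static final class Timestamped {
    final String value;
    final long timestampMillis;

    Timestamped(String value, long timestampMillis) {
      this.value = value;
      this.timestampMillis = timestampMillis;
    }
  }

  /** Start of the fixed window (assumed windowing scheme) containing the timestamp. */
  static long windowStart(long timestampMillis, long windowSizeMillis) {
    return timestampMillis - Math.floorMod(timestampMillis, windowSizeMillis);
  }

  /** Maps each window start to the distinct values observed in that window. */
  static Map<Long, Set<String>> distinctPerWindow(List<Timestamped> input, long windowSizeMillis) {
    Map<Long, Set<String>> byWindow = new TreeMap<>();
    for (Timestamped element : input) {
      long start = windowStart(element.timestampMillis, windowSizeMillis);
      // A value is dropped only if the same window already contains it.
      byWindow.computeIfAbsent(start, k -> new LinkedHashSet<>()).add(element.value);
    }
    return byWindow;
  }

  public static void main(String[] args) {
    List<Timestamped> input = List.of(
        new Timestamped("a", 0L),
        new Timestamped("a", 500L),
        new Timestamped("b", 700L),
        new Timestamped("a", 1200L));
    System.out.println(distinctPerWindow(input, 1000L));
  }
}
```

With one-second windows, the value "a" appears once in the window starting at 0 and again in the window starting at 1000, because duplicates are removed only within a window, never across windows.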
Does not preserve any order the input PCollection might have had.
Example of use:
PCollection<String> words = ...;
PCollection<String> uniqueWords =
    words.apply(Distinct.<String>create());
Modifier and Type | Class and Description |
---|---|
static class | Distinct.WithRepresentativeValues<T,IdT>: A Distinct PTransform that uses a SerializableFunction to obtain a representative value for each input element. |
Fields inherited from class PTransform: name
Constructor and Description |
---|
Distinct() |
Modifier and Type | Method and Description |
---|---|
static <T> Distinct<T> | create(): Returns a Distinct<T> PTransform. |
PCollection<T> | expand(PCollection<T> in): Applies this PTransform on the given InputT, and returns its Output. |
static <T,IdT> Distinct.WithRepresentativeValues<T,IdT> | withRepresentativeValueFn(SerializableFunction<T,IdT> fn): Returns a Distinct<T, IdT> PTransform. |
Methods inherited from class PTransform: getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, populateDisplayData, toString, validate
public static <T> Distinct<T> create()
Returns a Distinct<T> PTransform.
T - the type of the elements of the input and output PCollections
public static <T,IdT> Distinct.WithRepresentativeValues<T,IdT> withRepresentativeValueFn(SerializableFunction<T,IdT> fn)
Returns a Distinct<T, IdT> PTransform.
T - the type of the elements of the input and output PCollections
IdT - the type of the representative value used to dedup
public PCollection<T> expand(PCollection<T> in)
Description copied from class: PTransform
Applies this PTransform on the given InputT, and returns its Output.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
Specified by: expand in class PTransform<PCollection<T>,PCollection<T>>