T - the type of the elements of the input and output PCollections
public class Distinct<T> extends PTransform<PCollection<T>,PCollection<T>>
Distinct<T> takes a PCollection<T> and returns a PCollection<T> that has
all distinct elements of the input. Thus, each element is unique within each window.
Two values of type T are compared for equality not by regular Java Object.equals(java.lang.Object), but instead by first encoding each of the elements using the PCollection's Coder, and then comparing the encoded bytes. This admits efficient
parallel evaluation.
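The effect of comparing encoded bytes rather than calling equals can be modeled in plain Java. This is a minimal sketch, not Beam's implementation: the `Word` type, the `encode` helper (a stand-in for a Coder), and `distinctByEncoding` are all hypothetical names introduced for illustration. `Word` deliberately does not override equals, so two instances with the same text are unequal as Java objects but still collapse to one element because their encodings match.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class EncodedEqualitySketch {
    /** Hypothetical element type that does not override equals/hashCode. */
    static final class Word {
        final String text;

        Word(String text) {
            this.text = text;
        }
    }

    /** Hypothetical stand-in for a Coder: turns an element into bytes. */
    static byte[] encode(Word w) {
        return w.text.getBytes(StandardCharsets.UTF_8);
    }

    /** Keeps the first element seen for each distinct encoded form. */
    static <T> List<T> distinctByEncoding(List<T> input, Function<T, byte[]> encoder) {
        // ByteBuffer.equals and hashCode use the buffer contents,
        // so a wrapped array works as a byte-wise map key.
        Map<ByteBuffer, T> firstSeen = new LinkedHashMap<>();
        for (T element : input) {
            firstSeen.putIfAbsent(ByteBuffer.wrap(encoder.apply(element)), element);
        }
        return new ArrayList<>(firstSeen.values());
    }

    public static void main(String[] args) {
        List<Word> words = Arrays.asList(new Word("a"), new Word("b"), new Word("a"));
        List<Word> unique = distinctByEncoding(words, EncodedEqualitySketch::encode);
        for (Word w : unique) {
            System.out.println(w.text);
        }
    }
}
```

The sketch keeps the first occurrence and preserves input order only to make the result easy to read; the transform itself makes no ordering guarantee. The model also assumes the encoding is deterministic, that is, equal values always produce identical bytes.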
Optionally, a function may be provided that maps each element to a representative value. In this case, two elements will be considered duplicates if they have equal representative values, with equality being determined as above.
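Deduplication by representative value can be modeled the same way in plain Java. This is a sketch under stated assumptions, not the transform's implementation: `distinctByRepresentative` is a hypothetical helper, and it compares representative values with equals as a simplification, whereas the description above compares their encoded bytes. For String values with a deterministic encoding the two approaches agree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class RepresentativeValueSketch {
    /**
     * Keeps one element per representative value.
     * Simplification: representative values are compared with equals here,
     * not by encoded bytes.
     */
    static <T, IdT> List<T> distinctByRepresentative(List<T> input, Function<T, IdT> fn) {
        Map<IdT, T> firstSeen = new LinkedHashMap<>();
        for (T element : input) {
            // Elements whose representative value was already seen are dropped.
            firstSeen.putIfAbsent(fn.apply(element), element);
        }
        return new ArrayList<>(firstSeen.values());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Apple", "apple", "Pear");
        // The lower-cased form is the representative value, so "Apple" and
        // "apple" count as duplicates of each other.
        List<String> unique =
            distinctByRepresentative(words, s -> s.toLowerCase(Locale.ROOT));
        System.out.println(unique); // prints [Apple, Pear]
    }
}
```

Note that the output contains the original elements, not the representative values. Which of several duplicates survives is a choice of this sketch (the first one seen); the description above does not specify it.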
By default, the Coder of the output PCollection is the same as the Coder of the input PCollection.
Each output element is in the same window as its corresponding input element, and has the
timestamp of the end of that window. The output PCollection has the same WindowFn as the input.
Does not preserve any order the input PCollection might have had.
Example of use:
PCollection<String> words = ...;
PCollection<String> uniqueWords =
words.apply(Distinct.<String>create());
| Modifier and Type | Class and Description |
|---|---|
| static class | Distinct.WithRepresentativeValues<T,IdT> A Distinct PTransform that uses a SerializableFunction to obtain a representative value for each input element. |
name, resourceHints
| Constructor and Description |
|---|
| Distinct() |
| Modifier and Type | Method and Description |
|---|---|
| static <T> Distinct<T> | create() Returns a Distinct<T> PTransform. |
| PCollection<T> | expand(PCollection<T> in) Override this method to specify how this PTransform should be expanded on the given InputT. |
| static <T,IdT> Distinct.WithRepresentativeValues<T,IdT> | withRepresentativeValueFn(SerializableFunction<T,IdT> fn) Returns a Distinct<T, IdT> PTransform. |
compose, compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setResourceHints, toString, validate, validate
public static <T> Distinct<T> create()
Returns a Distinct<T> PTransform.
T - the type of the elements of the input and output PCollections
public static <T,IdT> Distinct.WithRepresentativeValues<T,IdT> withRepresentativeValueFn(SerializableFunction<T,IdT> fn)
Returns a Distinct<T, IdT> PTransform.
T - the type of the elements of the input and output PCollections
IdT - the type of the representative value used to dedup
public PCollection<T> expand(PCollection<T> in)
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should
be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
expand in class PTransform<PCollection<T>,PCollection<T>>