Class Distinct<T>
- Type Parameters:
T - the type of the elements of the input and output PCollections
- All Implemented Interfaces:
Serializable, HasDisplayData
Distinct<T> takes a PCollection<T> and returns a PCollection<T> that has all distinct elements of the input. Thus, each element is unique within each window.
Two values of type T are compared for equality not by regular Java Object.equals(java.lang.Object), but instead by first encoding each of the elements using the PCollection's Coder, and then comparing the encoded bytes. This admits efficient parallel evaluation.
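The byte-level comparison described above can be illustrated without Beam. The following is a minimal sketch, not Beam's implementation: `encode` is a hypothetical stand-in for a Coder, and `Word` and `distinctByEncoding` are names invented for this example. It assumes the encoding is deterministic, so that equal values always produce equal bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class EncodedEqualitySketch {

  /** Hypothetical value type that deliberately does NOT override equals/hashCode. */
  static final class Word {
    final String text;

    Word(String text) {
      this.text = text;
    }
  }

  /** Hypothetical stand-in for a Coder: turns a value into its encoded bytes. */
  static byte[] encode(Word w) {
    return w.text.getBytes(StandardCharsets.UTF_8);
  }

  /** Keeps one element per distinct encoded form (the first one seen, in this sketch). */
  static <T> List<T> distinctByEncoding(List<T> input, Function<T, byte[]> encoder) {
    // ByteBuffer compares by content, so equal byte sequences map to the same key.
    Map<ByteBuffer, T> seen = new LinkedHashMap<>();
    for (T element : input) {
      seen.putIfAbsent(ByteBuffer.wrap(encoder.apply(element)), element);
    }
    return new ArrayList<>(seen.values());
  }

  public static void main(String[] args) {
    List<Word> words = Arrays.asList(new Word("a"), new Word("b"), new Word("a"));
    // Object.equals sees two different objects; the encoded bytes are identical.
    System.out.println(words.get(0).equals(words.get(2))); // prints "false"
    System.out.println(distinctByEncoding(words, EncodedEqualitySketch::encode).size()); // prints "2"
  }
}
```

Because only bytes are compared, a type does not need a meaningful equals method to be deduplicated, but two values that are logically equal yet encode differently would be treated as distinct.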
Optionally, a function may be provided that maps each element to a representative value. In this case, two elements will be considered duplicates if they have equal representative values, with equality being determined as above.
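The effect of a representative-value function can be sketched in plain Java. This is an illustration under stated assumptions rather than Beam code: `Event` and `distinctByRepresentative` are hypothetical names, the representative values are compared with `equals` for brevity (for strings this agrees with comparing their UTF-8 bytes), and keeping the first element seen is a choice made by this sketch only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class RepresentativeValueSketch {

  /** Hypothetical element type: an action performed by a user. */
  static final class Event {
    final String userId;
    final String action;

    Event(String userId, String action) {
      this.userId = userId;
      this.action = action;
    }
  }

  /** Keeps one element per representative value (the first one seen, in this sketch). */
  static <T, IdT> List<T> distinctByRepresentative(List<T> input, Function<T, IdT> fn) {
    Map<IdT, T> byRepresentative = new LinkedHashMap<>();
    for (T element : input) {
      // Elements whose representative values are equal count as duplicates.
      byRepresentative.putIfAbsent(fn.apply(element), element);
    }
    return new ArrayList<>(byRepresentative.values());
  }

  public static void main(String[] args) {
    List<Event> events = Arrays.asList(
        new Event("u1", "login"),
        new Event("u2", "login"),
        new Event("u1", "logout"));
    // Deduplicate on userId: the two "u1" events collapse into one.
    List<Event> unique = distinctByRepresentative(events, e -> e.userId);
    System.out.println(unique.size()); // prints "2"
  }
}
```

In Beam, the corresponding transform is obtained from Distinct.withRepresentativeValueFn(fn), where fn is a SerializableFunction from the element type to the representative type.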
By default, the Coder of the output PCollection is the same as the Coder of the input PCollection.
Each output element is in the same window as its corresponding input element, and has the timestamp of the end of that window. The output PCollection has the same WindowFn as the input.
Does not preserve any order the input PCollection might have had.
Example of use:
PCollection<String> words = ...;
PCollection<String> uniqueWords =
words.apply(Distinct.<String>create());
Nested Class Summary
Nested Classes
- Modifier and Type: static class
- Class: Distinct.WithRepresentativeValues<T, IdT>
- Description: A Distinct PTransform that uses a SerializableFunction to obtain a representative value for each input element.
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints
Constructor Summary
Constructors
- Distinct()
Method Summary
Modifier and Type, Method, Description:
- static <T> Distinct<T> create(): Returns a Distinct<T> PTransform.
- PCollection<T> expand(PCollection<T> in): Override this method to specify how this PTransform should be expanded on the given InputT.
- static <T, IdT> Distinct.WithRepresentativeValues<T, IdT> withRepresentativeValueFn(SerializableFunction<T, IdT> fn): Returns a Distinct<T, IdT> PTransform.
Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate
Constructor Details
Distinct
public Distinct()
Method Details
create
public static <T> Distinct<T> create()
Returns a Distinct<T> PTransform.
- Type Parameters:
T - the type of the elements of the input and output PCollections
withRepresentativeValueFn
public static <T, IdT> Distinct.WithRepresentativeValues<T, IdT> withRepresentativeValueFn(SerializableFunction<T, IdT> fn)
Returns a Distinct<T, IdT> PTransform.
- Type Parameters:
T - the type of the elements of the input and output PCollections
IdT - the type of the representative value used to dedup
expand
public PCollection<T> expand(PCollection<T> in)
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expand in class PTransform<PCollection<T>, PCollection<T>>