apache_beam.transforms.deduplicate module
A collection of PTransforms for deduplicating elements.
- 
class apache_beam.transforms.deduplicate.DeduplicatePerKey(processing_time_duration=None, event_time_duration=None)
- Bases: apache_beam.transforms.ptransform.PTransform
- A PTransform which deduplicates <key, value> pairs over a time domain and threshold. Values in different windows will NOT be considered duplicates of each other. Deduplication is guaranteed with respect to the time domain and duration.
- Time durations are required to avoid unbounded memory and/or storage requirements within a runner. Take care to choose a deduplication time limit that is long enough to remove duplicates but short enough not to cause performance problems within the runner. Each runner may provide an optimized implementation of its choice using the specified deduplication time domain and threshold.
- Does not preserve any order the input PCollection might have had.
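The time-bounded behaviour described above can be illustrated with a small plain-Python model. This is only a sketch of the semantics, not Beam's implementation. It assumes that duplicates are identified by key, that a key is forgotten `duration` time units after it is first emitted, and that elements arrive sorted by timestamp. It ignores windowing entirely. The function name `deduplicate_per_key` and the `(timestamp, key, value)` tuple layout are hypothetical.

```python
def deduplicate_per_key(elements, duration):
    """Model of per-key deduplication over a bounded time span.

    elements: iterable of (timestamp, key, value), assumed sorted by timestamp.
    duration: how long a key is remembered after it is first emitted.
    """
    expiry = {}  # key -> time at which the key is forgotten (assumed model)
    output = []
    for ts, key, value in elements:
        # Drop expired entries so that state stays bounded by the duration.
        for expired_key in [k for k, t in expiry.items() if t <= ts]:
            del expiry[expired_key]
        if key in expiry:
            continue  # duplicate inside the threshold: drop it
        expiry[key] = ts + duration
        output.append((key, value))
    return output


events = [
    (0, "a", 1),
    (5, "a", 2),   # same key within 10 time units of the first "a": dropped
    (7, "b", 3),
    (12, "a", 4),  # the entry for "a" was forgotten at t=10: emitted again
]
result = deduplicate_per_key(events, duration=10)
```

The model shows the trade-off the text describes: a longer duration catches more duplicates but keeps more keys in state, while a shorter one frees state sooner at the cost of letting late duplicates through.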
- 
class apache_beam.transforms.deduplicate.Deduplicate(processing_time_duration=None, event_time_duration=None)
- Bases: apache_beam.transforms.ptransform.PTransform
- Similar to DeduplicatePerKey, but the Deduplicate transform accepts arbitrary values as input and uses each value as its own key, deduplicating within the specified time duration.
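As a rough illustration of "uses the value as key", the plain-Python sketch below breaks the idea into three steps: key each value by itself, deduplicate per key within a time span, then drop the placeholder payload. It models the described behaviour under stated assumptions and is not Beam's code. It assumes the values are hashable, the input is sorted by timestamp, and no windowing applies. The name `deduplicate_values` is hypothetical.

```python
def deduplicate_values(elements, duration):
    """elements: iterable of (timestamp, value), assumed sorted by timestamp."""
    # Step 1: use each value as its own key; the payload is an unused placeholder.
    keyed = ((ts, value, None) for ts, value in elements)

    # Step 2: per-key deduplication, remembering a key for `duration` time units.
    forget_at = {}  # key -> time after which the key counts as new again
    kept = []
    for ts, key, payload in keyed:
        if key in forget_at and ts < forget_at[key]:
            continue  # seen recently: treat as a duplicate
        forget_at[key] = ts + duration
        kept.append((key, payload))

    # Step 3: discard the placeholder payload and return the bare values.
    return [key for key, _ in kept]


readings = [(0, "x"), (3, "y"), (4, "x"), (20, "x")]
unique = deduplicate_values(readings, duration=10)
```

Here the second `"x"` at time 4 is dropped because it falls within 10 time units of the first, while the `"x"` at time 20 is emitted again because the earlier sighting is no longer remembered.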