Class Sets
PTransforms that allow to compute different set functions across PCollections.
They come in two variants. 1. Between two PCollection 2. Between two or more PCollection in a PCollectionList.
Following PTransforms follows SET DISTINCT semantics: intersectDistinct,
expectDistinct, unionDistinct
Following PTransforms follows SET ALL semantics: intersectAll, expectAll, unionAll
For example, the following demonstrates intersectDistinct between two collections PCollections.
Pipeline p = ...;
PCollection<String> left = p.apply(Create.of("1", "2", "3", "3", "4", "5"));
PCollection<String> right = p.apply(Create.of("1", "3", "4", "4", "6"));
PCollection<String> results =
left.apply(SetFns.intersectDistinct(right)); // results will be PCollection<String> containing: "1","3","4"
For example, the following demonstrates intersectDistinct between three collections PCollections in a PCollectionList.
Pipeline p = ...;
PCollection<String> first = p.apply(Create.of("1", "2", "3", "3", "4", "5"));
PCollection<String> second = p.apply(Create.of("1", "3", "4", "4", "6"));
PCollection<String> third = p.apply(Create.of("3", "4", "4"));
// Following example will perform (first intersect second) intersect third.
PCollection<String> results =
PCollectionList.of(first).and(second).and(third)
.apply(SetFns.intersectDistinct()); // results will be PCollection<String> containing: "3","4"
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionstatic <T> PTransform<PCollectionList<T>, PCollection<T>> Returns a newPTransformtransform that follows SET ALL semantics which takes aPCollectionList<PCollection<T>>and returns aPCollection<T>containing the difference all (exceptAll) of collections done in order for all collections inPCollectionList<T>.static <T> PTransform<PCollection<T>, PCollection<T>> exceptAll(PCollection<T> rightCollection) Returns a newPTransformtransform that follows SET ALL semantics to compute the difference all (exceptAll) with providedPCollection<T>.static <T> PTransform<PCollectionList<T>, PCollection<T>> Returns aPTransformthat takes aPCollectionList<PCollection<T>>and returns aPCollection<T>containing the difference (except) of collections done in order for all collections inPCollectionList<T>.static <T> PTransform<PCollection<T>, PCollection<T>> exceptDistinct(PCollection<T> rightCollection) Returns a newPTransformtransform that follows SET DISTINCT semantics to compute the difference (except) with providedPCollection<T>.static <T> PTransform<PCollectionList<T>, PCollection<T>> Returns a newPTransformtransform that follows SET ALL semantics which takes aPCollectionList<PCollection<T>>and returns aPCollection<T>containing the intersection all of collections done in order for all collections inPCollectionList<T>.static <T> PTransform<PCollection<T>, PCollection<T>> intersectAll(PCollection<T> rightCollection) Returns a newPTransformtransform that follows SET ALL semantics to compute the intersection with providedPCollection<T>.static <T> PTransform<PCollectionList<T>, PCollection<T>> Returns aPTransformthat takes aPCollectionList<PCollection<T>>and returns aPCollection<T>containing the intersection of collections done in order for all collections inPCollectionList<T>.static <T> PTransform<PCollection<T>, PCollection<T>> intersectDistinct(PCollection<T> rightCollection) Returns a newPTransformtransform that follows SET DISTINCT semantics to compute the intersection with providedPCollection<T>.static <T> Flatten.PCollections<T> unionAll()Returns a newPTransformtransform that follows SET ALL semantics which takes aPCollectionList<PCollection<T>>and returns aPCollection<T>containing the unionAll of collections done in order for all collections inPCollectionList<T>.static <T> PTransform<PCollection<T>, PCollection<T>> unionAll(PCollection<T> rightCollection) Returns a newPTransformtransform that follows SET ALL semantics to compute the unionAll with providedPCollection<T>.static <T> PTransform<PCollectionList<T>, PCollection<T>> Returns a newPTransformtransform that follows SET DISTINCT semantics which takes aPCollectionList<PCollection<T>>and returns aPCollection<T>containing the union of collections done in order for all collections inPCollectionList<T>.static <T> PTransform<PCollection<T>, PCollection<T>> unionDistinct(PCollection<T> rightCollection) Returns a newPTransformtransform that follows SET DISTINCT semantics to compute the union with providedPCollection<T>.
-
Constructor Details
-
Sets
public Sets()
-
-
Method Details
-
intersectDistinct
public static <T> PTransform<PCollection<T>,PCollection<T>> intersectDistinct(PCollection<T> rightCollection) Returns a newPTransformtransform that follows SET DISTINCT semantics to compute the intersection with providedPCollection<T>.The argument should not be modified after this is called.
The elements of the output
PCollectionwill all distinct elements that present in both pipeline is constructed and providedPCollection.Note that this transform requires that the
Coderof the allPCollection<T>to be deterministic (seeCoder.verifyDeterministic()). If the collectionCoderis not deterministic, an exception is thrown at pipeline construction time.All inputs must have equal
WindowFns and compatible triggers (seeTrigger.isCompatible(Trigger)). Triggers with multiple firings may lead to nondeterministic results since the thisPTransformis only computed over each individual firing.By default, the output
PCollection<T>encodes its elements using the sameCoderas that of the inputPCollection<T>Pipeline p = ...; PCollection<String> left = p.apply(Create.of("1", "2", "3", "3", "4", "5")); PCollection<String> right = p.apply(Create.of("1", "3", "4", "4", "6")); PCollection<String> results = left.apply(SetFns.intersectDistinct(right)); // results will be PCollection<String> containing: "1","3","4"- Type Parameters:
T- the type of the elements in the input and outputPCollection<T>s.
-
intersectDistinct
Returns aPTransformthat takes aPCollectionList<PCollection<T>>and returns aPCollection<T>containing the intersection of collections done in order for all collections inPCollectionList<T>.Returns a new
PTransformtransform that follows SET DISTINCT semantics which takes aPCollectionList<PCollection<T>>and returns aPCollection<T>containing the intersection of collections done in order for all collections inPCollectionList<T>.The elements of the output
PCollectionwill have all distinct elements that are present in both pipeline is constructed and nextPCollectionin the list and applied to all collections in order.Note that this transform requires that the
Coderof the allPCollection<T>to be deterministic (seeCoder.verifyDeterministic()). If the collectionCoderis not deterministic, an exception is thrown at pipeline construction time.All inputs must have equal
WindowFns and compatible triggers (seeTrigger.isCompatible(Trigger)).Triggers with multiple firings may lead to nondeterministic results since the thisPTransformis only computed over each individual firing.By default, the output
PCollection<T>encodes its elements using the sameCoderas that of the firstPCollection<T>inPCollectionList<T>.Pipeline p = ...; PCollection<String> first = p.apply(Create.of("1", "2", "3", "3", "4", "5")); PCollection<String> second = p.apply(Create.of("1", "3", "4", "4", "6")); PCollection<String> third = p.apply(Create.of("3", "4", "4")); // Following example will perform (first intersect second) intersect third. PCollection<String> results = PCollectionList.of(first).and(second).and(third) .apply(SetFns.intersectDistinct()); // results will be PCollection<String> containing: "3","4"- Type Parameters:
T- the type of the elements in the inputPCollectionList<T>and outputPCollection<T>s.
-
intersectAll
public static <T> PTransform<PCollection<T>,PCollection<T>> intersectAll(PCollection<T> rightCollection) Returns a newPTransformtransform that follows SET ALL semantics to compute the intersection with providedPCollection<T>.The argument should not be modified after this is called.
The elements of the output
PCollectionwhich will follow INTERSECT_ALL Semantics as follows: Given there are m elements on pipeline which is constructedPCollection(left) and n elements on in providedPCollection(right): - it will output MIN(m - n, 0) elements of left for all elements which are present in both left and right.Note that this transform requires that the
Coderof the allPCollection<T>to be deterministic (seeCoder.verifyDeterministic()). If the collectionCoderis not deterministic, an exception is thrown at pipeline construction time.All inputs must have equal
WindowFns and compatible triggers (seeTrigger.isCompatible(Trigger)).Triggers with multiple firings may lead to nondeterministic results since the thisPTransformis only computed over each individual firing.By default, the output
PCollection<T>encodes its elements using the sameCoderas that of the inputPCollection<T>Pipeline p = ...; PCollection<String> left = p.apply(Create.of("1", "1", "1", "2", "3", "3", "4", "5")); PCollection<String> right = p.apply(Create.of("1", "1", "3", "4", "4", "6")); PCollection<String> results = left.apply(SetFns.intersectAll(right)); // results will be PCollection<String> containing: "1","1","3","4"- Type Parameters:
T- the type of the elements in the input and outputPCollection<T>s.
-
intersectAll
Returns a newPTransformtransform that follows SET ALL semantics which takes aPCollectionList<PCollection<T>>and returns aPCollection<T>containing the intersection all of collections done in order for all collections inPCollectionList<T>.The elements of the output
PCollectionwhich will follow INTERSECT_ALL semantics. Output is calculated as follows: Given there are m elements on pipeline which is constructedPCollection(left) and n elements on in providedPCollection(right): - it will output MIN(m - n, 0) elements of left for all elements which are present in both left and right.Note that this transform requires that the
Coderof the allPCollection<T>to be deterministic (seeCoder.verifyDeterministic()). If the collectionCoderis not deterministic, an exception is thrown at pipeline construction time.All inputs must have equal
WindowFns and compatible triggers (seeTrigger.isCompatible(Trigger)).Triggers with multiple firings may lead to nondeterministic results since the thisPTransformis only computed over each individual firing.By default, the output
PCollection<T>encodes its elements using the sameCoderas that of the firstPCollection<T>inPCollectionList<T>.Pipeline p = ...; PCollection<String> first = p.apply(Create.of("1", "1", "1", "2", "3", "3", "4", "5")); PCollection<String> second = p.apply(Create.of("1", "1", "3", "4", "4", "6")); PCollection<String> third = p.apply(Create.of("1", "5")); // Following example will perform (first intersect second) intersect third. PCollection<String> results = PCollectionList.of(first).and(second).and(third) .apply(SetFns.intersectAll()); // results will be PCollection<String> containing: "1"- Type Parameters:
T- the type of the elements in the inputPCollectionList<T>and outputPCollection<T>s.
-
exceptDistinct
public static <T> PTransform<PCollection<T>,PCollection<T>> exceptDistinct(PCollection<T> rightCollection) Returns a newPTransformtransform that follows SET DISTINCT semantics to compute the difference (except) with providedPCollection<T>.The argument should not be modified after this is called.
The elements of the output
PCollectionwill all distinct elements that present in pipeline is constructed but not present in providedPCollection.Note that this transform requires that the
Coderof the allPCollection<T>to be deterministic (seeCoder.verifyDeterministic()). If the collectionCoderis not deterministic, an exception is thrown at pipeline construction time.All inputs must have equal
WindowFns and compatible triggers (seeTrigger.isCompatible(Trigger)).Triggers with multiple firings may lead to nondeterministic results since the thisPTransformis only computed over each individual firing.By default, the output
PCollection<T>encodes its elements using the sameCoderas that of the inputPCollection<T>Pipeline p = ...; PCollection<String> left = p.apply(Create.of("1", "1", "1", "2", "3", "3","4", "5")); PCollection<String> right = p.apply(Create.of("1", "1", "3", "4", "4", "6")); PCollection<String> results = left.apply(SetFns.exceptDistinct(right)); // results will be PCollection<String> containing: "2","5"- Type Parameters:
T- the type of the elements in the input and outputPCollection<T>s.
-
exceptDistinct
Returns aPTransformthat takes aPCollectionList<PCollection<T>>and returns aPCollection<T>containing the difference (except) of collections done in order for all collections inPCollectionList<T>.Returns a new
PTransformtransform that follows SET DISTINCT semantics which takes aPCollectionList<PCollection<T>>and returns aPCollection<T>containing the difference (except) of collections done in order for all collections inPCollectionList<T>.The elements of the output
PCollectionwill have all distinct elements that are present in pipeline is constructed but not present in nextPCollectionin the list and applied to all collections in order.Note that this transform requires that the
Coderof the allPCollection<T>to be deterministic (seeCoder.verifyDeterministic()). If the collectionCoderis not deterministic, an exception is thrown at pipeline construction time.All inputs must have equal
WindowFns and compatible triggers (seeTrigger.isCompatible(Trigger)).Triggers with multiple firings may lead to nondeterministic results since the thisPTransformis only computed over each individual firing.By default, the output
PCollection<T>encodes its elements using the sameCoderas that of the firstPCollection<T>inPCollectionList<T>.Pipeline p = ...; PCollection<String> first = p.apply(Create.of("1", "1", "1", "2", "3", "3", "4", "5")); PCollection<String> second = p.apply(Create.of("1", "1", "3", "4", "4", "6")); PCollection<String> third = p.apply(Create.of("1", "2", "2")); // Following example will perform (first intersect second) intersect third. PCollection<String> results = PCollectionList.of(first).and(second).and(third) .apply(SetFns.exceptDistinct()); // results will be PCollection<String> containing: "5"- Type Parameters:
T- the type of the elements in the inputPCollectionList<T>and outputPCollection<T>s.
-
exceptAll
public static <T> PTransform<PCollection<T>,PCollection<T>> exceptAll(PCollection<T> rightCollection) Returns a newPTransformtransform that follows SET ALL semantics to compute the difference all (exceptAll) with providedPCollection<T>.The argument should not be modified after this is called.
The elements of the output
PCollectionwhich will follow EXCEPT_ALL Semantics as follows: Given there are m elements on pipeline which is constructedPCollection(left) and n elements on in providedPCollection(right): - it will output m elements of left for all elements which are present in left but not in right. - it will output MAX(m - n, 0) elements of left for all elements which are present in both left and right.Note that this transform requires that the
Coderof the allPCollection<T>to be deterministic (seeCoder.verifyDeterministic()). If the collectionCoderis not deterministic, an exception is thrown at pipeline construction time.All inputs must have equal
WindowFns and compatible triggers (seeTrigger.isCompatible(Trigger)).Triggers with multiple firings may lead to nondeterministic results since the thisPTransformis only computed over each individual firing.By default, the output
PCollection<T>encodes its elements using the sameCoderas that of the inputPCollection<T>Pipeline p = ...; PCollection<String> left = p.apply(Create.of("1", "1", "1", "2", "3", "3", "3", "4", "5")); PCollection<String> right = p.apply(Create.of("1", "3", "4", "4", "6")); PCollection<String> results = left.apply(SetFns.exceptAll(right)); // results will be PCollection<String> containing: "1","1","2","3","3","5"- Type Parameters:
T- the type of the elements in the input and outputPCollection<T>s.
-
exceptAll
Returns a newPTransformtransform that follows SET ALL semantics which takes aPCollectionList<PCollection<T>>and returns aPCollection<T>containing the difference all (exceptAll) of collections done in order for all collections inPCollectionList<T>.The elements of the output
PCollectionwhich will follow EXCEPT_ALL semantics. Output is calculated as follows: Given there are m elements on pipeline which is constructedPCollection(left) and n elements on in providedPCollection(right): - it will output m elements of left for all elements which are present in left but not in right. - it will output MAX(m - n, 0) elements of left for all elements which are present in both left and right.Note that this transform requires that the
Coderof the allPCollection<T>to be deterministic (seeCoder.verifyDeterministic()). If the collectionCoderis not deterministic, an exception is thrown at pipeline construction time.All inputs must have equal
WindowFns and compatible triggers (seeTrigger.isCompatible(Trigger)).Triggers with multiple firings may lead to nondeterministic results since the thisPTransformis only computed over each individual firing.By default, the output
PCollection<T>encodes its elements using the sameCoderas that of the firstPCollection<T>inPCollectionList<T>.Pipeline p = ...; PCollection<String> first = p.apply(Create.of("1", "1", "1", "2", "3", "3", "3", "4", "5")); PCollection<String> second = p.apply(Create.of("1", "3", "4", "4", "6")); PCollection<String> third = p.apply(Create.of("1", "5")); // Following example will perform (first intersect second) intersect third. PCollection<String> results = PCollectionList.of(first).and(second).and(third) .apply(SetFns.exceptAll()); // results will be PCollection<String> containing: "1","2","3","3"- Type Parameters:
T- the type of the elements in the inputPCollectionList<T>and outputPCollection<T>s.
-
unionDistinct
public static <T> PTransform<PCollection<T>,PCollection<T>> unionDistinct(PCollection<T> rightCollection) Returns a newPTransformtransform that follows SET DISTINCT semantics to compute the union with providedPCollection<T>.The argument should not be modified after this is called.
The elements of the output
PCollectionwill all distinct elements that present in pipeline is constructed or present in providedPCollection.Note that this transform requires that the
Coderof the allPCollection<T>to be deterministic (seeCoder.verifyDeterministic()). If the collectionCoderis not deterministic, an exception is thrown at pipeline construction time.All inputs must have equal
WindowFns and compatible triggers (seeTrigger.isCompatible(Trigger)).Triggers with multiple firings may lead to nondeterministic results since the thisPTransformis only computed over each individual firing.By default, the output
PCollection<T>encodes its elements using the sameCoderas that of the inputPCollection<T>Pipeline p = ...; PCollection<String> left = p.apply(Create.of("1", "1", "2")); PCollection<String> right = p.apply(Create.of("1", "3", "4", "4")); PCollection<String> results = left.apply(SetFns.unionDistinct(right)); // results will be PCollection<String> containing: "1","2","3","4"- Type Parameters:
T- the type of the elements in the input and outputPCollection<T>s.
-
unionDistinct
Returns a newPTransformtransform that follows SET DISTINCT semantics which takes aPCollectionList<PCollection<T>>and returns aPCollection<T>containing the union of collections done in order for all collections inPCollectionList<T>.The elements of the output
PCollectionwill have all distinct elements that are present in pipeline is constructed or present in nextPCollectionin the list and applied to all collections in order.Note that this transform requires that the
Coderof the allPCollection<T>to be deterministic (seeCoder.verifyDeterministic()). If the collectionCoderis not deterministic, an exception is thrown at pipeline construction time.All inputs must have equal
WindowFns and compatible triggers (seeTrigger.isCompatible(Trigger)).Triggers with multiple firings may lead to nondeterministic results since the thisPTransformis only computed over each individual firing.By default, the output
PCollection<T>encodes its elements using the sameCoderas that of the firstPCollection<T>inPCollectionList<T>.Pipeline p = ...; PCollection<String> first = p.apply(Create.of("1", "1", "2")); PCollection<String> second = p.apply(Create.of("1", "3", "4", "4")); PCollection<String> third = p.apply(Create.of("1", "5")); // Following example will perform (first intersect second) intersect third. PCollection<String> results = PCollectionList.of(first).and(second).and(third) .apply(SetFns.unionDistinct()); // results will be PCollection<String> containing: "1","2","3","4","5"- Type Parameters:
T- the type of the elements in the inputPCollectionList<T>and outputPCollection<T>s.
-
unionAll
public static <T> PTransform<PCollection<T>,PCollection<T>> unionAll(PCollection<T> rightCollection) Returns a newPTransformtransform that follows SET ALL semantics to compute the unionAll with providedPCollection<T>.The argument should not be modified after this is called.
The elements of the output
PCollectionwhich will follow UNION_ALL semantics as follows: Given there are m elements on pipeline which is constructedPCollection(left) and n elements on in providedPCollection(right): - it will output m elements of left and m elements of right.Note that this transform requires that the
Coderof the allPCollection<T>to be deterministic (seeCoder.verifyDeterministic()). If the collectionCoderis not deterministic, an exception is thrown at pipeline construction time.All inputs must have equal
WindowFns and compatible triggers (seeTrigger.isCompatible(Trigger)).Triggers with multiple firings may lead to nondeterministic results since the thisPTransformis only computed over each individual firing.By default, the output
PCollection<T>encodes its elements using the sameCoderas that of the inputPCollection<T>Pipeline p = ...; PCollection<String> left = p.apply(Create.of("1", "1", "2")); PCollection<String> right = p.apply(Create.of("1", "3", "4", "4")); PCollection<String> results = left.apply(SetFns.unionAll(right)); // results will be PCollection<String> containing: "1","1","1","2","3","4","4"- Type Parameters:
T- the type of the elements in the input and outputPCollection<T>s.
-
unionAll
Returns a newPTransformtransform that follows SET ALL semantics which takes aPCollectionList<PCollection<T>>and returns aPCollection<T>containing the unionAll of collections done in order for all collections inPCollectionList<T>.The elements of the output
PCollectionwhich will follow UNION_ALL semantics. Output is calculated as follows: Given there are m elements on pipeline which is constructedPCollection(left) and n elements on in providedPCollection(right): - it will output m elements of left and m elements of right.Note that this transform requires that the
Coderof the all inputsPCollection<T>to be deterministic (seeCoder.verifyDeterministic()). If the collectionCoderis not deterministic, an exception is thrown at pipeline construction time.All inputs must have equal
WindowFns and compatible triggers (seeTrigger.isCompatible(Trigger)).Triggers with multiple firings may lead to nondeterministic results since the thisPTransformis only computed over each individual firing.By default, the output
PCollection<T>encodes its elements using the sameCoderas that of the firstPCollection<T>inPCollectionList<T>.Pipeline p = ...; PCollection<String> first = p.apply(Create.of("1", "1", "2")); PCollection<String> second = p.apply(Create.of("1", "3", "4", "4")); PCollection<String> third = p.apply(Create.of("1", "5")); // Following example will perform (first intersect second) intersect third. PCollection<String> results = PCollectionList.of(first).and(second).and(third) .apply(SetFns.unionAll()); // results will be PCollection<String> containing: "1","1","1","1","2","3","4","4","5"- Type Parameters:
T- the type of the elements in the inputPCollectionList<T>and outputPCollection<T>s.
-