K - the type of the keys in the input and output PCollections
public class CoGroupByKey<K> extends PTransform<KeyedPCollectionTuple<K>,PCollection<KV<K,CoGbkResult>>>
A PTransform that performs a CoGroupByKey on a tuple of tables. A CoGroupByKey groups results from all tables by like keys into CoGbkResults, from which the results for any specific table can be accessed by the TupleTag supplied with the initial table.
Example of performing a CoGroupByKey followed by a ParDo that consumes the results:

    PCollection<KV<K, V1>> pt1 = ...;
    PCollection<KV<K, V2>> pt2 = ...;
    final TupleTag<V1> t1 = new TupleTag<>();
    final TupleTag<V2> t2 = new TupleTag<>();
    PCollection<KV<K, CoGbkResult>> coGbkResultCollection =
        KeyedPCollectionTuple.of(t1, pt1)
            .and(t2, pt2)
            .apply(CoGroupByKey.<K>create());
    PCollection<T> finalResultCollection =
        coGbkResultCollection.apply(ParDo.of(
            new DoFn<KV<K, CoGbkResult>, T>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                KV<K, CoGbkResult> e = c.element();
                // All values for this key from the first table.
                Iterable<V1> pt1Vals = e.getValue().getAll(t1);
                // The single value for this key from the second table.
                V2 pt2Val = e.getValue().getOnly(t2);
                // ... Do Something ....
                c.output(...some T...);
              }
            }));

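To make the grouping behaviour described above concrete, the following is a minimal plain-Java sketch of co-grouping two keyed inputs. It is a model of the semantics only, not Beam's implementation: `CoGroupSketch`, `Grouped`, and `coGroup` are hypothetical names introduced for illustration, `Grouped` is a two-input stand-in for `CoGbkResult`, and the sketch assumes that a key present in only one input still produces a result whose other side is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of co-grouping; none of these types are Beam classes.
final class CoGroupSketch {

    // Hypothetical stand-in for CoGbkResult, restricted to two inputs.
    static final class Grouped<V1, V2> {
        final List<V1> first = new ArrayList<>();
        final List<V2> second = new ArrayList<>();
    }

    // Groups the values of two keyed inputs by key. A key that appears in only
    // one input still gets an entry; the other side's list is left empty.
    static <K extends Comparable<K>, V1, V2> Map<K, Grouped<V1, V2>> coGroup(
            List<Map.Entry<K, V1>> table1, List<Map.Entry<K, V2>> table2) {
        // TreeMap is used only so the printed order is deterministic.
        Map<K, Grouped<V1, V2>> out = new TreeMap<>();
        for (Map.Entry<K, V1> e : table1) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<>()).first.add(e.getValue());
        }
        for (Map.Entry<K, V2> e : table2) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<>()).second.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> emails = List.of(
                Map.entry("amy", "amy@example.com"),
                Map.entry("carl", "carl@example.com"));
        List<Map.Entry<String, Integer>> orders = List.of(
                Map.entry("amy", 3),
                Map.entry("amy", 7),
                Map.entry("bob", 1));

        // One line per key, with the values from each input side by side.
        coGroup(emails, orders).forEach((key, g) ->
                System.out.println(key + " -> " + g.first + " / " + g.second));
    }
}
```

In this model, `first` and `second` play the role that `getAll(t1)` and `getAll(t2)` play in the example above. A call such as `getOnly(t2)` corresponds to requiring that `second` holds exactly one element for that key.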
| Modifier and Type | Method and Description |
|---|---|
| static <K> CoGroupByKey<K> | create() Returns a CoGroupByKey<K> PTransform. |
| PCollection<KV<K,CoGbkResult>> | expand(KeyedPCollectionTuple<K> input) Override this method to specify how this PTransform should be expanded on the given InputT. |
Methods inherited from class PTransform: getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, populateDisplayData, toString, validate
public static <K> CoGroupByKey<K> create()
Returns a CoGroupByKey<K> PTransform.
K - the type of the keys in the input and output PCollections
public PCollection<KV<K,CoGbkResult>> expand(KeyedPCollectionTuple<K> input)
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
expand in class PTransform<KeyedPCollectionTuple<K>,PCollection<KV<K,CoGbkResult>>>