K - the type of the keys in the input and output PCollections
public class CoGroupByKey<K> extends PTransform<KeyedPCollectionTuple<K>,PCollection<KV<K,CoGbkResult>>>
A PTransform that performs a CoGroupByKey on a tuple of tables. A CoGroupByKey groups results from all tables by like keys into CoGbkResults, from which the results for any specific table can be accessed by the TupleTag supplied with the initial table.
Example of performing a CoGroupByKey followed by a ParDo that consumes the results:
    PCollection<KV<K, V1>> pt1 = ...;
    PCollection<KV<K, V2>> pt2 = ...;

    // One tag per input table; the same tags are used later to read each table's values.
    final TupleTag<V1> t1 = new TupleTag<>();
    final TupleTag<V2> t2 = new TupleTag<>();

    PCollection<KV<K, CoGbkResult>> coGbkResultCollection =
        KeyedPCollectionTuple.of(t1, pt1)
            .and(t2, pt2)
            .apply(CoGroupByKey.<K>create());

    PCollection<T> finalResultCollection =
        coGbkResultCollection.apply(ParDo.of(
            new DoFn<KV<K, CoGbkResult>, T>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                KV<K, CoGbkResult> e = c.element();
                // All values for this key from the first table.
                Iterable<V1> pt1Vals = e.getValue().getAll(t1);
                // The single value for this key from the second table.
                V2 pt2Val = e.getValue().getOnly(t2);
                // ... Do Something ....
                c.output(...some T...);
              }
            }));
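The grouping behaviour described above can be illustrated without a pipeline runner. The following is a minimal plain-Java sketch, not Beam's implementation: `CoGroupSketch`, `Grouped`, and `coGroup` are hypothetical names introduced only for this illustration, and `Grouped` stands in for a `CoGbkResult` limited to two tables. It assumes Java 9 or later for `List.of` and `Map.entry`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: none of these names are Beam APIs.
final class CoGroupSketch {

    /** Hypothetical stand-in for a CoGbkResult over exactly two tables. */
    static final class Grouped<V1, V2> {
        final List<V1> first = new ArrayList<>();   // values from table 1
        final List<V2> second = new ArrayList<>();  // values from table 2
    }

    /** Groups the entries of two keyed tables by like keys. */
    static <K extends Comparable<K>, V1, V2> Map<K, Grouped<V1, V2>> coGroup(
            List<Map.Entry<K, V1>> table1, List<Map.Entry<K, V2>> table2) {
        // TreeMap is used only to make the iteration order deterministic.
        Map<K, Grouped<V1, V2>> result = new TreeMap<>();
        for (Map.Entry<K, V1> e : table1) {
            result.computeIfAbsent(e.getKey(), k -> new Grouped<>()).first.add(e.getValue());
        }
        for (Map.Entry<K, V2> e : table2) {
            result.computeIfAbsent(e.getKey(), k -> new Grouped<>()).second.add(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> emails = List.of(
            Map.entry("amy", "amy@example.com"),
            Map.entry("amy", "amy@work.example.com"),
            Map.entry("bob", "bob@example.com"));
        List<Map.Entry<String, Integer>> ages = List.of(
            Map.entry("amy", 31),
            Map.entry("carl", 47));

        // Print each key with its values from both tables.
        coGroup(emails, ages).forEach((key, g) ->
            System.out.println(key + " -> " + g.first + " " + g.second));
    }
}
```

In this sketch a key that appears in only one table still produces a result, with an empty list for the other table. Code that expects exactly one value per key for a given table, as the `getOnly(t2)` call in the example above does, should account for keys where that does not hold.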
Fields inherited from class PTransform: name
Modifier and Type | Method and Description |
---|---|
static <K> CoGroupByKey<K> | create() Returns a CoGroupByKey<K> PTransform. |
PCollection<KV<K,CoGbkResult>> | expand(KeyedPCollectionTuple<K> input) Override this method to specify how this PTransform should be expanded on the given InputT. |
Methods inherited from class PTransform: compose, compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, populateDisplayData, toString, validate
public static <K> CoGroupByKey<K> create()
Returns a CoGroupByKey<K> PTransform.
K - the type of the keys in the input and output PCollections
public PCollection<KV<K,CoGbkResult>> expand(KeyedPCollectionTuple<K> input)
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
Specified by: expand in class PTransform<KeyedPCollectionTuple<K>,PCollection<KV<K,CoGbkResult>>>