Type Parameters:
T - the type of the elements of the input and output PCollection
public class Partition<T> extends PTransform<PCollection<T>,PCollectionList<T>>
Partition takes a PCollection<T> and a PartitionFn, uses the PartitionFn to split the elements of the input PCollection into N partitions, and returns a PCollectionList<T> that bundles N PCollection<T>s containing the split elements.
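The splitting behaviour described above can be modelled without a pipeline. The following is a minimal plain-Java sketch, not the Beam implementation: the class `PartitionModel`, its `split` method, and the use of `ToIntBiFunction` in place of a PartitionFn are all hypothetical names chosen for illustration, and in-memory lists stand in for the PCollections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntBiFunction;

/** Plain-Java model of partitioning; lists stand in for PCollections. */
final class PartitionModel {
  /** Routes each element to the list chosen by fn(element, numPartitions). */
  static <T> List<List<T>> split(
      List<T> input, int numPartitions, ToIntBiFunction<T, Integer> fn) {
    // One output list per partition, analogous to the N bundled collections.
    List<List<T>> outputs = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) {
      outputs.add(new ArrayList<>());
    }
    for (T element : input) {
      int index = fn.applyAsInt(element, numPartitions);
      outputs.get(index).add(element);
    }
    return outputs;
  }

  public static void main(String[] args) {
    List<List<Integer>> parts =
        split(List.of(1, 2, 3, 4, 5, 6), 3, (x, n) -> x % n);
    System.out.println(parts); // prints [[3, 6], [1, 4], [2, 5]]
  }
}
```

Every input element lands in exactly one output list, which mirrors how each element of the input PCollection ends up in exactly one PCollection of the returned PCollectionList.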
Example of use:
PCollection<Student> students = ...;
// Split students up into 10 partitions, by percentile:
PCollectionList<Student> studentsByPercentile =
    students.apply(Partition.of(10, new PartitionFn<Student>() {
        public int partitionFor(Student student, int numPartitions) {
            return student.getPercentile()  // 0..99
                * numPartitions / 100;
        }}));
for (int i = 0; i < 10; i++) {
    PCollection<Student> partition = studentsByPercentile.get(i);
    ...
}
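The index arithmetic in the example above can be checked in isolation. This is a small sketch under the same assumption the example states, namely that the percentile is an integer in 0..99; the class name `PercentileBuckets` is hypothetical.

```java
/** Standalone check of the percentile-to-partition arithmetic. */
final class PercentileBuckets {
  /** Maps a percentile in 0..99 to an index in 0..numPartitions-1. */
  static int partitionFor(int percentile, int numPartitions) {
    // Integer division truncates, so 0..9 map to 0, 10..19 to 1, and so on
    // when numPartitions is 10.
    return percentile * numPartitions / 100;
  }

  public static void main(String[] args) {
    System.out.println(partitionFor(0, 10));  // prints 0
    System.out.println(partitionFor(57, 10)); // prints 5
    System.out.println(partitionFor(99, 10)); // prints 9
  }
}
```

Note that a percentile of 100 would produce the index 10 for 10 partitions, which is outside 0..9, so the 0..99 bound on the input matters.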
// Split percentiles into 2 partitions, using a threshold read from a side input.
// studentsPercentage is not defined here; it is assumed to be a transform that
// produces the students' percentiles as Integers.
PCollectionView<Integer> gradesView =
    pipeline.apply("grades", Create.of(50)).apply(View.asSingleton());
PCollectionList<Integer> studentsByGrades =
    pipeline.apply(studentsPercentage)
        .apply(Partition.of(2, (elem, numPartitions, ctx) -> {
            Integer grades = ctx.sideInput(gradesView);
            return elem < grades ? 0 : 1;
        }, Requirements.requiresSideInputs(gradesView)));
PCollection<Integer> below = studentsByGrades.get(0); // all percentiles below 50
PCollection<Integer> above = studentsByGrades.get(1); // all percentiles of 50 or above
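The decision made inside the side-input lambda above is a two-way threshold test. As a rough plain-Java sketch, with a `Supplier<Integer>` standing in for the `ctx.sideInput(gradesView)` lookup (the class `ThresholdPartition` and this stand-in are assumptions for illustration, not Beam API):

```java
import java.util.function.Supplier;

/** Models the two-way split; a Supplier stands in for the side-input lookup. */
final class ThresholdPartition {
  /** Returns 0 for elements below the threshold and 1 otherwise. */
  static int partitionFor(int elem, int numPartitions, Supplier<Integer> threshold) {
    // numPartitions is unused here, as in the example, because the split is
    // always into two partitions.
    return elem < threshold.get() ? 0 : 1;
  }

  public static void main(String[] args) {
    Supplier<Integer> grades = () -> 50;
    System.out.println(partitionFor(49, 2, grades)); // prints 0
    System.out.println(partitionFor(50, 2, grades)); // prints 1
  }
}
```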
By default, the Coder of each of the PCollections in the output PCollectionList is the same as the Coder of the input PCollection.
Each output element has the same timestamp and is in the same windows as its corresponding input element, and each output PCollection has the same WindowFn associated with it as the input.
Modifier and Type | Class and Description |
---|---|
static interface | Partition.PartitionFn<T>: A function object that chooses an output partition for an element. |
static interface | Partition.PartitionWithSideInputsFn<T>: A function object that chooses an output partition for an element. |
Fields inherited from class PTransform: annotations, displayData, name, resourceHints
Modifier and Type | Method and Description |
---|---|
PCollectionList<T> | expand(PCollection<T> in): Override this method to specify how this PTransform should be expanded on the given InputT. |
static <T> Partition<T> | of(int numPartitions, Partition.PartitionFn<? super T> partitionFn): Returns a new Partition PTransform that divides its input PCollection into the given number of partitions, using the given partitioning function. |
static <T> Partition<T> | of(int numPartitions, Partition.PartitionWithSideInputsFn<? super T> partitionFn, Requirements requirements): Returns a new Partition PTransform that divides its input PCollection into the given number of partitions, using the given partitioning function. |
void | populateDisplayData(DisplayData.Builder builder): Register display data for the given transform or component. |
Methods inherited from class PTransform: addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate, validate
public static <T> Partition<T> of(int numPartitions, Partition.PartitionWithSideInputsFn<? super T> partitionFn, Requirements requirements)
Returns a new Partition PTransform that divides its input PCollection into the given number of partitions, using the given partitioning function.
Parameters:
numPartitions - the number of partitions to divide the input PCollection into
partitionFn - the function to invoke on each element to choose its output partition
requirements - the Requirements needed to run it
Throws:
java.lang.IllegalArgumentException - if numPartitions <= 0
public static <T> Partition<T> of(int numPartitions, Partition.PartitionFn<? super T> partitionFn)
Returns a new Partition PTransform that divides its input PCollection into the given number of partitions, using the given partitioning function.
Parameters:
numPartitions - the number of partitions to divide the input PCollection into
partitionFn - the function to invoke on each element to choose its output partition
Throws:
java.lang.IllegalArgumentException - if numPartitions <= 0
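Both `of` overloads are documented to reject a non-positive partition count. A guard of that kind might look like the sketch below; the class `PartitionChecks`, the method name, and the message text are hypothetical and are not taken from the Beam source.

```java
/** Illustrative argument check matching the documented contract. */
final class PartitionChecks {
  /** Throws IllegalArgumentException if numPartitions <= 0. */
  static void checkNumPartitions(int numPartitions) {
    if (numPartitions <= 0) {
      // Message wording is an assumption made for this sketch.
      throw new IllegalArgumentException(
          "numPartitions must be > 0, but was " + numPartitions);
    }
  }

  public static void main(String[] args) {
    checkNumPartitions(10); // accepted
    try {
      checkNumPartitions(0);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```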
public PCollectionList<T> expand(PCollection<T> in)
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
Specified by:
expand in class PTransform<PCollection<T>,PCollectionList<T>>
public void populateDisplayData(DisplayData.Builder builder)
Description copied from class: PTransform
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
Specified by:
populateDisplayData in interface HasDisplayData
Overrides:
populateDisplayData in class PTransform<PCollection<T>,PCollectionList<T>>
Parameters:
builder - The builder to populate with display data.
See Also:
HasDisplayData