Class Partition<T>
- Type Parameters:
T - the type of the elements of the input and output PCollections
- All Implemented Interfaces:
Serializable, HasDisplayData
Partition takes a PCollection<T> and a PartitionFn, uses the
PartitionFn to split the elements of the input PCollection into N partitions,
and returns a PCollectionList<T> that bundles N PCollection<T>s
containing the split elements.
Example of use:
PCollection<Student> students = ...;
// Split students up into 10 partitions, by percentile:
PCollectionList<Student> studentsByPercentile =
    students.apply(Partition.of(10, new PartitionFn<Student>() {
        public int partitionFor(Student student, int numPartitions) {
            return student.getPercentile()  // 0..99
                * numPartitions / 100;
        }}));
for (int i = 0; i < 10; i++) {
    PCollection<Student> partition = studentsByPercentile.get(i);
    ...
}
Example of use with a side input:
PCollection<Integer> studentsPercentage = ...;
// Split the percentiles into 2 partitions, using a threshold read from a side input:
PCollectionView<Integer> gradesView =
    pipeline.apply("grades", Create.of(50)).apply(View.asSingleton());
PCollectionList<Integer> studentsByGrades =
    studentsPercentage.apply(Partition.of(2, (elem, numPartitions, ctx) -> {
        Integer grades = ctx.sideInput(gradesView);
        return elem < grades ? 0 : 1;
    }, Requirements.requiresSideInputs(gradesView)));
PCollection<Integer> below = studentsByGrades.get(0); // all percentiles below 50
PCollection<Integer> above = studentsByGrades.get(1); // all percentiles of 50 or above
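The index arithmetic in the two partition functions above can be checked without a pipeline. The following is a minimal plain-Java sketch, not Beam code: the class name `PartitionFunctionSketch` and its method names are hypothetical, and the methods only mirror the expressions shown in the examples.

```java
/** Plain-Java check of the partition-index expressions from the examples (no Beam required). */
final class PartitionFunctionSketch {

  /** Mirrors the first example: maps a percentile in 0..99 to an index in 0..numPartitions-1. */
  static int byPercentile(int percentile, int numPartitions) {
    // Integer division truncates, so with 10 partitions 0..9 -> 0, 10..19 -> 1, and so on.
    return percentile * numPartitions / 100;
  }

  /** Mirrors the second example: values below the threshold go to partition 0, the rest to 1. */
  static int byThreshold(int value, int threshold) {
    return value < threshold ? 0 : 1;
  }

  public static void main(String[] args) {
    for (int p : new int[] {0, 9, 10, 50, 99}) {
      System.out.println(p + " -> partition " + byPercentile(p, 10));
    }
    System.out.println("49 -> " + byThreshold(49, 50) + ", 50 -> " + byThreshold(50, 50));
  }
}
```

Because the percentile is at most 99, the first expression never returns numPartitions itself, so every result is a valid index into the output PCollectionList.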
By default, the Coder of each of the PCollections in the output
PCollectionList is the same as the Coder of the input PCollection.
Each output element has the same timestamp and is in the same windows as its corresponding
input element, and each output PCollection has the same WindowFn associated with it as the input.
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static interface | Partition.PartitionFn<T> | A function object that chooses an output partition for an element.
static interface | Partition.PartitionWithSideInputsFn<T> | A function object that chooses an output partition for an element.
-
Field Summary
Fields inherited from class org.apache.beam.sdk.transforms.PTransform
annotations, displayData, name, resourceHints -
Method Summary
Modifier and Type | Method | Description
PCollectionList<T> | expand(PCollection<T> in) | Override this method to specify how this PTransform should be expanded on the given InputT.
static <T> Partition<T> | of(int numPartitions, Partition.PartitionFn<? super T> partitionFn) | Returns a new Partition PTransform that divides its input PCollection into the given number of partitions, using the given partitioning function.
static <T> Partition<T> | of(int numPartitions, Partition.PartitionWithSideInputsFn<? super T> partitionFn, Requirements requirements) | Returns a new Partition PTransform that divides its input PCollection into the given number of partitions, using the given partitioning function.
void | populateDisplayData(DisplayData.Builder builder) | Register display data for the given transform or component.
Methods inherited from class org.apache.beam.sdk.transforms.PTransform
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setDisplayData, setResourceHints, toString, validate, validate
-
Method Details
-
of
public static <T> Partition<T> of(int numPartitions, Partition.PartitionWithSideInputsFn<? super T> partitionFn, Requirements requirements)
Returns a new Partition PTransform that divides its input PCollection into the given number of partitions, using the given partitioning function.
- Parameters:
numPartitions - the number of partitions to divide the input PCollection into
partitionFn - the function to invoke on each element to choose its output partition
requirements - the Requirements needed to run it.
- Throws:
IllegalArgumentException - if numPartitions <= 0
-
of
public static <T> Partition<T> of(int numPartitions, Partition.PartitionFn<? super T> partitionFn)
Returns a new Partition PTransform that divides its input PCollection into the given number of partitions, using the given partitioning function.
- Parameters:
numPartitions - the number of partitions to divide the input PCollection into
partitionFn - the function to invoke on each element to choose its output partition
- Throws:
IllegalArgumentException - if numPartitions <= 0
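The contract described for of (a positive partition count, and one output per partition index) can be modeled on in-memory lists. This is a rough sketch under stated assumptions, not Beam's implementation: `InMemoryPartitionSketch` and its nested `PartitionFn` interface are hypothetical stand-ins, and rejecting an out-of-range index with IndexOutOfBoundsException is a choice made for this sketch rather than behavior documented on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** In-memory model of splitting one collection into N partitions (not Beam's implementation). */
final class InMemoryPartitionSketch {

  /** Hypothetical local stand-in with the same shape as the partitionFor method shown above. */
  interface PartitionFn<T> {
    int partitionFor(T element, int numPartitions);
  }

  static <T> List<List<T>> partition(
      List<T> input, int numPartitions, PartitionFn<? super T> partitionFn) {
    // Mirrors the documented precondition: numPartitions <= 0 is rejected up front.
    if (numPartitions <= 0) {
      throw new IllegalArgumentException("numPartitions must be > 0");
    }
    // Create all N outputs first, so partitions that receive no elements still exist.
    List<List<T>> partitions = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) {
      partitions.add(new ArrayList<>());
    }
    for (T element : input) {
      int index = partitionFn.partitionFor(element, numPartitions);
      // Assumption of this sketch: an index outside 0..numPartitions-1 is treated as an error.
      if (index < 0 || index >= numPartitions) {
        throw new IndexOutOfBoundsException("partition index out of range: " + index);
      }
      partitions.get(index).add(element);
    }
    return partitions;
  }

  public static void main(String[] args) {
    List<List<Integer>> parts =
        partition(Arrays.asList(5, 42, 99), 10, (p, n) -> p * n / 100);
    System.out.println(parts);
  }
}
```

The model always returns exactly numPartitions lists, which matches the description of the output as a list bundling N collections even when some of them are empty.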
-
expand
public PCollectionList<T> expand(PCollection<T> in)
Description copied from class: PTransform
Override this method to specify how this PTransform should be expanded on the given InputT.
NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
- Specified by:
expand in class PTransform<PCollection<T>,PCollectionList<T>>
-
populateDisplayData
public void populateDisplayData(DisplayData.Builder builder)
Description copied from class: PTransform
Register display data for the given transform or component.
populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.
By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayData in interface HasDisplayData
- Overrides:
populateDisplayData in class PTransform<PCollection<T>,PCollectionList<T>>
- Parameters:
builder - The builder to populate with display data.