Class PCollection<T>
- Type Parameters:
T- the type of the elements of thisPCollection
PCollection<T> is an immutable collection of values of type
T. A PCollection can contain either a bounded or unbounded number of elements. Bounded
and unbounded PCollections are produced as the output of PTransforms (including root PTransforms like Read and Create), and can be passed
as the inputs of other PTransforms.
Some root transforms produce bounded PCollections and others produce unbounded ones.
For example, GenerateSequence.from(long) with GenerateSequence.to(long) produces a fixed set
of integers, so it produces a bounded PCollection. GenerateSequence.from(long) without
a GenerateSequence.to(long) produces all integers as an infinite stream, so it produces an
unbounded PCollection.
Each element in a PCollection has an associated timestamp. Readers assign timestamps
to elements when they create PCollections, and other PTransforms propagate these timestamps from their input to their output. See the documentation
on BoundedSource.BoundedReader and UnboundedSource.UnboundedReader for more information on how these readers
produce timestamps and watermarks.
Additionally, a PCollection has an associated WindowFn and each element is
assigned to a set of windows. By default, the windowing function is GlobalWindows and all
elements are assigned into a single default window. This default can be overridden with the
Window PTransform.
See the individual PTransform subclasses for specific information on how they
propagate timestamps and windowing.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic enumThe enumeration of cases for whether aPCollectionis bounded. -
Method Summary
Modifier and TypeMethodDescription<OutputT extends POutput>
OutputTapply(String name, PTransform<? super PCollection<T>, OutputT> t) Applies the givenPTransformto this inputPCollection, usingnameto identify this specific application of the transform.<OutputT extends POutput>
OutputTapply(PTransform<? super PCollection<T>, OutputT> t) of thePTransform.static <T> PCollection<T> createPrimitiveOutputInternal(Pipeline pipeline, WindowingStrategy<?, ?> windowingStrategy, PCollection.IsBounded isBounded, @Nullable Coder<T> coder) For internal use only; no backwards-compatibility guarantees.static <T> PCollection<T> createPrimitiveOutputInternal(Pipeline pipeline, WindowingStrategy<?, ?> windowingStrategy, PCollection.IsBounded isBounded, @Nullable Coder<T> coder, TupleTag<?> tag) For internal use only; no backwards-compatibility guarantees.expand()voidfinishSpecifying(PInput input, PTransform<?, ?> transform) After building, finalizes thisPValueto make it ready for running.voidfinishSpecifyingOutput(String transformName, PInput input, PTransform<?, ?> transform) As part of applying the producingPTransform, finalizes this output to make it ready for being used as an input and for running.getCoder()Returns theCoderused by thisPCollectionto encode and decode the values stored in it.Returns the attached schema's fromRowFunction.getName()Returns the name of thisPCollection.Returns the attached schema.Returns the attached schema's toRowFunction.Returns aTypeDescriptor<T>with some reflective information aboutT, if possible.WindowingStrategy<?, ?> Returns theWindowingStrategyof thisPCollection.booleanReturns whether thisPCollectionhas an attached schema.Sets theCoderused by thisPCollectionto encode and decode the values stored in it.setIsBoundedInternal(PCollection.IsBounded isBounded) For internal use only; no backwards-compatibility guarantees.Sets the name of thisPCollection.setRowSchema(Schema schema) Sets a schema on this PCollection.setSchema(Schema schema, TypeDescriptor<T> typeDescriptor, SerializableFunction<T, Row> toRowFunction, SerializableFunction<Row, T> fromRowFunction) Sets aSchemaon thisPCollection.setTypeDescriptor(TypeDescriptor<T> typeDescriptor) Sets theTypeDescriptor<T>for thisPCollection<T>.setWindowingStrategyInternal(WindowingStrategy<?, ?> windowingStrategy) For internal use only; no backwards-compatibility guarantees.Methods inherited from class org.apache.beam.sdk.values.PValueBase
getKindString, getPipeline, toStringMethods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, waitMethods inherited from interface org.apache.beam.sdk.values.PInput
getPipelineMethods inherited from interface org.apache.beam.sdk.values.POutput
getPipeline
-
Method Details
-
finishSpecifyingOutput
Description copied from interface:POutputAs part of applying the producingPTransform, finalizes this output to make it ready for being used as an input and for running.This includes ensuring that all
PCollectionshaveCodersspecified or defaulted.Automatically invoked whenever this
POutputis output, afterPOutput.finishSpecifyingOutput(String, PInput, PTransform)has been called on each componentPValuereturned byPOutput.expand().- Specified by:
finishSpecifyingOutputin interfacePOutput- Overrides:
finishSpecifyingOutputin classPValueBase
-
finishSpecifying
After building, finalizes thisPValueto make it ready for running. Automatically invoked whenever thePValueis "used" (e.g., when apply() is called on it) and when the Pipeline is run (useful if this is aPValuewith no consumers).- Specified by:
finishSpecifyingin interfacePValue- Overrides:
finishSpecifyingin classPValueBase- Parameters:
input- thePInputthePTransformwas applied to to produce this outputtransform- thePTransformthat produced thisPValue
-
getTypeDescriptor
Returns aTypeDescriptor<T>with some reflective information aboutT, if possible. May returnnullif no information is available. Subclasses may override this to enable betterCoderinference. -
getName
Returns the name of thisPCollection.By default, the name of a
PCollectionis based on the name of thePTransformthat produces it. It can be specified explicitly by callingsetName(java.lang.String).- Specified by:
getNamein interfacePValue- Overrides:
getNamein classPValueBase- Throws:
IllegalStateException- if the name hasn't been set yet
-
expand
Description copied from interface:PValueExpands thisPOutputinto a list of its component outputPValues.- A
PValueexpands to itself. - A tuple or list of
PValues(such asPCollectionTupleorPCollectionList) expands to its componentPValue PValues.
Not intended to be invoked directly by user code..
- A
-
setName
Sets the name of thisPCollection. Returnsthis.- Overrides:
setNamein classPValueBase- Throws:
IllegalStateException- if thisPCollectionhas already been finalized and may no longer be set. Onceapply(org.apache.beam.sdk.transforms.PTransform<? super org.apache.beam.sdk.values.PCollection<T>, OutputT>)has been called, this will be the case.
-
getCoder
Returns theCoderused by thisPCollectionto encode and decode the values stored in it.- Throws:
IllegalStateException- if theCoderhasn't been set, and couldn't be inferred.
-
setCoder
- Throws:
IllegalStateException- if thisPCollectionhas already been finalized and may no longer be set. Onceapply(org.apache.beam.sdk.transforms.PTransform<? super org.apache.beam.sdk.values.PCollection<T>, OutputT>)has been called, this will be the case.
-
setRowSchema
Sets a schema on this PCollection.Can only be called on a
PCollection<Row>. -
setSchema
public PCollection<T> setSchema(Schema schema, TypeDescriptor<T> typeDescriptor, SerializableFunction<T, Row> toRowFunction, SerializableFunction<Row, T> fromRowFunction) Sets aSchemaon thisPCollection. -
hasSchema
public boolean hasSchema()Returns whether thisPCollectionhas an attached schema. -
getSchema
Returns the attached schema. -
getToRowFunction
Returns the attached schema's toRowFunction. -
getFromRowFunction
Returns the attached schema's fromRowFunction. -
apply
of thePTransform.- Returns:
- the output of the applied
PTransform
-
apply
public <OutputT extends POutput> OutputT apply(String name, PTransform<? super PCollection<T>, OutputT> t) Applies the givenPTransformto this inputPCollection, usingnameto identify this specific application of the transform. This name is used in various places, including the monitoring UI, logging, and to stably identify this application node in the job graph.- Returns:
- the output of the applied
PTransform
-
getWindowingStrategy
Returns theWindowingStrategyof thisPCollection. -
isBounded
-
setTypeDescriptor
Sets theTypeDescriptor<T>for thisPCollection<T>. This may allow the enclosingPCollectionTuple,PCollectionList, orPTransform<?, PCollection<T>>, etc., to provide more detailed reflective information. -
setWindowingStrategyInternal
@Internal public PCollection<T> setWindowingStrategyInternal(WindowingStrategy<?, ?> windowingStrategy) For internal use only; no backwards-compatibility guarantees. -
setIsBoundedInternal
For internal use only; no backwards-compatibility guarantees. -
createPrimitiveOutputInternal
@Internal public static <T> PCollection<T> createPrimitiveOutputInternal(Pipeline pipeline, WindowingStrategy<?, ?> windowingStrategy, PCollection.IsBounded isBounded, @Nullable Coder<T> coder) For internal use only; no backwards-compatibility guarantees. -
createPrimitiveOutputInternal
@Internal public static <T> PCollection<T> createPrimitiveOutputInternal(Pipeline pipeline, WindowingStrategy<?, ?> windowingStrategy, PCollection.IsBounded isBounded, @Nullable Coder<T> coder, TupleTag<?> tag) For internal use only; no backwards-compatibility guarantees.
-