apache_beam.pvalue module
PValue, PCollection: one node of a dataflow graph.
A node of a dataflow processing graph is a PValue. Currently, there is only one type: PCollection (a potentially very large set of arbitrary values). Once created, a PValue belongs to a pipeline and has an associated transform (of type PTransform), which describes how the value will be produced when the pipeline gets executed.
class apache_beam.pvalue.PCollection(pipeline, tag=None, element_type=None, windowing=None)
Bases: apache_beam.pvalue.PValue
A container for multiple values (potentially huge).
Dataflow users should not construct PCollection objects directly in their pipelines.
Initializes a PValue with all arguments hidden behind keyword arguments.
Parameters:
- pipeline – Pipeline object for this PValue.
- tag – Tag of this PValue.
- element_type – The type of this PValue.
windowing
class apache_beam.pvalue.TaggedOutput(tag, value)
Bases: object
An object representing a tagged value.
ParDo, Map, and FlatMap transforms can emit values on multiple outputs which are distinguished by string tags. The DoFn will return plain values if it wants to emit on the main output and TaggedOutput objects if it wants to emit a value on a specific tagged output.
class apache_beam.pvalue.AsSingleton(pcoll, default_value=<object object>)
Bases: apache_beam.pvalue.AsSideInput
Marker specifying that an entire PCollection is to be used as a side input.
When a PCollection is supplied as a side input to a PTransform, it is necessary to indicate whether the entire PCollection should be made available as a PTransform side argument (in the form of an iterable), or whether just one value should be pulled from the PCollection and supplied as the side argument (as an ordinary value).
Wrapping a PCollection side input argument to a PTransform in this container (e.g., data.apply('label', MyPTransform(), AsSingleton(my_side_input))) selects the latter behavior.
The input PCollection must contain exactly one value per window, unless a default is given, in which case it may be empty.
element_type
class apache_beam.pvalue.AsIter(pcoll)
Bases: apache_beam.pvalue.AsSideInput
Marker specifying that an entire PCollection is to be used as a side input.
When a PCollection is supplied as a side input to a PTransform, it is necessary to indicate whether the entire PCollection should be made available as a PTransform side argument (in the form of an iterable), or whether just one value should be pulled from the PCollection and supplied as the side argument (as an ordinary value).
Wrapping a PCollection side input argument to a PTransform in this container (e.g., data.apply('label', MyPTransform(), AsIter(my_side_input))) selects the former behavior.
element_type
class apache_beam.pvalue.AsList(pcoll)
Bases: apache_beam.pvalue.AsSideInput
Marker specifying that an entire PCollection is to be used as a side input.
Intended for use in side-argument specification—the same places where AsSingleton and AsIter are used, but forces materialization of this PCollection as a list.
Parameters: pcoll – Input PCollection.
Returns: An AsList-wrapper around a PCollection whose one element is a list containing all elements in pcoll.
class apache_beam.pvalue.AsDict(pcoll)
Bases: apache_beam.pvalue.AsSideInput
Marker specifying a PCollection to be used as an indexable side input.
Intended for use in side-argument specification—the same places where AsSingleton and AsIter are used, but returns an interface that allows key lookup.
Parameters: pcoll – Input PCollection. All elements should be key-value pairs (i.e., 2-tuples) with unique keys.
Returns: An AsDict-wrapper around a PCollection whose one element is a dict with entries for uniquely-keyed pairs in pcoll.
class apache_beam.pvalue.EmptySideInput
Bases: object
Value indicating when a singleton side input was empty.
If a PCollection was furnished as a singleton side input to a PTransform, and that PCollection was empty, then this value is supplied to the DoFn in the place where a value from a non-empty PCollection would have gone. This alerts the DoFn that the side input PCollection was empty. Users may want to check whether side input values are EmptySideInput, but they will very likely never want to create new instances of this class themselves.