public class Pipeline
extends java.lang.Object
Pipeline
manages a directed acyclic graph of PTransforms
, and the
PCollections
that the PTransforms
consume and produce.
Each Pipeline
is self-contained and isolated from any other
Pipeline
. The PValues
that are inputs and outputs of each of a
Pipeline's
PTransforms
are also owned by that
Pipeline
. A PValue
owned by one Pipeline
can be read only by
PTransforms
also owned by that Pipeline
. Pipelines
can safely be executed concurrently.
Here is a typical example of use:
// Start by defining the options for the pipeline.
PipelineOptions options = PipelineOptionsFactory.create();
// Then create the pipeline. The runner is determined by the options.
Pipeline p = Pipeline.create(options);
// A root PTransform, like TextIO.Read or Create, gets added
// to the Pipeline by being applied:
PCollection<String> lines =
p.apply(TextIO.read().from("gs://bucket/dir/file*.txt"));
// A Pipeline can have multiple root transforms:
PCollection<String> moreLines =
p.apply(TextIO.read().from("gs://bucket/other/dir/file*.txt"));
PCollection<String> yetMoreLines =
p.apply(Create.of("yet", "more", "lines").withCoder(StringUtf8Coder.of()));
// Further PTransforms can be applied, in an arbitrary (acyclic) graph.
// Subsequent PTransforms (and intermediate PCollections etc.) are
// implicitly part of the same Pipeline.
PCollection<String> allLines =
PCollectionList.of(lines).and(moreLines).and(yetMoreLines)
.apply(new Flatten<String>());
PCollection<KV<String, Integer>> wordCounts =
allLines
.apply(ParDo.of(new ExtractWords()))
.apply(new Count<String>());
PCollection<String> formattedWordCounts =
wordCounts.apply(ParDo.of(new FormatCounts()));
formattedWordCounts.apply(TextIO.write().to("gs://bucket/dir/counts.txt"));
// PTransforms aren't executed when they're applied, rather they're
// just added to the Pipeline. Once the whole Pipeline of PTransforms
// is constructed, the Pipeline's PTransforms can be run using a
// PipelineRunner. The default PipelineRunner executes the Pipeline
// directly, sequentially, in this one process, which is useful for
// unit tests and simple experiments:
p.run();
Modifier and Type | Class and Description |
---|---|
static class |
Pipeline.PipelineExecutionException
|
static interface |
Pipeline.PipelineVisitor
For internal use only; no backwards-compatibility guarantees.
|
Modifier | Constructor and Description |
---|---|
protected |
Pipeline(PipelineOptions options) |
Modifier and Type | Method and Description |
---|---|
<OutputT extends POutput> |
apply(PTransform<? super PBegin,OutputT> root)
Like
apply(String, PTransform) but the transform node in the Pipeline
graph will be named according to PTransform.getName() . |
<OutputT extends POutput> |
apply(java.lang.String name,
PTransform<? super PBegin,OutputT> root)
|
static <InputT extends PInput,OutputT extends POutput> |
applyTransform(InputT input,
PTransform<? super InputT,OutputT> transform)
For internal use only; no backwards-compatibility guarantees.
|
static <InputT extends PInput,OutputT extends POutput> |
applyTransform(java.lang.String name,
InputT input,
PTransform<? super InputT,OutputT> transform)
For internal use only; no backwards-compatibility guarantees.
|
PBegin |
begin()
Returns a
PBegin owned by this Pipeline. |
static Pipeline |
create()
Constructs a pipeline from default
PipelineOptions . |
static Pipeline |
create(PipelineOptions options)
Constructs a pipeline from the provided
PipelineOptions . |
static Pipeline |
forTransformHierarchy(org.apache.beam.sdk.runners.TransformHierarchy transforms,
PipelineOptions options) |
CoderRegistry |
getCoderRegistry()
Returns the
CoderRegistry that this Pipeline uses. |
void |
replaceAll(java.util.List<org.apache.beam.sdk.runners.PTransformOverride> overrides)
For internal use only; no backwards-compatibility guarantees.
|
PipelineResult |
run()
Runs this
Pipeline according to the PipelineOptions used to create the Pipeline via create(PipelineOptions) . |
PipelineResult |
run(PipelineOptions options)
Runs this
Pipeline using the given PipelineOptions , using the runner specified
by the options. |
void |
setCoderRegistry(CoderRegistry coderRegistry)
Deprecated.
this should never be used - every
Pipeline has a registry throughout its
lifetime. |
java.lang.String |
toString() |
void |
traverseTopologically(Pipeline.PipelineVisitor visitor)
For internal use only; no backwards-compatibility guarantees.
|
protected Pipeline(PipelineOptions options)
public static Pipeline create()
PipelineOptions
.public static Pipeline create(PipelineOptions options)
PipelineOptions
.public PBegin begin()
PBegin
owned by this Pipeline. This serves as the input of a root PTransform
such as Read
or Create
.public <OutputT extends POutput> OutputT apply(PTransform<? super PBegin,OutputT> root)
apply(String, PTransform)
but the transform node in the Pipeline
graph will be named according to PTransform.getName()
.apply(String, PTransform)
public <OutputT extends POutput> OutputT apply(java.lang.String name, PTransform<? super PBegin,OutputT> root)
PTransform
, such as Read
or Create
, to this Pipeline
.
The node in the Pipeline
graph will use the provided name
. This name is used
in various places, including the monitoring UI, logging, and to stably identify this node in
the Pipeline
graph upon update.
Alias for begin().apply(name, root)
.
@Internal public static Pipeline forTransformHierarchy(org.apache.beam.sdk.runners.TransformHierarchy transforms, PipelineOptions options)
@Internal public void replaceAll(java.util.List<org.apache.beam.sdk.runners.PTransformOverride> overrides)
Replaces all nodes that match a PTransformOverride
in this pipeline. Overrides are
applied in the order they are present within the list.
After all nodes are replaced, ensures that no nodes in the updated graph match any of the overrides.
public PipelineResult run()
Pipeline
according to the PipelineOptions
used to create the Pipeline
via create(PipelineOptions)
.public PipelineResult run(PipelineOptions options)
Pipeline
using the given PipelineOptions
, using the runner specified
by the options.public CoderRegistry getCoderRegistry()
CoderRegistry
that this Pipeline
uses.@Deprecated public void setCoderRegistry(CoderRegistry coderRegistry)
Pipeline
has a registry throughout its
lifetime.@Internal public void traverseTopologically(Pipeline.PipelineVisitor visitor)
Invokes the PipelineVisitor's
Pipeline.PipelineVisitor.visitPrimitiveTransform(org.apache.beam.sdk.runners.TransformHierarchy.Node)
and
Pipeline.PipelineVisitor.visitValue(org.apache.beam.sdk.values.PValue, org.apache.beam.sdk.runners.TransformHierarchy.Node)
operations on each of this
Pipeline's
transform and value nodes, in forward
topological order.
Traversal of the Pipeline
causes PTransforms
and
PValues
owned by the Pipeline
to be marked as finished,
at which point they may no longer be modified.
Typically invoked by PipelineRunner
subclasses.
@Internal public static <InputT extends PInput,OutputT extends POutput> OutputT applyTransform(InputT input, PTransform<? super InputT,OutputT> transform)
Like applyTransform(String, PInput, PTransform)
but defaulting to the name
provided by the PTransform
.
@Internal public static <InputT extends PInput,OutputT extends POutput> OutputT applyTransform(java.lang.String name, InputT input, PTransform<? super InputT,OutputT> transform)
Applies the given PTransform
to this input InputT
and returns
its OutputT
. This uses name
to identify this specific application
of the transform. This name is used in various places, including the monitoring UI,
logging, and to stably identify this application node in the Pipeline
graph during
update.
Each PInput
subclass that provides an apply
method should delegate to
this method to ensure proper registration with the PipelineRunner
.
public java.lang.String toString()
toString
in class java.lang.Object