public class Pipeline
extends java.lang.Object
Pipeline manages a directed acyclic graph of PTransforms, and the
 PCollections that the PTransforms consume and produce.
 Each Pipeline is self-contained and isolated from any other
 Pipeline. The PValues that are inputs and outputs of each of a
 Pipeline's PTransforms are also owned by that
 Pipeline. A PValue owned by one Pipeline can be read only by
 PTransforms also owned by that Pipeline. Pipelines
 can safely be executed concurrently.
 
Here is a typical example of use:
 
 // Start by defining the options for the pipeline.
 PipelineOptions options = PipelineOptionsFactory.create();
 // Then create the pipeline. The runner is determined by the options.
 Pipeline p = Pipeline.create(options);
 // A root PTransform, like TextIO.Read or Create, gets added
 // to the Pipeline by being applied:
 PCollection<String> lines =
     p.apply(TextIO.read().from("gs://bucket/dir/file*.txt"));
 // A Pipeline can have multiple root transforms:
 PCollection<String> moreLines =
     p.apply(TextIO.read().from("gs://bucket/other/dir/file*.txt"));
 PCollection<String> yetMoreLines =
     p.apply(Create.of("yet", "more", "lines").withCoder(StringUtf8Coder.of()));
 // Further PTransforms can be applied, in an arbitrary (acyclic) graph.
 // Subsequent PTransforms (and intermediate PCollections etc.) are
 // implicitly part of the same Pipeline.
 PCollection<String> allLines =
     PCollectionList.of(lines).and(moreLines).and(yetMoreLines)
     .apply(Flatten.<String>pCollections());
 // Count.perElement() pairs each distinct element with its count as a Long.
 PCollection<KV<String, Long>> wordCounts =
     allLines
     .apply(ParDo.of(new ExtractWords()))
     .apply(Count.<String>perElement());
 PCollection<String> formattedWordCounts =
     wordCounts.apply(ParDo.of(new FormatCounts()));
 formattedWordCounts.apply(TextIO.write().to("gs://bucket/dir/counts.txt"));
 // PTransforms aren't executed when they're applied, rather they're
 // just added to the Pipeline.  Once the whole Pipeline of PTransforms
 // is constructed, the Pipeline's PTransforms can be run using a
 // PipelineRunner.  The default PipelineRunner executes the Pipeline
 // directly, sequentially, in this one process, which is useful for
 // unit tests and simple experiments:
 p.run();

| Modifier and Type | Class and Description |
|---|---|
| static class | Pipeline.PipelineExecutionException |
| static interface | Pipeline.PipelineVisitor: For internal use only; no backwards-compatibility guarantees. |

| Modifier | Constructor and Description |
|---|---|
| protected | Pipeline(PipelineOptions options) |

| Modifier and Type | Method and Description |
|---|---|
| <OutputT extends POutput> OutputT | apply(PTransform<? super PBegin,OutputT> root): Like apply(String, PTransform) but the transform node in the Pipeline graph will be named according to PTransform.getName(). |
| <OutputT extends POutput> OutputT | apply(java.lang.String name, PTransform<? super PBegin,OutputT> root) |
| static <InputT extends PInput,OutputT extends POutput> OutputT | applyTransform(InputT input, PTransform<? super InputT,OutputT> transform): For internal use only; no backwards-compatibility guarantees. |
| static <InputT extends PInput,OutputT extends POutput> OutputT | applyTransform(java.lang.String name, InputT input, PTransform<? super InputT,OutputT> transform): For internal use only; no backwards-compatibility guarantees. |
| PBegin | begin(): Returns a PBegin owned by this Pipeline. |
| static Pipeline | create(): Constructs a pipeline from default PipelineOptions. |
| static Pipeline | create(PipelineOptions options): Constructs a pipeline from the provided PipelineOptions. |
| static Pipeline | forTransformHierarchy(org.apache.beam.sdk.runners.TransformHierarchy transforms, PipelineOptions options) |
| CoderRegistry | getCoderRegistry(): Returns the CoderRegistry that this Pipeline uses. |
| void | replaceAll(java.util.List<org.apache.beam.sdk.runners.PTransformOverride> overrides): For internal use only; no backwards-compatibility guarantees. |
| PipelineResult | run(): Runs this Pipeline according to the PipelineOptions used to create the Pipeline via create(PipelineOptions). |
| PipelineResult | run(PipelineOptions options): Runs this Pipeline using the given PipelineOptions, using the runner specified by the options. |
| void | setCoderRegistry(CoderRegistry coderRegistry): Deprecated. This should never be used; every Pipeline has a registry throughout its lifetime. |
| java.lang.String | toString() |
| void | traverseTopologically(Pipeline.PipelineVisitor visitor): For internal use only; no backwards-compatibility guarantees. |

protected Pipeline(PipelineOptions options)

public static Pipeline create()
Constructs a pipeline from default PipelineOptions.

public static Pipeline create(PipelineOptions options)
Constructs a pipeline from the provided PipelineOptions.

public PBegin begin()
Returns a PBegin owned by this Pipeline. This serves as the input of a root PTransform such as Read or Create.

public <OutputT extends POutput> OutputT apply(PTransform<? super PBegin,OutputT> root)
Like apply(String, PTransform) but the transform node in the Pipeline graph will be named according to PTransform.getName().
See Also: apply(String, PTransform)

public <OutputT extends POutput> OutputT apply(java.lang.String name, PTransform<? super PBegin,OutputT> root)
Adds a root PTransform, such as Read or Create, to this Pipeline. The node in the Pipeline graph will use the provided name. This name is used in various places, including the monitoring UI and logging, and to stably identify this node in the Pipeline graph upon update.

Alias for begin().apply(name, root).

@Internal public static Pipeline forTransformHierarchy(org.apache.beam.sdk.runners.TransformHierarchy transforms, PipelineOptions options)

@Internal public void replaceAll(java.util.List<org.apache.beam.sdk.runners.PTransformOverride> overrides)
Replaces all nodes that match a PTransformOverride in this pipeline. Overrides are
 applied in the order they are present within the list.
 
After all nodes are replaced, ensures that no nodes in the updated graph match any of the overrides.
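
The two-phase behaviour described above (apply overrides in list order, then check that nothing still matches) can be illustrated with a toy model. This is a minimal sketch using plain strings as nodes, not Beam's implementation: `OverrideSketch`, `NodeOverride`, and the string-based matcher and replacement are hypothetical stand-ins for the pipeline graph and `PTransformOverride`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class OverrideSketch {

  /** Hypothetical stand-in for PTransformOverride: a matcher plus a replacement rule. */
  public static final class NodeOverride {
    final Predicate<String> matcher;
    final UnaryOperator<String> replacement;

    public NodeOverride(Predicate<String> matcher, UnaryOperator<String> replacement) {
      this.matcher = matcher;
      this.replacement = replacement;
    }
  }

  /** Applies each override in list order, then verifies that no node still matches any override. */
  public static List<String> replaceAll(List<String> nodes, List<NodeOverride> overrides) {
    List<String> current = new ArrayList<>(nodes);

    // Phase 1: overrides run in list order, so a later override sees
    // the nodes produced by an earlier one.
    for (NodeOverride override : overrides) {
      List<String> next = new ArrayList<>();
      for (String node : current) {
        next.add(override.matcher.test(node) ? override.replacement.apply(node) : node);
      }
      current = next;
    }

    // Phase 2: after all replacements, no node may match any override.
    for (NodeOverride override : overrides) {
      for (String node : current) {
        if (override.matcher.test(node)) {
          throw new IllegalStateException("Node still matches an override: " + node);
        }
      }
    }
    return current;
  }

  public static void main(String[] args) {
    List<NodeOverride> overrides = List.of(
        new NodeOverride(n -> n.equals("A"), n -> "B"),
        new NodeOverride(n -> n.equals("B"), n -> "C"));
    // "A" becomes "B" under the first override and then "C" under the second.
    System.out.println(replaceAll(List.of("A", "B", "D"), overrides)); // prints [C, C, D]
  }
}
```

In this model, reversing the list makes the final check fail: "B" is replaced first, then "A" becomes a new "B" that the earlier override never sees, which shows why the order of the list matters.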

public PipelineResult run()
Runs this Pipeline according to the PipelineOptions used to create the Pipeline via create(PipelineOptions).

public PipelineResult run(PipelineOptions options)
Runs this Pipeline using the given PipelineOptions, using the runner specified by the options.

public CoderRegistry getCoderRegistry()
Returns the CoderRegistry that this Pipeline uses.

@Deprecated public void setCoderRegistry(CoderRegistry coderRegistry)
Deprecated. This should never be used; every Pipeline has a registry throughout its lifetime.

@Internal public void traverseTopologically(Pipeline.PipelineVisitor visitor)
Invokes the PipelineVisitor's
 Pipeline.PipelineVisitor.visitPrimitiveTransform(org.apache.beam.sdk.runners.TransformHierarchy.Node) and
 Pipeline.PipelineVisitor.visitValue(org.apache.beam.sdk.values.PValue, org.apache.beam.sdk.runners.TransformHierarchy.Node) operations on each of this
 Pipeline's transform and value nodes, in forward
 topological order.
 
Traversal of the Pipeline causes PTransforms and
 PValues owned by the Pipeline to be marked as finished,
 at which point they may no longer be modified.
 
Typically invoked by PipelineRunner subclasses.
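
To make "forward topological order" concrete, here is a minimal sketch of a visitor-driven traversal over a small graph, using Kahn's algorithm. It is a toy model rather than Beam's code: `TopologicalVisitSketch`, its `Visitor` interface, and the string node names are hypothetical, and the `finished` flag only loosely mirrors the rule that traversed nodes may no longer be modified.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopologicalVisitSketch {

  /** Hypothetical visitor, loosely analogous to Pipeline.PipelineVisitor. */
  public interface Visitor {
    void visit(String node);
  }

  private final Map<String, List<String>> successors = new LinkedHashMap<>();
  private final Map<String, Integer> inDegree = new LinkedHashMap<>();
  private boolean finished = false;

  /** Adds a directed edge; rejected once the graph has been traversed. */
  public void addEdge(String from, String to) {
    if (finished) {
      throw new IllegalStateException("Graph was already traversed and is finished");
    }
    successors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    successors.computeIfAbsent(to, k -> new ArrayList<>());
    inDegree.putIfAbsent(from, 0);
    inDegree.merge(to, 1, Integer::sum);
  }

  /** Visits every node only after all of its predecessors (Kahn's algorithm). */
  public void traverseTopologically(Visitor visitor) {
    Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
    Deque<String> ready = new ArrayDeque<>();
    for (Map.Entry<String, Integer> entry : remaining.entrySet()) {
      if (entry.getValue() == 0) {
        ready.add(entry.getKey()); // roots have no predecessors
      }
    }
    int visited = 0;
    while (!ready.isEmpty()) {
      String node = ready.remove();
      visitor.visit(node);
      visited++;
      for (String next : successors.get(node)) {
        // A successor becomes ready once its last predecessor has been visited.
        if (remaining.merge(next, -1, Integer::sum) == 0) {
          ready.add(next);
        }
      }
    }
    if (visited != successors.size()) {
      throw new IllegalStateException("Graph contains a cycle");
    }
    finished = true; // no further modification after traversal
  }

  public static void main(String[] args) {
    TopologicalVisitSketch graph = new TopologicalVisitSketch();
    graph.addEdge("Read1", "Flatten");
    graph.addEdge("Read2", "Flatten");
    graph.addEdge("Flatten", "Count");
    graph.addEdge("Count", "Write");

    List<String> order = new ArrayList<>();
    graph.traverseTopologically(order::add);
    System.out.println(order); // prints [Read1, Read2, Flatten, Count, Write]
  }
}
```

Both root nodes are visited before `Flatten`, which consumes their outputs, and calling `addEdge` after the traversal throws an `IllegalStateException` in this model.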

@Internal public static <InputT extends PInput,OutputT extends POutput> OutputT applyTransform(InputT input, PTransform<? super InputT,OutputT> transform)
Like applyTransform(String, PInput, PTransform) but defaulting to the name provided by the PTransform.

@Internal public static <InputT extends PInput,OutputT extends POutput> OutputT applyTransform(java.lang.String name, InputT input, PTransform<? super InputT,OutputT> transform)
Applies the given PTransform to this input InputT and returns
 its OutputT. This uses name to identify this specific application
 of the transform. This name is used in various places, including the monitoring UI,
 logging, and to stably identify this application node in the Pipeline graph during
 update.
 
Each PInput subclass that provides an apply method should delegate to
 this method to ensure proper registration with the PipelineRunner.

public java.lang.String toString()
Overrides: toString in class java.lang.Object