Pipeline

java.lang.Object
- org.apache.beam.sdk.Pipeline

Direct Known Subclasses: TestPipeline

public class Pipeline
extends java.lang.Object

A Pipeline manages a directed acyclic graph of PTransforms, and the PCollections that the PTransforms consume and produce.

Each Pipeline is self-contained and isolated from any other Pipeline. The PValues that are inputs and outputs of each of a Pipeline's PTransforms are also owned by that Pipeline. A PValue owned by one Pipeline can be read only by PTransforms also owned by that Pipeline. Pipelines can safely be executed concurrently.

Here is a typical example of use:

 
 // Start by defining the options for the pipeline.
 PipelineOptions options = PipelineOptionsFactory.create();
 // Then create the pipeline. The runner is determined by the options.
 Pipeline p = Pipeline.create(options);

 // A root PTransform, like TextIO.Read or Create, gets added
 // to the Pipeline by being applied:
 PCollection<String> lines =
     p.apply(TextIO.read().from("gs://bucket/dir/file*.txt"));

 // A Pipeline can have multiple root transforms:
 PCollection<String> moreLines =
     p.apply(TextIO.read().from("gs://bucket/other/dir/file*.txt"));
 PCollection<String> yetMoreLines =
     p.apply(Create.of("yet", "more", "lines").withCoder(StringUtf8Coder.of()));

 // Further PTransforms can be applied, in an arbitrary (acyclic) graph.
 // Subsequent PTransforms (and intermediate PCollections etc.) are
 // implicitly part of the same Pipeline.
 PCollection<String> allLines =
     PCollectionList.of(lines).and(moreLines).and(yetMoreLines)
     .apply(Flatten.<String>pCollections());
 PCollection<KV<String, Long>> wordCounts =
     allLines
     .apply(ParDo.of(new ExtractWords()))
     .apply(Count.<String>perElement());
 PCollection<String> formattedWordCounts =
     wordCounts.apply(ParDo.of(new FormatCounts()));
 formattedWordCounts.apply(TextIO.write().to("gs://bucket/dir/counts.txt"));

 // PTransforms aren't executed when they're applied; they're
 // just added to the Pipeline.  Once the whole Pipeline of PTransforms
 // is constructed, the Pipeline's PTransforms can be run using a
 // PipelineRunner.  The default PipelineRunner executes the Pipeline
 // directly, sequentially, in this one process, which is useful for
 // unit tests and simple experiments:
 p.run();

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`Pipeline.PipelineExecutionException` Thrown during execution of a `Pipeline`, whenever user code within that `Pipeline` throws an exception.
`static interface`	`Pipeline.PipelineVisitor` *For internal use only; no backwards-compatibility guarantees.*

Constructor Summary

Constructors
Modifier	Constructor and Description
`protected`	`Pipeline(PipelineOptions options)`

Method Summary

Modifier and Type	Method and Description
`<OutputT extends POutput> OutputT`	`apply(PTransform<? super PBegin,OutputT> root)` Like `apply(String, PTransform)` but the transform node in the `Pipeline` graph will be named according to `PTransform.getName()`.
`<OutputT extends POutput> OutputT`	`apply(java.lang.String name, PTransform<? super PBegin,OutputT> root)` Adds a root `PTransform`, such as `Read` or `Create`, to this `Pipeline`.
`static <InputT extends PInput,OutputT extends POutput> OutputT`	`applyTransform(InputT input, PTransform<? super InputT,OutputT> transform)` *For internal use only; no backwards-compatibility guarantees.*
`static <InputT extends PInput,OutputT extends POutput> OutputT`	`applyTransform(java.lang.String name, InputT input, PTransform<? super InputT,OutputT> transform)` *For internal use only; no backwards-compatibility guarantees.*
`PBegin`	`begin()` Returns a `PBegin` owned by this Pipeline.
`static Pipeline`	`create()` Constructs a pipeline from default `PipelineOptions`.
`static Pipeline`	`create(PipelineOptions options)` Constructs a pipeline from the provided `PipelineOptions`.
`static Pipeline`	`forTransformHierarchy(org.apache.beam.sdk.runners.TransformHierarchy transforms, PipelineOptions options)`
`CoderRegistry`	`getCoderRegistry()` Returns the `CoderRegistry` that this `Pipeline` uses.
`void`	`replaceAll(java.util.List<org.apache.beam.sdk.runners.PTransformOverride> overrides)` *For internal use only; no backwards-compatibility guarantees.*
`PipelineResult`	`run()` Runs this `Pipeline` according to the `PipelineOptions` used to create the `Pipeline` via `create(PipelineOptions)`.
`PipelineResult`	`run(PipelineOptions options)` Runs this `Pipeline` using the given `PipelineOptions`, using the runner specified by the options.
`void`	`setCoderRegistry(CoderRegistry coderRegistry)` Deprecated. This should never be used; every `Pipeline` has a registry throughout its lifetime.
`java.lang.String`	`toString()`
`void`	`traverseTopologically(Pipeline.PipelineVisitor visitor)` *For internal use only; no backwards-compatibility guarantees.*

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - Pipeline
```
protected Pipeline(PipelineOptions options)
```
- Method Detail
  - create
```
public static Pipeline create()
```
    Constructs a pipeline from default PipelineOptions.
  - create
```
public static Pipeline create(PipelineOptions options)
```
    Constructs a pipeline from the provided PipelineOptions.
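    A minimal sketch of building the options from command-line flags before calling create(PipelineOptions). The class name CreatePipelineExample and its helper methods are hypothetical; PipelineOptionsFactory.fromArgs, withValidation, and create are existing SDK calls. Creating the Pipeline assumes that a runner, such as the direct runner, is available on the classpath.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class CreatePipelineExample {
  /** Parses flags such as --jobName=demo or --runner=... into PipelineOptions. */
  static PipelineOptions parseOptions(String[] args) {
    return PipelineOptionsFactory.fromArgs(args).withValidation().create();
  }

  /** Builds a Pipeline whose configuration, including the runner, comes from the flags. */
  static Pipeline fromArgs(String[] args) {
    return Pipeline.create(parseOptions(args));
  }

  public static void main(String[] args) {
    PipelineOptions options = parseOptions(args);
    Pipeline p = Pipeline.create(options);
    System.out.println("Created " + p + " for job " + options.getJobName());
  }
}
```

    Keeping option parsing separate from pipeline creation makes it possible to inspect or adjust the options before the Pipeline is built.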
  - begin
```
public PBegin begin()
```
    Returns a PBegin owned by this Pipeline. This serves as the input of a root PTransform such as Read or Create.
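    As an illustration, the sketch below applies a root transform to the PBegin directly. BeginExample and numbersFrom are hypothetical names; the example assumes a runner is on the classpath so that Pipeline.create() succeeds.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;

public class BeginExample {
  /** Applies a root transform to the PBegin of the given pipeline. */
  static PCollection<Integer> numbersFrom(Pipeline p) {
    PBegin begin = p.begin();
    // The resulting PCollection is owned by the same Pipeline as the PBegin.
    return begin.apply(Create.of(1, 2, 3));
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();
    PCollection<Integer> numbers = numbersFrom(p);
    System.out.println("Same owner: " + (numbers.getPipeline() == p));
  }
}
```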
  - apply
```
public <OutputT extends POutput> OutputT apply(PTransform<? super PBegin,OutputT> root)
```
    Like apply(String, PTransform) but the transform node in the Pipeline graph will be named according to PTransform.getName().
    
    See Also:
    
    apply(String, PTransform)
  - apply
```
public <OutputT extends POutput> OutputT apply(java.lang.String name,
                                               PTransform<? super PBegin,OutputT> root)
```
    Adds a root PTransform, such as Read or Create, to this Pipeline.
    The node in the Pipeline graph will use the provided name. The name appears in various places, including the monitoring UI and logging, and stably identifies this node in the Pipeline graph when the pipeline is updated.
    Alias for begin().apply(name, root).
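    The two forms can be sketched side by side as follows. NamedApplyExample, its method names, and the transform names "CreateWords" and "CreateMoreWords" are hypothetical; the example assumes a runner is on the classpath so that Pipeline.create() succeeds.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class NamedApplyExample {
  /** Adds a named root transform through Pipeline.apply(String, PTransform). */
  static PCollection<String> viaApply(Pipeline p) {
    return p.apply("CreateWords", Create.of("yet", "more", "lines"));
  }

  /** The same kind of application, spelled out through begin(). */
  static PCollection<String> viaBegin(Pipeline p) {
    return p.begin().apply("CreateMoreWords", Create.of("even", "more"));
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();
    PCollection<String> words = viaApply(p);
    PCollection<String> moreWords = viaBegin(p);
    System.out.println(words.getPipeline() == moreWords.getPipeline());
  }
}
```

    Distinct names are used for the two applications so that each node can be told apart in monitoring output.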
  - forTransformHierarchy
```
@Internal
public static Pipeline forTransformHierarchy(org.apache.beam.sdk.runners.TransformHierarchy transforms,
                                             PipelineOptions options)
```
  - replaceAll
```
@Internal
public void replaceAll(java.util.List<org.apache.beam.sdk.runners.PTransformOverride> overrides)
```
    For internal use only; no backwards-compatibility guarantees.
    Replaces all nodes that match a PTransformOverride in this pipeline. Overrides are applied in the order they are present within the list.
    After all nodes are replaced, ensures that no nodes in the updated graph match any of the overrides.
  - run
```
public PipelineResult run()
```
    Runs this Pipeline according to the PipelineOptions used to create the Pipeline via create(PipelineOptions).
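    The nested class summary describes Pipeline.PipelineExecutionException as thrown when user code within the Pipeline throws. A hedged sketch of handling that case is shown here; ExecutionFailureExample, FailingFn, and runAndCaptureCause are hypothetical names, and the behavior shown assumes the direct runner is on the classpath and is the runner selected by the default options.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class ExecutionFailureExample {
  /** A DoFn that always fails, standing in for buggy user code. */
  static class FailingFn extends DoFn<Integer, Integer> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      throw new IllegalStateException("bad element: " + c.element());
    }
  }

  /** Runs a failing pipeline and returns the user exception, or null if none was thrown. */
  static Throwable runAndCaptureCause() {
    Pipeline p = Pipeline.create();
    p.apply("MakeNumbers", Create.of(1, 2, 3))
        .apply("AlwaysFail", ParDo.of(new FailingFn()));
    try {
      // Nothing executes until run() is called.
      p.run().waitUntilFinish();
      return null;
    } catch (Pipeline.PipelineExecutionException e) {
      // The exception thrown by user code is attached as the cause.
      return e.getCause();
    }
  }

  public static void main(String[] args) {
    System.out.println("User code failed with: " + runAndCaptureCause());
  }
}
```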
  - run
```
public PipelineResult run(PipelineOptions options)
```
    Runs this Pipeline using the given PipelineOptions, using the runner specified by the options.
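    A minimal sketch of supplying the options at run time rather than at creation time. RunWithOptionsExample and buildThenRun are hypothetical names, and the --runner=DirectRunner flag in the usage assumes the direct runner is on the classpath.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class RunWithOptionsExample {
  /** Builds a small pipeline, then runs it with options parsed from the given flags. */
  static PipelineResult.State buildThenRun(String[] args) {
    Pipeline p = Pipeline.create();
    p.apply("MakeNumbers", Create.of(1, 2, 3));

    // The runner is taken from these options, not from the ones used at creation.
    PipelineOptions runOptions = PipelineOptionsFactory.fromArgs(args).create();
    PipelineResult result = p.run(runOptions);
    return result.waitUntilFinish();
  }

  public static void main(String[] args) {
    System.out.println(buildThenRun(new String[] {"--runner=DirectRunner"}));
  }
}
```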
  - getCoderRegistry
```
public CoderRegistry getCoderRegistry()
```
    Returns the CoderRegistry that this Pipeline uses.
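    For illustration, the sketch below looks up a built-in coder and registers one explicitly for a user type. CoderRegistryExample and Event are hypothetical names; getCoder and registerCoderForClass are CoderRegistry methods, and the example assumes a runner is on the classpath so that Pipeline.create() succeeds.

```java
import java.io.Serializable;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.CannotProvideCoderException;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.CoderRegistry;
import org.apache.beam.sdk.coders.SerializableCoder;

public class CoderRegistryExample {
  /** A hypothetical user type that needs a coder. */
  static class Event implements Serializable {
    final String id;

    Event(String id) {
      this.id = id;
    }
  }

  /** Looks up the coder the registry provides for String. */
  static Coder<String> lookUpStringCoder(Pipeline p) {
    try {
      return p.getCoderRegistry().getCoder(String.class);
    } catch (CannotProvideCoderException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Registers a coder for Event explicitly, then reads it back. */
  static Coder<Event> registerEventCoder(Pipeline p) {
    CoderRegistry registry = p.getCoderRegistry();
    registry.registerCoderForClass(Event.class, SerializableCoder.of(Event.class));
    try {
      return registry.getCoder(Event.class);
    } catch (CannotProvideCoderException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();
    System.out.println(lookUpStringCoder(p));
    System.out.println(registerEventCoder(p));
  }
}
```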
  - setCoderRegistry
```
@Deprecated
public void setCoderRegistry(CoderRegistry coderRegistry)
```
    Deprecated. This should never be used; every Pipeline has a registry throughout its lifetime.
  - traverseTopologically
```
@Internal
public void traverseTopologically(Pipeline.PipelineVisitor visitor)
```
    For internal use only; no backwards-compatibility guarantees.
    Invokes the PipelineVisitor's Pipeline.PipelineVisitor.visitPrimitiveTransform(org.apache.beam.sdk.runners.TransformHierarchy.Node) and Pipeline.PipelineVisitor.visitValue(org.apache.beam.sdk.values.PValue, org.apache.beam.sdk.runners.TransformHierarchy.Node) operations on each of this Pipeline's transform and value nodes, in forward topological order.
    Traversal of the Pipeline causes PTransforms and PValues owned by the Pipeline to be marked as finished, at which point they may no longer be modified.
    Typically invoked by PipelineRunner subclasses.
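    Although this method is internal and carries no compatibility guarantees, a small sketch may clarify how a visitor is driven. It assumes the Pipeline.PipelineVisitor.Defaults base class with no-op visit methods is available in the SDK version in use; TraversalExample and primitiveTransformNames are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;
import org.apache.beam.sdk.transforms.Create;

public class TraversalExample {
  /** Collects the full names of primitive transforms in forward topological order. */
  static List<String> primitiveTransformNames(Pipeline p) {
    final List<String> names = new ArrayList<>();
    p.traverseTopologically(
        new Pipeline.PipelineVisitor.Defaults() {
          @Override
          public void visitPrimitiveTransform(TransformHierarchy.Node node) {
            names.add(node.getFullName());
          }
        });
    return names;
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();
    p.apply("MakeNumbers", Create.of(1, 2, 3));
    for (String name : primitiveTransformNames(p)) {
      System.out.println(name);
    }
  }
}
```

    Which primitive transforms a composite such as Create expands into depends on the SDK version, so the exact names printed are not fixed.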
  - applyTransform
```
@Internal
public static <InputT extends PInput,OutputT extends POutput> OutputT applyTransform(InputT input,
                                                                                     PTransform<? super InputT,OutputT> transform)
```
    For internal use only; no backwards-compatibility guarantees.
    Like applyTransform(String, PInput, PTransform) but defaulting to the name provided by the PTransform.
  - applyTransform
```
@Internal
public static <InputT extends PInput,OutputT extends POutput> OutputT applyTransform(java.lang.String name,
                                                                                     InputT input,
                                                                                     PTransform<? super InputT,OutputT> transform)
```
    For internal use only; no backwards-compatibility guarantees.
    Applies the given PTransform to this input InputT and returns its OutputT. This uses name to identify this specific application of the transform. The name appears in various places, including the monitoring UI and logging, and stably identifies this application node in the Pipeline graph during update.
    Each PInput subclass that provides an apply method should delegate to this method to ensure proper registration with the PipelineRunner.
  - toString
```
public java.lang.String toString()
```
    Overrides:
    
    toString in class java.lang.Object
