org.apache.beam.sdk.Pipeline

Direct Known Subclasses: TestPipeline

public class Pipeline extends Object

A Pipeline manages a directed acyclic graph of PTransforms, and the PCollections that the PTransforms consume and produce.

Each Pipeline is self-contained and isolated from any other Pipeline. The PValues that are inputs and outputs of each of a Pipeline's PTransforms are also owned by that Pipeline. A PValue owned by one Pipeline can be read only by PTransforms also owned by that Pipeline. Pipelines can safely be executed concurrently.

Here is a typical example of use:


 // Start by defining the options for the pipeline.
 PipelineOptions options = PipelineOptionsFactory.create();
 // Then create the pipeline. The runner is determined by the options.
 Pipeline p = Pipeline.create(options);

 // A root PTransform, like TextIO.Read or Create, gets added
 // to the Pipeline by being applied:
 PCollection<String> lines =
     p.apply(TextIO.read().from("gs://bucket/dir/file*.txt"));

 // A Pipeline can have multiple root transforms:
 PCollection<String> moreLines =
     p.apply(TextIO.read().from("gs://bucket/other/dir/file*.txt"));
 PCollection<String> yetMoreLines =
     p.apply(Create.of("yet", "more", "lines").withCoder(StringUtf8Coder.of()));

 // Further PTransforms can be applied, in an arbitrary (acyclic) graph.
 // Subsequent PTransforms (and intermediate PCollections etc.) are
 // implicitly part of the same Pipeline.
 PCollection<String> allLines =
     PCollectionList.of(lines).and(moreLines).and(yetMoreLines)
     .apply(Flatten.<String>pCollections());
 // ExtractWords and FormatCounts are user-defined DoFns (not shown here).
 PCollection<KV<String, Long>> wordCounts =
     allLines
     .apply(ParDo.of(new ExtractWords()))
     .apply(Count.<String>perElement());
 PCollection<String> formattedWordCounts =
     wordCounts.apply(ParDo.of(new FormatCounts()));
 formattedWordCounts.apply(TextIO.write().to("gs://bucket/dir/counts.txt"));

 // PTransforms aren't executed when they're applied; rather, they're
 // just added to the Pipeline.  Once the whole Pipeline of PTransforms
 // is constructed, the Pipeline's PTransforms can be run using a
 // PipelineRunner.  The default PipelineRunner executes the Pipeline
 // directly, sequentially, in this one process, which is useful for
 // unit tests and simple experiments:
 p.run();

Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
|---|---|---|
| static class | Pipeline.PipelineExecutionException | Thrown during execution of a Pipeline, whenever user code within that Pipeline throws an exception. |
| static interface | Pipeline.PipelineVisitor | For internal use only; no backwards-compatibility guarantees. |

Constructor Summary

Constructors

| Modifier | Constructor | Description |
|---|---|---|
| protected | Pipeline(PipelineOptions options) | |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| <OutputT extends POutput> OutputT | apply(String name, PTransform<? super PBegin,OutputT> root) | Adds a root PTransform, such as Read or Create, to this Pipeline. |
| <OutputT extends POutput> OutputT | apply(PTransform<? super PBegin,OutputT> root) | Like apply(String, PTransform) but the transform node in the Pipeline graph will be named according to PTransform.getName(). |
| static <InputT extends PInput, OutputT extends POutput> OutputT | applyTransform(InputT input, PTransform<? super InputT,OutputT> transform) | For internal use only; no backwards-compatibility guarantees. |
| static <InputT extends PInput, OutputT extends POutput> OutputT | applyTransform(String name, InputT input, PTransform<? super InputT,OutputT> transform) | For internal use only; no backwards-compatibility guarantees. |
| PBegin | begin() | Returns a PBegin owned by this Pipeline. |
| static Pipeline | create() | Constructs a pipeline from default PipelineOptions. |
| static Pipeline | create(PipelineOptions options) | Constructs a pipeline from the provided PipelineOptions. |
| static Pipeline | forTransformHierarchy(org.apache.beam.sdk.runners.TransformHierarchy transforms, PipelineOptions options) | |
| CoderRegistry | getCoderRegistry() | Returns the CoderRegistry that this Pipeline uses. |
| PipelineOptions | getOptions() | |
| SchemaRegistry | getSchemaRegistry() | |
| <OutputT extends POutput> ErrorHandler.BadRecordErrorHandler<OutputT> | registerBadRecordErrorHandler(PTransform<PCollection<BadRecord>,OutputT> sinkTransform) | |
| void | replaceAll(List<org.apache.beam.sdk.runners.PTransformOverride> overrides) | For internal use only; no backwards-compatibility guarantees. |
| PipelineResult | run() | Runs this Pipeline according to the PipelineOptions used to create the Pipeline via create(PipelineOptions). |
| PipelineResult | run(PipelineOptions options) | Runs this Pipeline using the given PipelineOptions, using the runner specified by the options. |
| void | setCoderRegistry(CoderRegistry coderRegistry) | Deprecated. This should never be used; every Pipeline has a registry throughout its lifetime. |
| String | toString() | |
| void | traverseTopologically(Pipeline.PipelineVisitor visitor) | For internal use only; no backwards-compatibility guarantees. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Constructor Details
- Pipeline
  
  protected Pipeline(PipelineOptions options)
Method Details
- create
  
  public static Pipeline create()
  
  Constructs a pipeline from default PipelineOptions.
- create
  
  public static Pipeline create(PipelineOptions options)
  
  Constructs a pipeline from the provided PipelineOptions.
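
As an illustration of the two create variants, the sketch below builds PipelineOptions from command-line style flags and passes them to create(PipelineOptions). It assumes the Beam Java core SDK and a runner such as the direct runner are on the classpath; the class name CreatePipelineExample and the helper parseOptions are hypothetical names chosen for this example.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class CreatePipelineExample {
  /** Parses flags such as --jobName=demo into PipelineOptions (hypothetical helper). */
  static PipelineOptions parseOptions(String[] args) {
    return PipelineOptionsFactory.fromArgs(args).withValidation().create();
  }

  public static void main(String[] args) {
    PipelineOptions options = parseOptions(args);
    // The runner is chosen from the options; none is named here, so the default applies.
    Pipeline p = Pipeline.create(options);
    System.out.println("Created " + p + " for job " + options.getJobName());
  }
}
```

Calling Pipeline.create() with no arguments is the shorter form when no flags need to be parsed.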
- begin
  
  public PBegin begin()
  
  Returns a PBegin owned by this Pipeline. This serves as the input of a root PTransform such as Read or Create.
- apply
  
  public <OutputT extends POutput> OutputT apply(PTransform<? super PBegin,OutputT> root)
  
  Like apply(String, PTransform) but the transform node in the Pipeline graph will be named according to PTransform.getName().
  See Also:
  
  apply(String, PTransform)
- apply
  
  public <OutputT extends POutput> OutputT apply(String name, PTransform<? super PBegin,OutputT> root)
  
  Adds a root PTransform, such as Read or Create, to this Pipeline.
  The node in the Pipeline graph will use the provided name. This name is used in various places, including the monitoring UI, logging, and to stably identify this node in the Pipeline graph upon update.
  Alias for begin().apply(name, root).
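
The naming and aliasing described above can be sketched as follows. This is a minimal example assuming the Beam Java core SDK and the direct runner are on the classpath; the transform names "CreateGreetings" and "CreateFarewells" and the class NamedRootExample are made up for illustration.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class NamedRootExample {
  /** Applies two named root transforms and checks that both outputs belong to the pipeline. */
  static boolean outputsOwnedByPipeline() {
    Pipeline p = Pipeline.create();

    // Named root transform applied directly on the Pipeline.
    PCollection<String> greetings = p.apply("CreateGreetings", Create.of("hello", "world"));

    // The same kind of application spelled out through begin().
    PCollection<String> farewells = p.begin().apply("CreateFarewells", Create.of("bye"));

    return greetings.getPipeline() == p && farewells.getPipeline() == p;
  }

  public static void main(String[] args) {
    System.out.println(outputsOwnedByPipeline());
  }
}
```

Giving each application a distinct, stable name keeps nodes identifiable in monitoring output and across pipeline updates.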
- forTransformHierarchy
  
  @Internal public static Pipeline forTransformHierarchy(org.apache.beam.sdk.runners.TransformHierarchy transforms, PipelineOptions options)
- getOptions
  
  @Internal public PipelineOptions getOptions()
- replaceAll
  
  @Internal public void replaceAll(List<org.apache.beam.sdk.runners.PTransformOverride> overrides)
  
  For internal use only; no backwards-compatibility guarantees.
  Replaces all nodes that match a PTransformOverride in this pipeline. Overrides are applied in the order they are present within the list.
- run
  
  public PipelineResult run()
  
  Runs this Pipeline according to the PipelineOptions used to create the Pipeline via create(PipelineOptions).
- run
  
  public PipelineResult run(PipelineOptions options)
  
  Runs this Pipeline using the given PipelineOptions, using the runner specified by the options.
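
A hedged sketch of running a pipeline and handling a failure in user code follows. It assumes the direct runner is on the classpath and that a failure inside the mapping function surfaces as a Pipeline.PipelineExecutionException, as described in the nested class summary; other runners may report failures differently. RunPipelineExample, parseAll and failsInUserCode are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class RunPipelineExample {
  /** Builds a small pipeline that parses each input string as an integer, then runs it. */
  static PipelineResult.State parseAll(List<String> inputs) {
    Pipeline p = Pipeline.create();
    p.apply("CreateInputs", Create.of(inputs).withCoder(StringUtf8Coder.of()))
        .apply(
            "ParseInts",
            MapElements.into(TypeDescriptors.integers()).via((String s) -> Integer.parseInt(s)));
    // Applying transforms only builds the graph; run() hands it to the configured runner.
    return p.run().waitUntilFinish();
  }

  /** Returns true if running the pipeline surfaces a PipelineExecutionException. */
  static boolean failsInUserCode(List<String> inputs) {
    try {
      parseAll(inputs);
      return false;
    } catch (Pipeline.PipelineExecutionException e) {
      // The cause is expected to carry the exception thrown by the user code.
      System.err.println("User code failed: " + e.getCause());
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(parseAll(Arrays.asList("1", "2", "3")));
    System.out.println(failsInUserCode(Arrays.asList("1", "not-a-number")));
  }
}
```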
- getCoderRegistry
  
  public CoderRegistry getCoderRegistry()
  
  Returns the CoderRegistry that this Pipeline uses.
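
One common use of the registry is to associate a Coder with a user-defined type before building the graph. The sketch below assumes the Beam Java core SDK and the direct runner are on the classpath; the Word class and the helper registerAndLookUp are hypothetical and exist only for this example.

```java
import java.io.Serializable;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.CannotProvideCoderException;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.CoderRegistry;
import org.apache.beam.sdk.coders.SerializableCoder;

public class CoderRegistryExample {
  /** Hypothetical user type, serializable so that SerializableCoder can encode it. */
  static class Word implements Serializable {
    final String text;

    Word(String text) {
      this.text = text;
    }
  }

  /** Registers a coder for Word on the pipeline's registry and looks it up again. */
  static Coder<Word> registerAndLookUp(Pipeline p) {
    CoderRegistry registry = p.getCoderRegistry();
    registry.registerCoderForClass(Word.class, SerializableCoder.of(Word.class));
    try {
      return registry.getCoder(Word.class);
    } catch (CannotProvideCoderException e) {
      throw new IllegalStateException("No coder available for Word", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(registerAndLookUp(Pipeline.create()));
  }
}
```

Because each Pipeline has its own registry, a registration made this way affects only transforms applied within that Pipeline.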
- getSchemaRegistry
  
  public SchemaRegistry getSchemaRegistry()
- registerBadRecordErrorHandler
  
  public <OutputT extends POutput> ErrorHandler.BadRecordErrorHandler<OutputT> registerBadRecordErrorHandler(PTransform<PCollection<BadRecord>,OutputT> sinkTransform)
- setCoderRegistry
  
  @Deprecated public void setCoderRegistry(CoderRegistry coderRegistry)
  
  Deprecated. This should never be used; every Pipeline has a registry throughout its lifetime.
- traverseTopologically
  
  @Internal public void traverseTopologically(Pipeline.PipelineVisitor visitor)
  
  For internal use only; no backwards-compatibility guarantees.
  Invokes the PipelineVisitor's Pipeline.PipelineVisitor.visitPrimitiveTransform(org.apache.beam.sdk.runners.TransformHierarchy.Node) and Pipeline.PipelineVisitor.visitValue(org.apache.beam.sdk.values.PValue, org.apache.beam.sdk.runners.TransformHierarchy.Node) operations on each of this Pipeline's transform and value nodes, in forward topological order.
  Traversal of the Pipeline causes PTransforms and PValues owned by the Pipeline to be marked as finished, at which point they may no longer be modified.
  Typically invoked by PipelineRunner subclasses.
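
For readers inspecting a pipeline graph, the traversal described above can be sketched with a visitor that records the full name of each primitive transform. This relies on internal API with no backwards-compatibility guarantees, so treat it as an illustration only; it assumes the direct runner is on the classpath, and ListPrimitivesExample and the transform name "CreateLetters" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.runners.TransformHierarchy;
import org.apache.beam.sdk.transforms.Create;

public class ListPrimitivesExample {
  /** Collects the full names of primitive transforms in forward topological order. */
  static List<String> primitiveNames(Pipeline p) {
    List<String> names = new ArrayList<>();
    p.traverseTopologically(
        // Defaults supplies no-op implementations, so only one callback is overridden.
        new Pipeline.PipelineVisitor.Defaults() {
          @Override
          public void visitPrimitiveTransform(TransformHierarchy.Node node) {
            names.add(node.getFullName());
          }
        });
    return names;
  }

  /** Builds a one-transform pipeline and lists the primitives it expands to. */
  static List<String> demo() {
    Pipeline p = Pipeline.create();
    p.apply("CreateLetters", Create.of("a", "b", "c"));
    return primitiveNames(p);
  }

  public static void main(String[] args) {
    demo().forEach(System.out::println);
  }
}
```

The exact names printed depend on how the SDK version in use expands Create into primitive transforms.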
- applyTransform
  
  @Internal public static <InputT extends PInput, OutputT extends POutput> OutputT applyTransform(InputT input, PTransform<? super InputT,OutputT> transform)
  
  For internal use only; no backwards-compatibility guarantees.
  Like applyTransform(String, PInput, PTransform) but defaulting to the name provided by the PTransform.
- applyTransform
  
  @Internal public static <InputT extends PInput, OutputT extends POutput> OutputT applyTransform(String name, InputT input, PTransform<? super InputT,OutputT> transform)
  
  For internal use only; no backwards-compatibility guarantees.
  Applies the given PTransform to this input InputT and returns its OutputT. This uses name to identify this specific application of the transform. This name is used in various places, including the monitoring UI, logging, and to stably identify this application node in the Pipeline graph during update.
  Each PInput subclass that provides an apply method should delegate to this method to ensure proper registration with the PipelineRunner.
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
