Class Pipeline

java.lang.Object
org.apache.beam.sdk.Pipeline
Direct Known Subclasses:
TestPipeline

public class Pipeline extends Object
A Pipeline manages a directed acyclic graph of PTransforms, and the PCollections that the PTransforms consume and produce.

Each Pipeline is self-contained and isolated from any other Pipeline. The PValues that are inputs and outputs of each of a Pipeline's PTransforms are also owned by that Pipeline. A PValue owned by one Pipeline can be read only by PTransforms also owned by that Pipeline. Pipelines can safely be executed concurrently.

Here is a typical example of use:


 // Start by defining the options for the pipeline.
 PipelineOptions options = PipelineOptionsFactory.create();
 // Then create the pipeline. The runner is determined by the options.
 Pipeline p = Pipeline.create(options);

 // A root PTransform, like TextIO.Read or Create, gets added
 // to the Pipeline by being applied:
 PCollection<String> lines =
     p.apply(TextIO.read().from("gs://bucket/dir/file*.txt"));

 // A Pipeline can have multiple root transforms:
 PCollection<String> moreLines =
     p.apply(TextIO.read().from("gs://bucket/other/dir/file*.txt"));
 PCollection<String> yetMoreLines =
     p.apply(Create.of("yet", "more", "lines").withCoder(StringUtf8Coder.of()));

 // Further PTransforms can be applied, in an arbitrary (acyclic) graph.
 // Subsequent PTransforms (and intermediate PCollections etc.) are
 // implicitly part of the same Pipeline.
 PCollection<String> allLines =
     PCollectionList.of(lines).and(moreLines).and(yetMoreLines)
     .apply(new Flatten<String>());
 PCollection<KV<String, Integer>> wordCounts =
     allLines
     .apply(ParDo.of(new ExtractWords()))
     .apply(new Count<String>());
 PCollection<String> formattedWordCounts =
     wordCounts.apply(ParDo.of(new FormatCounts()));
 formattedWordCounts.apply(TextIO.write().to("gs://bucket/dir/counts.txt"));

 // PTransforms aren't executed when they're applied, rather they're
 // just added to the Pipeline.  Once the whole Pipeline of PTransforms
 // is constructed, the Pipeline's PTransforms can be run using a
 // PipelineRunner.  The default PipelineRunner executes the Pipeline
 // directly, sequentially, in this one process, which is useful for
 // unit tests and simple experiments:
 p.run();