apache_beam.runners.interactive.augmented_pipeline module

Module to augment interactive flavor into the given pipeline.

For internal use only; no backward-compatibility guarantees.

class apache_beam.runners.interactive.augmented_pipeline.AugmentedPipeline(user_pipeline: apache_beam.pipeline.Pipeline, pcolls: Optional[Set[apache_beam.pvalue.PCollection]] = None)[source]

Bases: object

A pipeline with augmented interactive flavor that caches intermediate PCollections defined by the user, reads computed PCollections as source and prunes unnecessary pipeline parts for fast computation.

Initializes a pipelilne for augmenting interactive flavor.

  • user_pipeline – a beam.Pipeline instance defined by the user.
  • pcolls – cacheable pcolls to be computed/retrieved. If the set is empty, all intermediate pcolls assigned to variables are applicable.
cacheables() → Dict[apache_beam.pvalue.PCollection, apache_beam.runners.interactive.caching.cacheable.Cacheable][source]

Finds all the cacheable intermediate PCollections in the pipeline with their metadata.

augment() → org.apache.beam.model.pipeline.v1.beam_runner_api_pb2.Pipeline[source]

Augments the pipeline with cache. Always calculates a new result.

For a cacheable PCollection, if cache exists, read cache; else, write cache.