apache_beam.runners.interactive.caching.write_cache module

Module to write cache for PCollections being computed.

For internal use only; no backward-compatibility guarantees.

class apache_beam.runners.interactive.caching.write_cache.WriteCache(pipeline: org.apache.beam.model.pipeline.v1.beam_runner_api_pb2.Pipeline, context: apache_beam.runners.pipeline_context.PipelineContext, cache_manager: apache_beam.runners.interactive.cache_manager.CacheManager, cacheable: apache_beam.runners.interactive.caching.cacheable.Cacheable)[source]

Bases: object

Class that facilitates writing cache for PCollections being computed.

write_cache() → None[source]

Writes cache for the cacheable PCollection that is being computed.

First, it creates a temporary pipeline instance on top of the existing component_id_map from self._pipeline’s context so that both pipelines share the context and have no conflict component ids. Second, it creates a _PCollectionPlaceHolder in the temporary pipeline that mimics the attributes of the cacheable PCollection to be written into cache. It also marks all components in the current temporary pipeline as ignorable when later copying components to self._pipeline. Third, it instantiates a _WriteCacheTransform that uses the _PCollectionPlaceHolder as the input. This adds a subgraph under top level transforms that writes the _PCollectionPlaceHolder into cache. Fourth, it copies components of the subgraph from the temporary pipeline to self._pipeline, skipping components that are ignored in the temporary pipeline and components that are not in the temporary pipeline but presents in the component_id_map of self._pipeline. Last, it replaces inputs of all transforms that consume the _PCollectionPlaceHolder with the cacheable PCollection to be written to cache.