apache_beam.runners.interactive.cache_manager module¶
-
class
apache_beam.runners.interactive.cache_manager.
CacheManager
[source]¶ Bases:
object
Abstract class for caching PCollections.
A PCollection cache is identified by labels, which consist of a prefix (either ‘full’ or ‘sample’) and a cache_label which is a hash of the PCollection derivation.
-
read
(*labels, **args)[source]¶ Return the PCollection as a list as well as the version number.
Parameters: - *labels – List of labels for PCollection instance.
- **args – Dict of additional arguments. Currently only supports ‘limiters’ as a list of ElementLimiters, and ‘tail’ as a boolean. Limiters limits the amount of elements read and duration with respect to processing time.
Returns: - A tuple containing an iterator for the items in the PCollection and the
version number.
It is possible that the version numbers from read() and_latest_version() are different. This usually means that the cache’s been evicted (thus unavailable => read() returns version = -1), but it had reached version n before eviction.
-
write
(value, *labels)[source]¶ Writes the value to the given cache.
Parameters: - value – An encodable (with corresponding PCoder) value
- *labels – List of labels for PCollection instance
-
clear
(*labels)[source]¶ Clears the cache entry of the given labels and returns True on success.
Parameters: - value – An encodable (with corresponding PCoder) value
- *labels – List of labels for PCollection instance
-
sink
(labels, is_capture=False)[source]¶ Returns a PTransform that writes the PCollection cache.
TODO(BEAM-10514): Make sure labels will not be converted into an arbitrarily long file path: e.g., windows has a 260 path limit.
-
save_pcoder
(pcoder, *labels)[source]¶ Saves pcoder for given PCollection.
Correct reading of PCollection from Cache requires PCoder to be known. This method saves desired PCoder for PCollection that will subsequently be used by sink(…), source(…), and, most importantly, read(…) method. The latter must be able to read a PCollection written by Beam using non-Beam IO.
Parameters: - pcoder – A PCoder to be used for reading and writing a PCollection.
- *labels – List of labels for PCollection instance.
-
-
class
apache_beam.runners.interactive.cache_manager.
FileBasedCacheManager
(cache_dir=None, cache_format='text')[source]¶ Bases:
apache_beam.runners.interactive.cache_manager.CacheManager
Maps PCollections to local temp files for materialization.
-
class
apache_beam.runners.interactive.cache_manager.
ReadCache
(cache_manager, label)[source]¶ Bases:
apache_beam.transforms.ptransform.PTransform
A PTransform that reads the PCollections from the cache.
-
class
apache_beam.runners.interactive.cache_manager.
WriteCache
(cache_manager, label, sample=False, sample_size=0, is_capture=False)[source]¶ Bases:
apache_beam.transforms.ptransform.PTransform
A PTransform that writes the PCollections to the cache.