apache_beam.runners.interactive.caching.expression_cache module

class apache_beam.runners.interactive.caching.expression_cache.ExpressionCache(pcollection_cache=None, computed_cache=None)[source]

Bases: object

Utility class for caching deferred DataFrames expressions.

This is cache is currently a light-weight wrapper around the TO_PCOLLECTION_CACHE in the beam.dataframes.convert module and the computed_pcollections in the interactive module.

Example:

df : beam.dataframe.DeferredDataFrame = ...
...
cache = ExpressionCache()
cache.replace_with_cached(df._expr)

This will automatically link the instance to the existing caches. After it is created, the cache can then be used to modify an existing deferred dataframe expression tree to replace nodes with computed PCollections.

This object can be created and destroyed whenever. This class holds no state and the only side-effect is modifying the given expression.

replace_with_cached(expr: apache_beam.dataframe.expressions.Expression) → Dict[str, apache_beam.dataframe.expressions.Expression][source]

Replaces any previously computed expressions with PlaceholderExpressions.

This is used to correctly read any expressions that were cached in previous runs. This enables the InteractiveRunner to prune off old calculations from the expression tree.