apache_beam.runners.interactive.background_caching_job module

Module to build and run background caching job.

For internal use only; no backwards-compatibility guarantees.

A background caching job is a job that caches events for all unbounded sources of a given pipeline. With Interactive Beam, one such job is started when a pipeline run happens (which produces a main job in contrast to the background caching job) and meets the following conditions:

  1. The pipeline contains unbounded sources.
  2. No such background job is running.
  3. No such background job has completed successfully and the cached events are still valid (invalidated when unbounded sources change in the pipeline).

Once started, the background caching job runs asynchronously until it hits some cache size limit. Meanwhile, the main job and future main jobs from the pipeline will run using the deterministic replay-able cached events until they are invalidated.

apache_beam.runners.interactive.background_caching_job.attempt_to_run_background_caching_job(runner, user_pipeline, options=None)[source]

Attempts to run a background caching job for a user-defined pipeline.

The pipeline result is automatically tracked by Interactive Beam in case future cancellation/cleanup is needed.

apache_beam.runners.interactive.background_caching_job.is_background_caching_job_needed(user_pipeline)[source]

Determines if a background caching job needs to be started.

apache_beam.runners.interactive.background_caching_job.has_source_to_cache(user_pipeline)[source]

Determines if a user-defined pipeline contains any source that need to be cached.

apache_beam.runners.interactive.background_caching_job.attempt_to_cancel_background_caching_job(user_pipeline)[source]

Attempts to cancel background caching job for a user-defined pipeline.

If no background caching job needs to be cancelled, NOOP. Otherwise, cancel such job.

apache_beam.runners.interactive.background_caching_job.is_source_to_cache_changed(user_pipeline)[source]

Determines if there is any change in the sources that need to be cached used by the user-defined pipeline.

Due to the expensiveness of computations and for the simplicity of usage, this function is not idempotent because Interactive Beam automatically discards previously tracked signature of transforms and tracks the current signature of transforms for the user-defined pipeline if there is any change.

When it’s True, there is addition/deletion/mutation of source transforms that requires a new background caching job.

apache_beam.runners.interactive.background_caching_job.extract_source_to_cache_signature(user_pipeline)[source]

Extracts a set of signature for sources that need to be cached in the user-defined pipeline.

A signature is a str representation of urn and payload of a source.