apache_beam.runners.interactive.background_caching_job module

Module to build and run background source recording jobs.

For internal use only; no backwards-compatibility guarantees.

A background source recording job is a job that records events for all recordable sources of a given pipeline. With Interactive Beam, one such job is started when a pipeline run happens (which produces a main job in contrast to the background source recording job) and meets the following conditions:

  1. The pipeline contains recordable sources, configured through interactive_beam.options.recordable_sources.
  2. No such background job is running.
  3. No such background job has completed successfully and the cached events are still valid (invalidated when recordable sources change in the pipeline).

Once started, the background source recording job runs asynchronously until it hits some recording limit configured in interactive_beam.options. Meanwhile, the main job and future main jobs from the pipeline will run using the deterministic replayable recorded events until they are invalidated.

class apache_beam.runners.interactive.background_caching_job.BackgroundCachingJob(pipeline_result, limiters)[source]

Bases: object

A simple abstraction that controls necessary components of a timed and space limited background source recording job.

A background source recording job successfully completes source data recording in 2 conditions:

  1. The job is finite and runs into DONE state;
  2. The job is infinite but hits an interactive_beam.options configured limit and gets cancelled into CANCELLED/CANCELLING state.

In both situations, the background source recording job should be treated as done successfully.

is_done()[source]
is_running()[source]
cancel()[source]

Cancels this background source recording job.

state
apache_beam.runners.interactive.background_caching_job.attempt_to_run_background_caching_job(runner, user_pipeline, options=None)[source]

Attempts to run a background source recording job for a user-defined pipeline.

Returns True if a job was started, False otherwise.

The pipeline result is automatically tracked by Interactive Beam in case future cancellation/cleanup is needed.

apache_beam.runners.interactive.background_caching_job.is_background_caching_job_needed(user_pipeline)[source]

Determines if a background source recording job needs to be started.

It does several state checks and recording state changes throughout the process. It is not idempotent to simplify the usage.

apache_beam.runners.interactive.background_caching_job.is_cache_complete(pipeline_id)[source]

Returns True if the backgrond cache for the given pipeline is done.

apache_beam.runners.interactive.background_caching_job.has_source_to_cache(user_pipeline)[source]

Determines if a user-defined pipeline contains any source that need to be cached. If so, also immediately wrap current cache manager held by current interactive environment into a streaming cache if this has not been done. The wrapping doesn’t invalidate existing cache in any way.

This can help determining if a background source recording job is needed to write cache for sources and if a test stream service is needed to serve the cache.

Throughout the check, if source-to-cache has changed from the last check, it also cleans up the invalidated cache early on.

apache_beam.runners.interactive.background_caching_job.attempt_to_cancel_background_caching_job(user_pipeline)[source]

Attempts to cancel background source recording job for a user-defined pipeline.

If no background source recording job needs to be cancelled, NOOP. Otherwise, cancel such job.

apache_beam.runners.interactive.background_caching_job.attempt_to_stop_test_stream_service(user_pipeline)[source]

Attempts to stop the gRPC server/service serving the test stream.

If there is no such server started, NOOP. Otherwise, stop it.

apache_beam.runners.interactive.background_caching_job.is_a_test_stream_service_running(user_pipeline)[source]

Checks to see if there is a gPRC server/service running that serves the test stream to any job started from the given user_pipeline.

apache_beam.runners.interactive.background_caching_job.is_source_to_cache_changed(user_pipeline, update_cached_source_signature=True)[source]

Determines if there is any change in the sources that need to be cached used by the user-defined pipeline.

Due to the expensiveness of computations and for the simplicity of usage, this function is not idempotent because Interactive Beam automatically discards previously tracked signature of transforms and tracks the current signature of transforms for the user-defined pipeline if there is any change.

When it’s True, there is addition/deletion/mutation of source transforms that requires a new background source recording job.

apache_beam.runners.interactive.background_caching_job.extract_source_to_cache_signature(user_pipeline)[source]

Extracts a set of signature for sources that need to be cached in the user-defined pipeline.

A signature is a str representation of urn and payload of a source.