apache_beam.runners.interactive.interactive_environment module

Module of the current Interactive Beam environment.

For internal use only; no backwards-compatibility guarantees. Provides interfaces to interact with the existing Interactive Beam environment. External Interactive Beam users should use the interactive_beam module in application code or notebooks.

apache_beam.runners.interactive.interactive_environment.current_env(cache_manager=None)[source]

Gets the current Interactive Beam environment.

apache_beam.runners.interactive.interactive_environment.new_env(cache_manager=None)[source]

Creates a new Interactive Beam environment to replace the current one.
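One way to picture the relationship between current_env and new_env is a lazily created, replaceable module-level singleton. The following is a minimal sketch under that assumption, not Beam's implementation; ToyEnvironment, toy_current_env, and toy_new_env are hypothetical stand-ins.

```python
# Toy model of a lazily created, replaceable module-level singleton.
# All names here are hypothetical stand-ins, not Beam APIs.


class ToyEnvironment:
    def __init__(self, cache_manager=None):
        self.cache_manager = cache_manager


_toy_env = None  # the single shared environment, created on first use


def toy_current_env(cache_manager=None):
    """Returns the shared environment, creating it on first access."""
    global _toy_env
    if _toy_env is None:
        _toy_env = ToyEnvironment(cache_manager)
    return _toy_env


def toy_new_env(cache_manager=None):
    """Discards the shared environment and returns a fresh one."""
    global _toy_env
    _toy_env = None
    return toy_current_env(cache_manager)


first = toy_current_env()
same = toy_current_env()      # repeated calls share one instance
second = toy_new_env()        # replaces the shared instance
```

In this sketch, a cache_manager passed to toy_current_env only takes effect when the environment is first created; later calls return the existing instance unchanged.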

class apache_beam.runners.interactive.interactive_environment.InteractiveEnvironment(cache_manager=None)[source]

Bases: object

An interactive environment with cache and pipeline variable metadata.

Interactive Beam uses the watched variable information to determine whether a PCollection is assigned to a variable in the user's pipeline definition. When the pipeline is executed, interactivity is applied through an implicit caching mechanism for those PCollections if the pipeline is interactive. Users can also visualize and introspect those PCollections in their own code, since they hold handles to the variables.

options

A reference to the global interactive options.

Provided to avoid import loops or excessive dynamic imports. All internal Interactive Beam modules should access interactive_beam.options through this property.

is_py_version_ready

If the Python version meets the minimum requirement.

is_interactive_ready

If the [interactive] dependencies are installed.

is_in_ipython

If the runtime is within an IPython kernel.

is_in_notebook

If the kernel is connected to a notebook frontend.

If not, the user may be running the kernel in a terminal or a unit test.

cleanup()[source]

watch(watchable)[source]

Watches a watchable.

A watchable can be a dictionary of variable metadata such as locals(), the str name of a module, a module object, or an instance of a class. The variables can come from any scope, even a local one. Duplicate variable names don’t matter since they refer to different instances. Duplicate variables are also allowed when watching.

watching()[source]

Analyzes and returns a list of pair lists referring to variable names and values from watched scopes.

Each entry in the list represents the variables defined within a watched watchable. Currently, each entry holds a list of pairs. The format might change in the future to hold more metadata. Duplicate pairs are allowed, and multiple pairs can have the same variable name as the “first” element while having different variable values as the “second” element, since variables in different scopes can have the same name.
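The shape of the data returned by watching() can be illustrated with a small stand-in. This is a sketch of the behavior described above, not Beam's code; ToyWatcher and Holder are hypothetical names, and the sketch assumes every non-dict watchable exposes its variables through vars().

```python
import importlib


class ToyWatcher:
    """Hypothetical stand-in that records watchables and lists their variables."""

    def __init__(self):
        self._watchables = []

    def watch(self, watchable):
        # Duplicates are allowed, so watchables are simply appended.
        self._watchables.append(watchable)

    def watching(self):
        # One entry per watchable; each entry is a list of (name, value) pairs.
        entries = []
        for watchable in self._watchables:
            if isinstance(watchable, dict):
                variables = watchable  # for example, locals()
            elif isinstance(watchable, str):
                variables = vars(importlib.import_module(watchable))
            else:
                variables = vars(watchable)  # module object or class instance
            entries.append(list(variables.items()))
        return entries


class Holder:
    """Any plain object whose attributes live in __dict__."""


holder = Holder()
holder.x = 3

watcher = ToyWatcher()
watcher.watch({"x": 1})  # variables of one scope
watcher.watch({"x": 2})  # another scope reusing the name x
watcher.watch(holder)    # an instance of a class
watcher.watch("math")    # a module given by its str name
entries = watcher.watching()
```

The name x appears as the first element of three different pairs, each with a different value as the second element, which is the situation the description above allows for.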

set_cache_manager(cache_manager)[source]

Sets the cache manager held by the current Interactive Environment.

cache_manager()[source]

Gets the cache manager held by the current Interactive Environment.

set_pipeline_result(pipeline, result)[source]

Sets the pipeline run result. Adds one if absent; otherwise, replaces the existing one.

evict_pipeline_result(pipeline)[source]

Evicts the tracking of the given pipeline run. No-op if absent.

pipeline_result(pipeline)[source]

Gets the pipeline run result. None if absent.
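The set, evict, and get semantics of the three methods above can be sketched with a plain dictionary. This is an illustration only; ToyResultRegistry is a hypothetical name, and keying by id(pipeline) is an assumption made here so that any object can stand in for a pipeline.

```python
class ToyResultRegistry:
    """Hypothetical stand-in for per-pipeline result tracking."""

    def __init__(self):
        self._results = {}

    def set_pipeline_result(self, pipeline, result):
        # Adds the result if absent; otherwise replaces the existing one.
        self._results[id(pipeline)] = result

    def evict_pipeline_result(self, pipeline):
        # Removing an untracked pipeline is a no-op.
        return self._results.pop(id(pipeline), None)

    def pipeline_result(self, pipeline):
        # None if no result is tracked for the pipeline.
        return self._results.get(id(pipeline))


registry = ToyResultRegistry()
pipeline = object()  # stand-in for a user pipeline

registry.set_pipeline_result(pipeline, "first run")
registry.set_pipeline_result(pipeline, "second run")  # replaces the first
latest = registry.pipeline_result(pipeline)

registry.evict_pipeline_result(pipeline)
registry.evict_pipeline_result(pipeline)  # already absent: no-op
after_eviction = registry.pipeline_result(pipeline)
```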

set_background_caching_job(pipeline, background_caching_job)[source]

Sets the background caching job started from the given pipeline.

get_background_caching_job(pipeline)[source]

Gets the background caching job started from the given pipeline.

set_test_stream_service_controller(pipeline, controller)[source]

Sets the test stream service controller that has started a gRPC server serving the test stream for any job started from the given user-defined pipeline.

get_test_stream_service_controller(pipeline)[source]

Gets the test stream service controller that has started a gRPC server serving the test stream for any job started from the given user-defined pipeline.

evict_test_stream_service_controller(pipeline)[source]

Evicts and pops the test stream service controller that has started a gRPC server serving the test stream for any job started from the given user-defined pipeline.

is_terminated(pipeline)[source]

Queries whether the most recent job started by executing the given pipeline is in a terminal state. Returns True if no such job is tracked.
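The "True if absent" rule means that a pipeline that has never been run is treated the same as one whose job has finished. A minimal sketch of that rule follows; toy_is_terminated is a hypothetical function, and the state names in ASSUMED_TERMINAL_STATES are an illustrative assumption rather than Beam's definition of terminal states.

```python
# Illustrative assumption: which state names count as terminal.
ASSUMED_TERMINAL_STATES = frozenset({"DONE", "FAILED", "CANCELLED"})


def toy_is_terminated(latest_states, pipeline):
    """Returns True if the pipeline's latest job is terminal or untracked."""
    state = latest_states.get(id(pipeline))
    if state is None:
        return True  # nothing tracked for this pipeline
    return state in ASSUMED_TERMINAL_STATES


running, finished, never_run = object(), object(), object()
latest_states = {id(running): "RUNNING", id(finished): "DONE"}

running_done = toy_is_terminated(latest_states, running)
finished_done = toy_is_terminated(latest_states, finished)
never_run_done = toy_is_terminated(latest_states, never_run)
```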

set_cached_source_signature(pipeline, signature)[source]

get_cached_source_signature(pipeline)[source]

evict_cached_source_signature(pipeline=None)[source]

track_user_pipelines()[source]

Records references to all user-defined pipeline instances watched in the current environment.

The current static global singleton interactive environment holds references to the set of pipeline instances defined by the user in the watched scopes. Interactive Beam features can use these references to determine whether a given pipeline is defined by the user or implicitly created by the Beam SDK or runners, and then handle them differently.

This is invoked every time a PTransform is about to be applied if the current code execution is under IPython, because any user-defined pipeline can be re-evaluated through notebook cell re-execution at any time.

Each time this is invoked, the tracked user pipelines are refreshed to remove any pipeline instances that are no longer in a watched scope. For example, after a notebook cell re-execution re-evaluates a pipeline creation, the pipeline reference created by the previous evaluation is no longer in a watched scope.
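The refresh described above can be sketched as rebuilding the tracked set from whatever the watched scopes currently contain. This is an illustration under simplifying assumptions, not Beam's implementation: ToyPipeline and refresh_tracked_pipelines are hypothetical names, and watched scopes are modeled as plain dictionaries.

```python
class ToyPipeline:
    """Hypothetical stand-in for a user-defined pipeline."""


def refresh_tracked_pipelines(watched_scopes):
    """Rebuilds the tracked set from the current content of watched scopes."""
    tracked = set()
    for scope in watched_scopes:
        for value in scope.values():
            if isinstance(value, ToyPipeline):
                tracked.add(value)
    return tracked


scope = {"p": ToyPipeline()}  # first evaluation of the notebook cell
old_pipeline = scope["p"]
tracked_before = refresh_tracked_pipelines([scope])

scope["p"] = ToyPipeline()    # cell re-execution rebinds the name p
tracked_after = refresh_tracked_pipelines([scope])
```

After the second refresh, only the newly created instance is tracked; the instance from the earlier evaluation has dropped out because no watched variable refers to it anymore.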

tracked_user_pipelines

pipeline_id_to_pipeline(pid)[source]

Converts a pipeline id to a user pipeline.

mark_pcollection_computed(pcolls)[source]

Marks computation completeness for the given pcolls.

Interactive Beam can use this information to determine if a computation is needed to introspect the data of any given PCollection.

evict_computed_pcollections()[source]

Evicts all computed PCollections.

After eviction, Interactive Beam treats none of the PCollections in any given pipeline as completely computed.
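The interplay between marking and evicting can be modeled with a set. The sketch below is illustrative only; ToyComputedTracker and its needs_computation helper are hypothetical and not part of this module.

```python
class ToyComputedTracker:
    """Hypothetical stand-in for tracking which PCollections are computed."""

    def __init__(self):
        self._computed = set()

    def mark_pcollection_computed(self, pcolls):
        # Accepts any iterable of PCollection stand-ins.
        self._computed.update(pcolls)

    def evict_computed_pcollections(self):
        # Forget everything: nothing counts as computed afterwards.
        self._computed = set()

    def needs_computation(self, pcoll):
        # Hypothetical helper: uncomputed data must be produced first.
        return pcoll not in self._computed


tracker = ToyComputedTracker()
pcoll_a, pcoll_b = object(), object()  # stand-ins for PCollections

tracker.mark_pcollection_computed([pcoll_a])
a_needs_run = tracker.needs_computation(pcoll_a)
b_needs_run = tracker.needs_computation(pcoll_b)

tracker.evict_computed_pcollections()
a_needs_run_after_evict = tracker.needs_computation(pcoll_a)
```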

computed_pcollections

load_jquery_with_datatable()[source]

Loads common resources to enable jquery with datatable configured for notebook frontends if necessary. No-op if the resources have already been loaded.

A window.interactive_beam_jquery with the datatable plugin configured can be used in subsequent notebook cells once this is invoked.

  1. There should only be one jQuery imported.
  2. Datatable needs to be imported after jQuery is loaded.
  3. The imported jQuery is attached to window as jquery[version].
  4. The window attachment needs to happen at the end of the import chain, after all jQuery plugins are set.

import_html_to_head(html_hrefs)[source]

Imports the given external HTMLs (supported through webcomponents) into the head of the document.

On load of webcomponentsjs, imports the given HTMLs. If HTML import is already supported, loading webcomponentsjs is skipped.

No matter how many times an HTML import occurs in the document, only the first occurrence actually embeds the external HTML. In a notebook environment, the body of the document is always changing due to cell [re-]execution, deletion, and re-ordering. Thus, HTML imports shouldn’t be put in the body, especially not in the output areas of notebook cells.
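Because only the first occurrence of an import has any effect, a caller can emit each href once and target the head. The sketch below illustrates that idea and is not what import_html_to_head does internally; build_head_import_tags is a hypothetical helper and the example.com URLs are placeholders.

```python
from html import escape


def build_head_import_tags(html_hrefs, already_imported=()):
    """Builds one HTML import link tag per href not seen before."""
    seen = set(already_imported)
    tags = []
    for href in html_hrefs:
        if href in seen:
            continue  # a repeated import would not embed anything again
        seen.add(href)
        tags.append(
            '<link rel="import" href="{}">'.format(escape(href, quote=True)))
    return tags


tags = build_head_import_tags([
    "https://example.com/a.html",
    "https://example.com/b.html",
    "https://example.com/a.html",  # duplicate, skipped
])
```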