apache_beam.runners.interactive.recording_manager module

class apache_beam.runners.interactive.recording_manager.ElementStream(pcoll, var, cache_key, max_n, max_duration_secs)[source]

Bases: object

A stream of elements from a given PCollection.

var

Returns the name of the variable that defined this PCollection.

pcoll

Returns the PCollection that supplies this stream with data.

cache_key

Returns the cache key for this stream.

display_id(suffix)[source]

Returns a unique ID suitable for display in a web browser.

is_computed()[source]

Returns True if no more elements will be recorded.

is_done()[source]

Returns True if no more new elements will be yielded.

read(tail=True)[source]

Reads the elements currently recorded.
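The difference between is_computed() and is_done() is easiest to see when a limit is in play. The following is a toy stand-in written for illustration only. It is not Beam's ElementStream, and the names ToyStream, record, and finish_recording are hypothetical. It assumes that max_n caps how many elements a reader receives, so a stream can be done (the reader hit its limit) while recording is still in progress.

```python
class ToyStream:
    """Toy model of a limited element stream (NOT Beam's ElementStream)."""

    def __init__(self, max_n):
        self._recorded = []     # elements recorded so far
        self._computed = False  # True once recording has finished
        self._yielded = 0       # how many elements read() has handed out
        self._max_n = max_n     # assumed cap on elements handed to a reader

    def record(self, element):
        # Hypothetical hook standing in for the background recording job.
        if self._computed:
            raise RuntimeError("recording has already finished")
        self._recorded.append(element)

    def finish_recording(self):
        # Hypothetical hook: no more elements will be recorded.
        self._computed = True

    def is_computed(self):
        return self._computed

    def is_done(self):
        # Done if the limit was reached, or if recording finished and
        # every recorded element has already been yielded.
        if self._yielded >= self._max_n:
            return True
        return self._computed and self._yielded >= len(self._recorded)

    def read(self):
        # Yield what is currently recorded, up to the limit.
        while (self._yielded < self._max_n
               and self._yielded < len(self._recorded)):
            element = self._recorded[self._yielded]
            self._yielded += 1
            yield element


limited = ToyStream(max_n=2)
for value in (1, 2, 3):
    limited.record(value)
first_read = list(limited.read())
# The limit was reached, so the stream is done even though
# recording has not been marked as finished.
state = (limited.is_done(), limited.is_computed())
```

In this model, a stream with a generous limit stays not-done until finish_recording() is called, which mirrors the "no more new elements will be yielded" wording above.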

class apache_beam.runners.interactive.recording_manager.Recording(user_pipeline, pcolls, result, max_n, max_duration_secs)[source]

Bases: object

A group of PCollections from a given pipeline run.

is_computed()[source]

Returns True if all PCollections are computed.

stream(pcoll)[source]

Returns an ElementStream for a given PCollection.

computed()[source]

Returns all computed ElementStreams.

uncomputed()[source]

Returns all uncomputed ElementStreams.

cancel()[source]

Cancels the recording.

wait_until_finish()[source]

Waits until the pipeline is done and returns the final state.

If the pipeline succeeds, this also immediately marks all of the recording's PCollections as computed.

describe()[source]

Returns a dictionary describing the cache and recording.

class apache_beam.runners.interactive.recording_manager.RecordingManager(user_pipeline, pipeline_var=None, test_limiters=None)[source]

Bases: object

Manages recordings of PCollections for a given pipeline.

clear()[source]

Clears all cached PCollections for this RecordingManager.

cancel()[source]

Cancels the current background recording job.

describe()[source]

Returns a dictionary describing the cache and recording.

record_pipeline()[source]

Starts a background caching job for this RecordingManager’s pipeline.

record(pcolls, max_n, max_duration)[source]

Records the given PCollections.

read(pcoll_name, pcoll, max_n, max_duration_secs)[source]

Reads an ElementStream of a computed PCollection.

Returns None if an error occurs. The caller is responsible for validating that the given pcoll_name and pcoll unambiguously identify a watched and computed PCollection in the notebook.