apache_beam.runners.interactive.recording_manager module

class apache_beam.runners.interactive.recording_manager.ElementStream(pcoll: apache_beam.pvalue.PCollection, var: str, cache_key: str, max_n: int, max_duration_secs: float)

Bases: object

A stream of elements from a given PCollection.

var

Returns the name of the variable that defined this PCollection.

pcoll

Returns the PCollection that supplies this stream with data.

cache_key

Returns the cache key for this stream.

display_id(suffix: str) → str

Returns a unique ID, derived from this stream’s cache key and the given suffix, that is suitable for display in a web browser.
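One way to produce such an ID is to hash the cache key together with the suffix. The result is stable across calls, differs per suffix, and contains only hexadecimal characters, which are safe in HTML. The sketch below assumes an MD5-based hash; `make_display_id` is a hypothetical helper, not part of this module.

```python
import hashlib


def make_display_id(cache_key: str, suffix: str) -> str:
    """Hypothetical helper: hashes a cache key and a suffix into a hex string."""
    digest = hashlib.md5()
    for part in (cache_key, suffix):
        digest.update(str(part).encode("utf-8"))
    # 32 lowercase hexadecimal characters.
    return digest.hexdigest()


# The same inputs always give the same ID; a different suffix gives a new one.
table_id = make_display_id("pcoll_cache_key", "table")
chart_id = make_display_id("pcoll_cache_key", "chart")
```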

is_computed() → bool

Returns True if no more elements will be recorded.

is_done() → bool

Returns True if no more new elements will be yielded.

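read(tail: bool = True) → Any

Reads the elements currently recorded. This is a generator: it stops after yielding max_n elements or once the max_duration_secs limit is reached.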
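The following is a simplified, self-contained model of a count- and time-limited read. `LimitedStream` and its in-memory `source` list are hypothetical stand-ins for the cache-backed reader, and elapsed time is measured with `time.monotonic()` rather than with Beam’s own limiter classes. Once a limit is hit, `is_done()` becomes True. If the source runs out before any limit is reached, the stream is not marked done, because a later read could still yield new elements.

```python
import time
from typing import Any, Iterator, List


class LimitedStream:
    """Hypothetical model of a stream limited by element count and duration."""

    def __init__(self, source: List[Any], max_n: int, max_duration_secs: float):
        self._source = source
        self._max_n = max_n
        self._max_duration_secs = max_duration_secs
        self._done_reading = False

    def is_done(self) -> bool:
        # True once a limit was reached, so no new elements will be yielded.
        return self._done_reading

    def read(self) -> Iterator[Any]:
        start = time.monotonic()
        count = 0
        limit_hit = False
        for element in self._source:
            count += 1
            yield element
            # Stop as soon as either the count or the time limit is reached.
            elapsed = time.monotonic() - start
            if count >= self._max_n or elapsed >= self._max_duration_secs:
                limit_hit = True
                break
        if limit_hit:
            self._done_reading = True


stream = LimitedStream(list(range(10)), max_n=3, max_duration_secs=60.0)
first = list(stream.read())  # [0, 1, 2]
```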

class apache_beam.runners.interactive.recording_manager.Recording(user_pipeline: apache_beam.pipeline.Pipeline, pcolls: List[apache_beam.pvalue.PCollection], result: apache_beam.runners.runner.PipelineResult, max_n: int, max_duration_secs: float)

Bases: object

A group of PCollections from a given pipeline run.

is_computed() → bool

Returns True if all PCollections are computed.

stream(pcoll: apache_beam.pvalue.PCollection) → apache_beam.runners.interactive.recording_manager.ElementStream

Returns an ElementStream for a given PCollection.

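computed() → Dict[apache_beam.pvalue.PCollection, apache_beam.runners.interactive.recording_manager.ElementStream]

Returns a dictionary that maps each computed PCollection to its ElementStream.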

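uncomputed() → Dict[apache_beam.pvalue.PCollection, apache_beam.runners.interactive.recording_manager.ElementStream]

Returns a dictionary that maps each uncomputed PCollection to its ElementStream.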
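computed() and uncomputed() split the recording’s streams by ElementStream.is_computed(). A minimal sketch of that split follows. It assumes the streams are kept in a dict keyed by PCollection. `StubStream` and the string keys are hypothetical stand-ins for real ElementStream and PCollection objects.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StubStream:
    """Hypothetical stand-in for an ElementStream."""
    name: str
    computed: bool

    def is_computed(self) -> bool:
        return self.computed


def computed(streams: Dict[str, StubStream]) -> Dict[str, StubStream]:
    # Keep only the streams whose PCollection has been fully recorded.
    return {p: s for p, s in streams.items() if s.is_computed()}


def uncomputed(streams: Dict[str, StubStream]) -> Dict[str, StubStream]:
    # Keep only the streams that may still receive new elements.
    return {p: s for p, s in streams.items() if not s.is_computed()}


streams = {
    "squares": StubStream("squares", computed=True),
    "evens": StubStream("evens", computed=False),
}
```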

cancel() → None

Cancels the recording.

wait_until_finish() → str

Waits until the pipeline is done and returns its final state as one of the apache_beam.runners.runner.PipelineState values.

If the pipeline succeeds, this also immediately marks all recorded PCollections as computed.
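The sketch below models the wait-then-mark behavior in isolation. It waits for a fake pipeline result and returns the final state string. Every recorded PCollection is marked as computed only when that state is DONE. `FakeResult`, `RecordingModel`, and the literal state strings are hypothetical stand-ins for PipelineResult and the PipelineState constants. This is not the class’s actual implementation.

```python
from typing import Iterable, Optional, Set


class FakeResult:
    """Hypothetical stand-in for a PipelineResult with a fixed final state."""

    def __init__(self, final_state: str):
        self._final_state = final_state

    def wait_until_finish(self) -> str:
        return self._final_state


class RecordingModel:
    """Hypothetical model of the wait-then-mark behavior described above."""

    def __init__(self, result: Optional[FakeResult], pcoll_names: Iterable[str]):
        self._result = result
        self._pcoll_names = list(pcoll_names)
        self.computed: Set[str] = set()

    def wait_until_finish(self) -> str:
        if self._result is None:
            return "UNKNOWN"  # No pipeline result, so the state is unknown.
        state = self._result.wait_until_finish()
        if state == "DONE":
            # A successful run means every recorded PCollection is complete.
            self.computed.update(self._pcoll_names)
        return state
```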

describe() → Dict[str, int]

Returns a dictionary describing the cache and recording.

class apache_beam.runners.interactive.recording_manager.RecordingManager(user_pipeline: apache_beam.pipeline.Pipeline, pipeline_var: str = None, test_limiters: List[Limiter] = None)

Bases: object

Manages recordings of PCollections for a given pipeline.

clear() → None

Clears all cached PCollections for this RecordingManager.

cancel() → None

Cancels the current background recording job.

describe() → Dict[str, int]

Returns a dictionary describing the cache and recording.

record_pipeline() → bool

Starts a background caching job for this RecordingManager’s pipeline. Returns True if the job was started and False otherwise.

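record(pcolls: List[apache_beam.pvalue.PCollection], max_n: int, max_duration: Union[int, str]) → apache_beam.runners.interactive.recording_manager.Recording

Records the given PCollections. All of them must belong to this RecordingManager’s pipeline. max_n limits the number of elements recorded for each PCollection. max_duration limits the recording time and can be given either as a number of seconds or as a duration string such as '30s' ('inf' means no time limit).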
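Because max_duration accepts either a number of seconds or a string, it has to be normalized to seconds before recording starts. The sketch below shows one way to do that. `to_seconds` is a hypothetical helper that understands only the `s`, `m`, and `h` suffixes and maps 'inf' to no limit. A fuller implementation might delegate string parsing to a general-purpose parser such as pandas.to_timedelta.

```python
import math
from typing import Union

# Hypothetical mapping from a unit suffix to its length in seconds.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(max_duration: Union[int, float, str]) -> float:
    """Normalizes a duration given in seconds or as e.g. '30s', '5m', '1h'."""
    if isinstance(max_duration, (int, float)):
        return float(max_duration)
    text = max_duration.strip().lower()
    if text == "inf":
        return math.inf  # No time limit.
    if not text or text[-1] not in _UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {max_duration!r}")
    # float() raises ValueError if the numeric part is missing or malformed.
    return float(text[:-1]) * _UNIT_SECONDS[text[-1]]
```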

read(pcoll_name: str, pcoll: apache_beam.pvalue.PValue, max_n: int, max_duration_secs: float) → Union[None, apache_beam.runners.interactive.recording_manager.ElementStream]

Reads an ElementStream of a computed PCollection.

Returns None if an error occurs. The caller is responsible for validating that the given pcoll_name and pcoll unambiguously identify a watched, computed PCollection in the notebook.
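Because this method leaves validation to the caller, a caller might first check three things: the name is bound to the given PCollection, the name is not bound to any other PCollection, and the PCollection has been computed. The sketch below assumes the caller has a list of (variable name, PCollection) pairs that are watched and a collection of computed PCollections. `validate_for_read` and its parameters are hypothetical, not part of this module.

```python
from typing import Any, Iterable, Optional, Tuple


def validate_for_read(
    pcoll_name: str,
    pcoll: Any,
    watched: Iterable[Tuple[str, Any]],
    computed: Iterable[Any],
) -> Optional[str]:
    """Returns an error message, or None if pcoll_name/pcoll are safe to read."""
    watched = list(watched)
    # Names of all watched variables bound to exactly this object.
    matches = [name for name, value in watched if value is pcoll]
    if pcoll_name not in matches:
        return f"{pcoll_name!r} does not name the given PCollection"
    # The same name bound to several different PCollections is ambiguous.
    same_name = [value for name, value in watched if name == pcoll_name]
    if any(value is not pcoll for value in same_name):
        return f"{pcoll_name!r} is ambiguous"
    if not any(value is pcoll for value in computed):
        return f"{pcoll_name!r} has not been computed yet"
    return None
```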