apache_beam.runners.interactive.recording_manager module

class apache_beam.runners.interactive.recording_manager.ElementStream(pcoll: PCollection, var: str, cache_key: str, max_n: int, max_duration_secs: float)[source]

Bases: object

A stream of elements from a given PCollection.

property var: str

Returns the variable named that defined this PCollection.

property pcoll: PCollection

Returns the PCollection that supplies this stream with data.

property cache_key: str

Returns the cache key for this stream.

display_id(suffix: str) str[source]

Returns a unique id able to be displayed in a web browser.

is_computed() bool[source]

Returns True if no more elements will be recorded.

is_done() bool[source]

Returns True if no more new elements will be yielded.

read(tail: bool = True) Any[source]

Reads the elements currently recorded.

class apache_beam.runners.interactive.recording_manager.Recording(user_pipeline: Pipeline, pcolls: List[PCollection], result: beam.runner.PipelineResult, max_n: int, max_duration_secs: float)[source]

Bases: object

A group of PCollections from a given pipeline run.

is_computed() bool[source]

Returns True if all PCollections are computed.

stream(pcoll: PCollection) ElementStream[source]

Returns an ElementStream for a given PCollection.

computed() None[source]

Returns all computed ElementStreams.

uncomputed() None[source]

Returns all uncomputed ElementStreams.

cancel() None[source]

Cancels the recording.

wait_until_finish() None[source]

Waits until the pipeline is done and returns the final state.

This also marks any PCollections as computed right away if the pipeline is successful.

describe() Dict[str, int][source]

Returns a dictionary describing the cache and recording.

class apache_beam.runners.interactive.recording_manager.RecordingManager(user_pipeline: Pipeline, pipeline_var: str = None, test_limiters: List[Limiter] = None)[source]

Bases: object

Manages recordings of PCollections for a given pipeline.

clear() None[source]

Clears all cached PCollections for this RecordingManager.

cancel() None[source]

Cancels the current background recording job.

describe() Dict[str, int][source]

Returns a dictionary describing the cache and recording.

record_pipeline() bool[source]

Starts a background caching job for this RecordingManager’s pipeline.

record(pcolls: List[PCollection], *, max_n: int, max_duration: int | str, runner: PipelineRunner | None = None, options: PipelineOptions | None = None, force_compute: bool = False) Recording[source]

Records the given PCollections.

read(pcoll_name: str, pcoll: PValue, max_n: int, max_duration_secs: float) None | ElementStream[source]

Reads an ElementStream of a computed PCollection.

Returns None if an error occurs. The caller is responsible of validating if the given pcoll_name and pcoll can identify a watched and computed PCollection without ambiguity in the notebook.