apache_beam.runners.interactive.recording_manager module

class apache_beam.runners.interactive.recording_manager.AsyncComputationResult(future: Future, pcolls: set[PCollection], user_pipeline: Pipeline, recording_manager: RecordingManager)[source]

Bases: object

Represents the result of an asynchronous computation.

update_display(msg: str, progress: float | None = None)[source]
set_pipeline_result(pipeline_result: PipelineResult)[source]
result(timeout=None)[source]
done()[source]
exception(timeout=None)[source]
cancel()[source]
class apache_beam.runners.interactive.recording_manager.ElementStream(pcoll: PCollection, var: str, cache_key: str, max_n: int, max_duration_secs: float)[source]

Bases: object

A stream of elements from a given PCollection.

property var: str

Returns the variable named that defined this PCollection.

property pcoll: PCollection

Returns the PCollection that supplies this stream with data.

property cache_key: str

Returns the cache key for this stream.

display_id(suffix: str) str[source]

Returns a unique id able to be displayed in a web browser.

is_computed() bool[source]

Returns True if no more elements will be recorded.

is_done() bool[source]

Returns True if no more new elements will be yielded.

read(tail: bool = True) Any[source]

Reads the elements currently recorded.

class apache_beam.runners.interactive.recording_manager.Recording(user_pipeline: Pipeline, pcolls: list[PCollection], result: beam.runner.PipelineResult, max_n: int, max_duration_secs: float)[source]

Bases: object

A group of PCollections from a given pipeline run.

is_computed() bool[source]

Returns True if all PCollections are computed.

stream(pcoll: PCollection) ElementStream[source]

Returns an ElementStream for a given PCollection.

computed() None[source]

Returns all computed ElementStreams.

uncomputed() None[source]

Returns all uncomputed ElementStreams.

cancel() None[source]

Cancels the recording.

wait_until_finish() None[source]

Waits until the pipeline is done and returns the final state.

This also marks any PCollections as computed right away if the pipeline is successful.

describe() dict[str, int][source]

Returns a dictionary describing the cache and recording.

class apache_beam.runners.interactive.recording_manager.RecordingManager(user_pipeline: Pipeline, pipeline_var: str | None = None, test_limiters: list[Limiter] | None = None)[source]

Bases: object

Manages recordings of PCollections for a given pipeline.

clear() None[source]

Clears all cached PCollections for this RecordingManager.

cancel() None[source]

Cancels the current background recording job.

describe() dict[str, int][source]

Returns a dictionary describing the cache and recording.

record_pipeline() bool[source]

Starts a background caching job for this RecordingManager’s pipeline.

compute_async(pcolls: set[PCollection], wait_for_inputs: bool = True, blocking: bool = False, runner: PipelineRunner | None = None, options: PipelineOptions | None = None, force_compute: bool = False) AsyncComputationResult | None[source]

Computes the given PCollections, potentially asynchronously.

record(pcolls: list[PCollection], *, max_n: int, max_duration: int | str, runner: PipelineRunner | None = None, options: PipelineOptions | None = None, force_compute: bool = False) Recording[source]

Records the given PCollections.

read(pcoll_name: str, pcoll: PValue, max_n: int, max_duration_secs: float) None | ElementStream[source]

Reads an ElementStream of a computed PCollection.

Returns None if an error occurs. The caller is responsible of validating if the given pcoll_name and pcoll can identify a watched and computed PCollection without ambiguity in the notebook.