Module visualizes PCollection data.
For internal use only; no backwards-compatibility guarantees. Only works with Python 3.5+.
visualize(stream, dynamic_plotting_interval=None, include_window_info=False, display_facets=False, element_type=None)¶
Visualizes the data of a given PCollection. Optionally enables dynamic plotting with interval in seconds if the PCollection is being produced by a running pipeline or the pipeline is streaming indefinitely. The function always returns immediately and is asynchronous when dynamic plotting is on.
If dynamic plotting enabled, the visualization is updated continuously until the pipeline producing the PCollection is in an end state. The visualization would be anchored to the notebook cell output area. The function asynchronously returns a handle to the visualization job immediately. The user could manually do:
# In one notebook cell, enable dynamic plotting every 1 second: handle = visualize(pcoll, dynamic_plotting_interval=1) # Visualization anchored to the cell's output area. # In a different cell: handle.stop() # Will stop the dynamic plotting of the above visualization manually. # Otherwise, dynamic plotting ends when pipeline is not running anymore.
If dynamic_plotting is not enabled (by default), None is returned.
If include_window_info is True, the data will include window information, which consists of the event timestamps, windows, and pane info.
If display_facets is True, the facets widgets will be rendered. Otherwise, the facets widgets will not be rendered.
The function is experimental. For internal use only; no backwards-compatibility guarantees.
visualize_computed_pcoll(pcoll_name: str, pcoll: apache_beam.pvalue.PCollection, max_n: int, max_duration_secs: float, dynamic_plotting_interval: Optional[int] = None, include_window_info: bool = False, display_facets: bool = False) → None¶
A simple visualize alternative.
When the pcoll_name and pcoll pair identifies a watched and computed PCollection in the current interactive environment without ambiguity, an ElementStream can be built directly from cache. Returns immediately, the visualization is asynchronous, but guaranteed to end in the near future.
- pcoll_name – the variable name of the PCollection.
- pcoll – the PCollection to be visualized.
- max_n – the maximum number of elements to visualize.
- max_duration_secs – max duration of elements to read in seconds.
- dynamic_plotting_interval – the interval in seconds between visualization updates if provided; otherwise, no dynamic plotting.
- include_window_info – whether to include windowing info in the elements.
- display_facets – whether to display the facets widgets.
PCollectionVisualization(stream, include_window_info=False, display_facets=False, element_type=None)¶
A visualization of a PCollection.
The class relies on creating a PipelineInstrument w/o actual instrument to access current interactive environment for materialized PCollection data at the moment of self instantiation through cache.
Displays a head sample of the normalized PCollection data.
This function is used when the ipython kernel is not connected to a notebook frontend such as when running ipython in terminal or in unit tests. It’s a visualization in terminal-like UI, not a function to retrieve data for programmatically usages.
Displays the visualization through IPython.
Parameters: updating_pv – A PCollectionVisualization object. When provided, the display_id of each visualization part will inherit from the initial display of updating_pv and only update that visualization web element instead of creating new ones.
The visualization has 3 parts: facets-dive, facets-overview and paginated data table. Each part is assigned an auto-generated unique display id (the uniqueness is guaranteed throughout the lifespan of the PCollection variable).