apache_beam.runners.interactive.display.pcoll_visualization module

Module visualizes PCollection data.

For internal use only; no backwards-compatibility guarantees. Only works with Python 3.5+.

apache_beam.runners.interactive.display.pcoll_visualization.visualize(stream, dynamic_plotting_interval=None, include_window_info=False, display_facets=False)[source]

Visualizes the data of a given PCollection. Optionally enables dynamic plotting with interval in seconds if the PCollection is being produced by a running pipeline or the pipeline is streaming indefinitely. The function always returns immediately and is asynchronous when dynamic plotting is on.

If dynamic plotting enabled, the visualization is updated continuously until the pipeline producing the PCollection is in an end state. The visualization would be anchored to the notebook cell output area. The function asynchronously returns a handle to the visualization job immediately. The user could manually do:

# In one notebook cell, enable dynamic plotting every 1 second:
handle = visualize(pcoll, dynamic_plotting_interval=1)
# Visualization anchored to the cell's output area.
# In a different cell:
handle.stop()
# Will stop the dynamic plotting of the above visualization manually.
# Otherwise, dynamic plotting ends when pipeline is not running anymore.

If dynamic_plotting is not enabled (by default), None is returned.

If include_window_info is True, the data will include window information, which consists of the event timestamps, windows, and pane info.

If display_facets is True, the facets widgets will be rendered. Otherwise, the facets widgets will not be rendered.

The function is experimental. For internal use only; no backwards-compatibility guarantees.

class apache_beam.runners.interactive.display.pcoll_visualization.PCollectionVisualization(stream, include_window_info=False, display_facets=False)[source]

Bases: object

A visualization of a PCollection.

The class relies on creating a PipelineInstrument w/o actual instrument to access current interactive environment for materialized PCollection data at the moment of self instantiation through cache.

display_plain_text()[source]

Displays a head sample of the normalized PCollection data.

This function is used when the ipython kernel is not connected to a notebook frontend such as when running ipython in terminal or in unit tests. It’s a visualization in terminal-like UI, not a function to retrieve data for programmatically usages.

display(updating_pv=None)[source]

Displays the visualization through IPython.

Parameters:updating_pv – A PCollectionVisualization object. When provided, the display_id of each visualization part will inherit from the initial display of updating_pv and only update that visualization web element instead of creating new ones.

The visualization has 3 parts: facets-dive, facets-overview and paginated data table. Each part is assigned an auto-generated unique display id (the uniqueness is guaranteed throughout the lifespan of the PCollection variable).

apache_beam.runners.interactive.display.pcoll_visualization.format_window_info_in_dataframe(data)[source]
apache_beam.runners.interactive.display.pcoll_visualization.event_time_formatter(event_time_us)[source]
apache_beam.runners.interactive.display.pcoll_visualization.windows_formatter(windows)[source]
apache_beam.runners.interactive.display.pcoll_visualization.pane_info_formatter(pane_info)[source]