apache_beam.runners.interactive.interactive_pipeline_graph module

Helper to render pipeline graph in IPython when running interactively.

This module is experimental. No backwards-compatibility guarantees.

apache_beam.runners.interactive.interactive_pipeline_graph.nice_str(o)
apache_beam.runners.interactive.interactive_pipeline_graph.format_sample(contents, count=1000)
class apache_beam.runners.interactive.interactive_pipeline_graph.InteractivePipelineGraph(pipeline, required_transforms=None, referenced_pcollections=None, cached_pcollections=None)

Bases: apache_beam.runners.interactive.pipeline_graph.PipelineGraph

Creates the DOT representation of an interactive pipeline. Thread-safe.

Constructor of PipelineGraph.

Parameters:
  • pipeline – (Pipeline proto) or (Pipeline) pipeline to be rendered.
  • required_transforms – (dict from str to PTransform proto) Maps transform IDs to the transforms that lead to visible results.
  • referenced_pcollections – (dict from str to PCollection proto) Maps PCollection IDs to the PCollections referenced during pipeline execution.
  • cached_pcollections – (set of str) IDs of the PCollections whose cached results are used in the execution.
update_pcollection_stats(pcollection_stats)

Updates PCollection stats.

Parameters: pcollection_stats – (dict of dict) Maps PCollection IDs to information about each PCollection. Only the field ‘sample’ is used; it should hold the PCollection result as a list.
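
The shape of pcollection_stats described above can be sketched in plain Python. This is a minimal illustration, not part of the module: the helper build_pcollection_stats and the PCollection IDs 'pcoll_1' and 'pcoll_2' are hypothetical, and the final call is left commented out because it assumes an already constructed InteractivePipelineGraph named graph.

```python
def build_pcollection_stats(results):
    """Hypothetical helper: wrap each PCollection's elements in the
    {'sample': [...]} shape that update_pcollection_stats reads."""
    return {
        pcoll_id: {'sample': list(elements)}
        for pcoll_id, elements in results.items()
    }


# Made-up PCollection IDs mapped to any iterable of result elements.
results = {
    'pcoll_1': (1, 2, 3),
    'pcoll_2': iter(['a', 'b']),
}

stats = build_pcollection_stats(results)

# Assumed usage with an existing InteractivePipelineGraph instance:
# graph.update_pcollection_stats(stats)
```

Converting each result with list() keeps the 'sample' field a list, as the parameter description requires, even when the source is a tuple or an iterator.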