apache_beam.runners.interactive.sql.sql_chain module

Module for tracking a chain of beam_sql magics applied.

For internal use only; no backwards-compatibility guarantees.

class apache_beam.runners.interactive.sql.sql_chain.SqlNode(output_name: str, source: Union[apache_beam.pipeline.Pipeline, Set[str]], query: str, schemas: Set[Any] = None, evaluated: Set[apache_beam.pipeline.Pipeline] = None, next: Optional[SqlNode] = None, execution_count: int = 0)[source]

Bases: object

Each SqlNode represents a beam_sql magic applied.


the watched unique name of the beam_sql output. Can be used as an identifier.


the inputs consumed by this node. Can be a pipeline or a set of PCollections represented by their variable names watched. When it’s a pipeline, the node computes from raw values in the query, so the output can be consumed by any SqlNode in any SqlChain.


the SQL query applied by this node.


the schemas (NamedTuple classes) used by this node.


the pipelines this node has been evaluated for.


the next SqlNode applied chronologically.


the execution count if in an IPython env.

schemas = None
evaluated = None
next = None
execution_count = 0
to_pipeline(pipeline: Optional[apache_beam.pipeline.Pipeline]) → apache_beam.pipeline.Pipeline[source]

Converts the chain into an executable pipeline.

class apache_beam.runners.interactive.sql.sql_chain.SchemaLoadedSqlTransform(output_name, query, schemas, execution_count)[source]

Bases: apache_beam.transforms.ptransform.PTransform

PTransform that loads schema before executing SQL.

When submitting a pipeline to remote runner for execution, schemas defined in the main module are not available without save_main_session. However, save_main_session might fail when there is anything unpicklable. This DoFn makes sure only the schemas needed are pickled locally and restored later on workers.


Applies the SQL transform. If a PCollection uses a schema defined in the main session, use the additional DoFn to restore it on the worker.

class apache_beam.runners.interactive.sql.sql_chain.SqlChain(nodes: Dict[str, apache_beam.runners.interactive.sql.sql_chain.SqlNode] = None, root: Optional[apache_beam.runners.interactive.sql.sql_chain.SqlNode] = None, current: Optional[apache_beam.runners.interactive.sql.sql_chain.SqlNode] = None, user_pipeline: Optional[apache_beam.pipeline.Pipeline] = None)[source]

Bases: object

A chain of SqlNodes.


all nodes by their output_names.


the first SqlNode applied chronologically.


the last node applied.


the user defined pipeline this chain originates from. If None, the whole chain just computes from raw values in queries. Otherwise, at least some of the nodes in chain has queried against PCollections.

nodes = None
root = None
current = None
user_pipeline = None
to_pipeline() → apache_beam.pipeline.Pipeline[source]

Converts the chain into a beam pipeline.

append(node: apache_beam.runners.interactive.sql.sql_chain.SqlNode) → apache_beam.runners.interactive.sql.sql_chain.SqlChain[source]

Appends a node to the chain.

get(output_name: str) → Optional[apache_beam.runners.interactive.sql.sql_chain.SqlNode][source]

Gets a node from the chain based on the given output_name.