apache_beam.runners.interactive.sql.sql_chain module¶
Module for tracking a chain of beam_sql magics applied.
For internal use only; no backwards-compatibility guarantees.
-
class
apache_beam.runners.interactive.sql.sql_chain.
SqlNode
(output_name: str, source: Union[apache_beam.pipeline.Pipeline, Set[str]], query: str, schemas: Set[Any] = None, evaluated: Set[apache_beam.pipeline.Pipeline] = None, next: Optional[SqlNode] = None, execution_count: int = 0)[source]¶ Bases:
object
Each SqlNode represents a beam_sql magic applied.
-
output_name
¶ the watched unique name of the beam_sql output. Can be used as an identifier.
-
source
¶ the inputs consumed by this node. Can be a pipeline or a set of PCollections represented by their variable names watched. When it’s a pipeline, the node computes from raw values in the query, so the output can be consumed by any SqlNode in any SqlChain.
-
query
¶ the SQL query applied by this node.
-
schemas
¶ the schemas (NamedTuple classes) used by this node.
-
evaluated
¶ the pipelines this node has been evaluated for.
-
next
¶ the next SqlNode applied chronologically.
-
execution_count
¶ the execution count if in an IPython env.
-
schemas
= None
-
evaluated
= None
-
next
= None
-
execution_count
= 0
-
-
class
apache_beam.runners.interactive.sql.sql_chain.
SchemaLoadedSqlTransform
(output_name, query, schemas, execution_count)[source]¶ Bases:
apache_beam.transforms.ptransform.PTransform
PTransform that loads schema before executing SQL.
When submitting a pipeline to remote runner for execution, schemas defined in the main module are not available without save_main_session. However, save_main_session might fail when there is anything unpicklable. This DoFn makes sure only the schemas needed are pickled locally and restored later on workers.
-
class
apache_beam.runners.interactive.sql.sql_chain.
SqlChain
(nodes: Dict[str, apache_beam.runners.interactive.sql.sql_chain.SqlNode] = None, root: Optional[apache_beam.runners.interactive.sql.sql_chain.SqlNode] = None, current: Optional[apache_beam.runners.interactive.sql.sql_chain.SqlNode] = None, user_pipeline: Optional[apache_beam.pipeline.Pipeline] = None)[source]¶ Bases:
object
A chain of SqlNodes.
-
nodes
¶ all nodes by their output_names.
-
root
¶ the first SqlNode applied chronologically.
-
current
¶ the last node applied.
-
user_pipeline
¶ the user defined pipeline this chain originates from. If None, the whole chain just computes from raw values in queries. Otherwise, at least some of the nodes in chain has queried against PCollections.
-
nodes
= None
-
root
= None
-
current
= None
-
user_pipeline
= None
-