apache_beam.runners.interactive.sql.utils module

Module of utilities for SQL magics.

For internal use only; no backward-compatibility guarantees.

apache_beam.runners.interactive.sql.utils.register_coder_for_schema(schema: NamedTuple, verbose: bool = False) → None[source]

Registers a RowCoder for the given schema if hasn’t.

Notifies the user of what code has been implicitly executed.

apache_beam.runners.interactive.sql.utils.find_pcolls(sql: str, pcolls: Dict[str, apache_beam.pvalue.PCollection], verbose: bool = False) → Dict[str, apache_beam.pvalue.PCollection][source]

Finds all PCollections used in the given sql query.

It does a simple word by word match and calls ib.collect for each PCollection found.

apache_beam.runners.interactive.sql.utils.replace_single_pcoll_token(sql: str, pcoll_name: str) → str[source]

Replaces the pcoll_name used in the sql with ‘PCOLLECTION’.

For sql query using only a single PCollection, the PCollection needs to be referred to as ‘PCOLLECTION’ instead of its variable/tag name.

apache_beam.runners.interactive.sql.utils.pformat_namedtuple(schema: NamedTuple) → str[source]
apache_beam.runners.interactive.sql.utils.pformat_dict(raw_input: Dict[str, Any]) → str[source]
class apache_beam.runners.interactive.sql.utils.OptionsEntry(label: str, help: str, cls: Type[apache_beam.options.pipeline_options.PipelineOptions], arg_builder: Union[str, Dict[str, Optional[Callable]]], default: Optional[str] = None)[source]

Bases: object

An entry of PipelineOptions that can be visualized through ipywidgets to take inputs in IPython notebooks interactively.

label

The value of the Label widget.

help

The help message of the entry, usually the same to the help in PipelineOptions.

cls

The PipelineOptions class/subclass the options belong to.

arg_builder

Builds the argument/option. If it’s a str, this entry assigns the input ipywidget’s value directly to the argument. If it’s a Dict, use the corresponding Callable to assign the input value to each argument. If Callable is None, fallback to assign the input value directly. This allows building multiple similar PipelineOptions arguments from a single input, such as staging_location and temp_location in GoogleCloudOptions.

default

The default value of the entry, None if absent.

default = None
class apache_beam.runners.interactive.sql.utils.OptionsForm[source]

Bases: object

A form visualized to take inputs from users in IPython Notebooks and generate PipelineOptions to run pipelines.

add(entry: apache_beam.runners.interactive.sql.utils.OptionsEntry) → apache_beam.runners.interactive.sql.utils.OptionsForm[source]

Adds an OptionsEntry to the form.

to_options() → apache_beam.options.pipeline_options.PipelineOptions[source]

Builds the PipelineOptions based on user inputs.

Can only be invoked after display_for_input.

additional_options()[source]

Alters the self.options with additional config.

display_for_input() → apache_beam.runners.interactive.sql.utils.OptionsForm[source]

Displays the widgets to take user inputs.

display_actions()[source]

Displays actionable widgets to utilize the options, run pipelines and etc.

class apache_beam.runners.interactive.sql.utils.DataflowOptionsForm(output_name: str, output_pcoll: apache_beam.pvalue.PCollection, verbose: bool = False)[source]

Bases: apache_beam.runners.interactive.sql.utils.OptionsForm

A form to take inputs from users in IPython Notebooks to build PipelineOptions to run pipelines on Dataflow.

Only contains minimum fields needed.

Inits the OptionsForm for setting up Dataflow jobs.

additional_options()[source]
display_actions()[source]