apache_beam.runners.interactive.dataproc.dataproc_cluster_manager module

class apache_beam.runners.interactive.dataproc.dataproc_cluster_manager.MasterURLIdentifier(project_id: Union[str, NoneType] = None, region: Union[str, NoneType] = None, cluster_name: Union[str, NoneType] = None)

Bases: object

project_id = None

region = None

cluster_name = None

class apache_beam.runners.interactive.dataproc.dataproc_cluster_manager.DataprocClusterManager(cluster_metadata: apache_beam.runners.interactive.dataproc.dataproc_cluster_manager.MasterURLIdentifier)

Bases: object

The DataprocClusterManager object simplifies the operations required for creating and deleting Dataproc clusters for use under Interactive Beam.

Initializes the DataprocClusterManager with properties required to interface with the Dataproc ClusterControllerClient.

create_cluster(cluster: dict) → None

Attempts to create a cluster using attributes that were initialized with the DataprocClusterManager instance.

Parameters:	cluster – Dictionary representing a Dataproc cluster. Read more about the schema for clusters here: https://cloud.google.com/python/docs/reference/dataproc/latest/google.cloud.dataproc_v1.types.Cluster
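
As an illustration of the dictionary shape, the sketch below builds a minimal cluster dictionary. The helper name build_cluster_dict and the placeholder values are hypothetical. The keys follow the Dataproc Cluster schema linked above, but which of them a given deployment actually needs is an assumption to check against that reference.

```python
from typing import Any, Dict


def build_cluster_dict(project_id: str, cluster_name: str) -> Dict[str, Any]:
    """Hypothetical helper: builds a dict shaped like a Dataproc Cluster."""
    return {
        'project_id': project_id,
        'cluster_name': cluster_name,
        'config': {
            # Assumption: request the optional Flink component.
            'software_config': {'optional_components': ['FLINK']},
            # Assumption: expose component web UIs over HTTP.
            'endpoint_config': {'enable_http_port_access': True},
        },
    }


# Placeholder values, not real resources.
cluster = build_cluster_dict('example-project', 'example-cluster')
```

A dictionary of this shape is the kind of value that could be passed to create_cluster on a DataprocClusterManager instance.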

create_flink_cluster() → None

Calls create_cluster with a configuration that enables FlinkRunner.

cleanup() → None

Deletes the cluster that uses the attributes initialized with the DataprocClusterManager instance.

describe() → None

Returns a dictionary describing the cluster.

get_cluster_details(cluster_metadata: apache_beam.runners.interactive.dataproc.dataproc_cluster_manager.MasterURLIdentifier) → google.cloud.dataproc_v1.types.clusters.Cluster

Gets the Dataproc_v1 Cluster object for the current cluster manager.

wait_for_cluster_to_provision(cluster_metadata: apache_beam.runners.interactive.dataproc.dataproc_cluster_manager.MasterURLIdentifier) → None

get_staging_location(cluster_metadata: apache_beam.runners.interactive.dataproc.dataproc_cluster_manager.MasterURLIdentifier) → str

Gets the staging bucket of an existing Dataproc cluster.

parse_master_url_and_dashboard(cluster_metadata: apache_beam.runners.interactive.dataproc.dataproc_cluster_manager.MasterURLIdentifier, line: str) → Tuple[str, str]

Parses the master_url and YARN application_id of the Flink process from an input line. The line containing both the master_url and the application_id is always formatted as follows: {text} Found Web Interface {master_url} of application ‘{application_id}’.\n

Truncated example where ‘…’ represents additional text between segments: … google-dataproc-startup[000]: … activate-component-flink[0000]: …org.apache.flink.yarn.YarnClusterDescriptor… [] - Found Web Interface example-master-url:50000 of application ‘application_123456789000_0001’.

Returns the flink_master_url and dashboard link as a tuple.
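
The line format described above can be matched with a regular expression. The following is a minimal sketch, not the module's own implementation: the helper name parse_line and the pattern are assumptions made for illustration, and it stops at extracting the master_url and application_id. Building the dashboard link from the cluster's endpoints is not shown.

```python
import re
from typing import Optional, Tuple

# Matches "Found Web Interface {master_url} of application '{application_id}'."
# Accepts straight or typographic single quotes around the application id.
_WEB_INTERFACE = re.compile(
    r"Found Web Interface (?P<master_url>\S+) of application "
    r"['\u2018\u2019](?P<application_id>[^'\u2018\u2019]+)['\u2018\u2019]\.")


def parse_line(line: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: returns (master_url, application_id) or None."""
    match = _WEB_INTERFACE.search(line)
    if match is None:
        return None
    return match.group('master_url'), match.group('application_id')


# Shortened version of the truncated example line shown above.
example = (
    "... [] - Found Web Interface example-master-url:50000 "
    "of application 'application_123456789000_0001'.\n")
result = parse_line(example)
```

Returning None for lines that do not match lets a caller scan a log line by line and stop at the first hit.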

get_master_url_and_dashboard(cluster_metadata: apache_beam.runners.interactive.dataproc.dataproc_cluster_manager.MasterURLIdentifier, staging_bucket) → Tuple[Optional[str], Optional[str]]

Returns the master_url and dashboard link of the current cluster as a tuple. Per the return type, either value may be None.

cleanup_staging_files(staging_directory: str) → None