apache_beam.runners.dataflow.dataflow_runner module¶
A runner implementation that submits a job for remote execution.
The runner will create a JSON description of the job graph and then submit it to the Dataflow Service for remote execution by a worker.
-
class
apache_beam.runners.dataflow.dataflow_runner.
DataflowRunner
(cache=None)[source]¶ Bases:
apache_beam.runners.runner.PipelineRunner
A runner that creates job graphs and submits them for remote execution.
Every execution of the run() method will submit an independent job for remote execution that consists of the nodes reachable from the passed in node argument or entire graph if node is None. The run() method returns after the service created the job and will not wait for the job to finish if blocking is set to False.
-
static
poll_for_job_completion
(runner, result, duration)[source]¶ Polls for the specified job to finish running (successfully or not).
Updates the result with the new job information before returning.
Parameters:
-
run_pipeline
(pipeline, options)[source]¶ Remotely executes entire pipeline or parts reachable from node.
-
run_RunnerAPIPTransformHolder
(transform_node, options)[source]¶ Adding Dataflow runner job description for transform holder objects.
These holder transform objects are generated for some of the transforms that become available after a cross-language transform expansion, usually if the corresponding transform object cannot be generated in Python SDK (for example, a python ParDo transform cannot be generated without a serialized Python DoFn object).
-
static
byte_array_to_json_string
(raw_bytes)[source]¶ Implements org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString.
-
static
json_string_to_byte_array
(encoded_string)[source]¶ Implements org.apache.beam.sdk.util.StringUtils.jsonStringToByteArray.
-
get_default_gcp_region
()[source]¶ Get a default value for Google Cloud region according to https://cloud.google.com/compute/docs/gcloud-compute/#default-properties. If no default can be found, returns None.
-
class
CreatePTransformOverride
¶ Bases:
apache_beam.pipeline.PTransformOverride
A
PTransformOverride
forCreate
in streaming mode.-
get_replacement_transform
(ptransform)¶
-
matches
(applied_ptransform)¶
-
-
class
JrhReadPTransformOverride
¶ Bases:
apache_beam.pipeline.PTransformOverride
A
PTransformOverride
forRead(BoundedSource)
-
get_replacement_transform
(ptransform)¶
-
matches
(applied_ptransform)¶
-
-
class
ReadPTransformOverride
¶ Bases:
apache_beam.pipeline.PTransformOverride
A
PTransformOverride
forRead(BoundedSource)
-
get_replacement_transform
(ptransform)¶
-
matches
(applied_ptransform)¶
-
-
static