apache_beam.runners.dataflow.dataflow_runner module

A runner implementation that submits a job for remote execution.

The runner will create a JSON description of the job graph and then submit it to the Dataflow Service for remote execution by a worker.

class apache_beam.runners.dataflow.dataflow_runner.DataflowRunner(cache=None)[source]

Bases: apache_beam.runners.runner.PipelineRunner

A runner that creates job graphs and submits them for remote execution.

Every execution of the run() method submits an independent job for remote execution, consisting of the nodes reachable from the passed-in node argument, or the entire graph if the node is None. The run() method returns after the service creates the job, at which point the job status is reported as RUNNING.
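Submitting a job to Dataflow typically requires a handful of runner-specific pipeline options. The sketch below assembles the command-line style argument list a caller might pass to PipelineOptions; the project, region, and bucket values are illustrative placeholders, and the helper itself is not part of the Beam API.

```python
def dataflow_args(project, region, temp_bucket):
    """Build the pipeline-option flags commonly needed by DataflowRunner.

    The flag names are standard Beam/Dataflow options; the values passed
    in by the caller are placeholders, not defaults.
    """
    return [
        "--runner=DataflowRunner",                  # select this runner
        f"--project={project}",                     # GCP project to run under
        f"--region={region}",                       # Dataflow regional endpoint
        f"--temp_location=gs://{temp_bucket}/tmp",  # GCS path for temp files
    ]

args = dataflow_args("my-project", "us-central1", "my-bucket")
```

These strings can then be handed to a pipeline's options object the same way ordinary `sys.argv` flags would be.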

static poll_for_job_completion(runner, result, duration, state_update_callback=None)[source]

Polls for the specified job to finish running (successfully or not).

Updates the result with the new job information before returning.

  • runner – DataflowRunner instance to use for polling job state.
  • result – DataflowPipelineResult instance used for job information.
  • duration (int) – The time to wait (in milliseconds) for the job to finish. If set to None, it will wait indefinitely until the job finishes.
  • state_update_callback – Optional callback invoked with the new job state each time the state changes.
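The polling contract above can be illustrated with a generic loop. The helper below is a hypothetical stand-in, not the Beam implementation: `get_state` plays the role of the Dataflow service request, `duration_ms` mirrors the milliseconds semantics of `duration`, and the state names follow Dataflow's JOB_STATE_* convention.

```python
import time

TERMINAL_STATES = {"JOB_STATE_DONE", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}

def poll_until_done(get_state, duration_ms=None, interval_s=0.0,
                    state_update_callback=None):
    """Poll get_state() until a terminal state or the deadline passes.

    get_state is a hypothetical callable standing in for the service
    request; a duration_ms of None waits indefinitely, matching the
    semantics of the duration parameter documented above.
    """
    deadline = None if duration_ms is None else time.time() + duration_ms / 1000.0
    last_state = None
    while True:
        state = get_state()
        if state != last_state:
            last_state = state
            if state_update_callback:
                state_update_callback(state)  # report each state change
        if state in TERMINAL_STATES:
            return state
        if deadline is not None and time.time() >= deadline:
            return state  # timed out; the job may still be running
        time.sleep(interval_s)

# Simulated job that reaches a terminal state on the third poll.
states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_DONE"])
final = poll_until_done(lambda: next(states))
```

In practice callers rarely invoke the polling helper directly; waiting on the returned pipeline result accomplishes the same thing.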
static side_input_visitor(deterministic_key_coders=True)[source]
static flatten_input_visitor()[source]
static combinefn_visitor()[source]
run_pipeline(pipeline, options, pipeline_proto=None)[source]

Remotely executes the entire pipeline, or the parts reachable from a node.


get_default_gcp_region()[source]

Get a default value for the Google Cloud region according to https://cloud.google.com/compute/docs/gcloud-compute/#default-properties. If no default can be found, returns None.
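The lookup can be sketched in plain Python. CLOUDSDK_COMPUTE_REGION is the environment variable the Cloud SDK itself honors, but treating it as the sole source here is a simplification for illustration: the real lookup also consults gcloud configuration before giving up and returning None.

```python
import os

def default_region(environ=os.environ):
    """Simplified sketch of a default-region lookup with a None fallback.

    Only the Cloud SDK environment variable is checked here; the actual
    lookup additionally falls back to `gcloud config` defaults before
    returning None to signal that no default was found.
    """
    region = environ.get("CLOUDSDK_COMPUTE_REGION")
    return region or None

region = default_region({"CLOUDSDK_COMPUTE_REGION": "us-central1"})
```

Returning None rather than raising lets the caller decide whether a missing region is an error or merely means the user must supply one explicitly.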

class NativeReadPTransformOverride

Bases: apache_beam.pipeline.PTransformOverride

A PTransformOverride for Read using native sources.

The DataflowRunner expects the Read PTransform using native sources to act as a primitive, so this override replaces the Read with a primitive.
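The override mechanism follows a match-and-replace pattern: a predicate selects matching transforms in the graph, and a factory supplies the replacement. The minimal classes below mirror that shape but are a self-contained illustration, not the Beam PTransformOverride API; the dict-based "transforms" are stand-ins for real pipeline nodes.

```python
class TransformOverride:
    """Hypothetical stand-in mirroring the PTransformOverride pattern."""

    def matches(self, transform):
        raise NotImplementedError

    def get_replacement(self, transform):
        raise NotImplementedError

class PrimitiveReadOverride(TransformOverride):
    """Match native-source Reads and mark the replacement as a primitive,
    loosely analogous to what NativeReadPTransformOverride does."""

    def matches(self, transform):
        return transform.get("kind") == "NativeRead"

    def get_replacement(self, transform):
        return {**transform, "primitive": True}

def apply_overrides(transforms, overrides):
    """Run every override over every transform, replacing matches."""
    result = []
    for t in transforms:
        for override in overrides:
            if override.matches(t):
                t = override.get_replacement(t)
        result.append(t)
    return result

graph = [{"kind": "NativeRead"}, {"kind": "ParDo"}]
replaced = apply_overrides(graph, [PrimitiveReadOverride()])
```

Keeping the match predicate separate from the replacement factory lets the runner scan the whole graph once per override without coupling the two steps.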