apache_beam.runners.dataflow.dataflow_runner module
A runner implementation that submits a job for remote execution.
The runner will create a JSON description of the job graph and then submit it to the Dataflow Service for remote execution by a worker.
- class apache_beam.runners.dataflow.dataflow_runner.DataflowRunner(cache=None)
  - Bases: PipelineRunner
  - A runner that creates job graphs and submits them for remote execution.
  - Every execution of the run() method submits an independent job for remote execution, consisting of the nodes reachable from the passed-in node argument, or the entire graph if the node is None. The run() method returns after the service creates the job, at which point the job status is reported as RUNNING.
  - class NativeReadPTransformOverride
    - Bases: PTransformOverride
    - A PTransformOverride for Read using native sources.
    - The DataflowRunner expects a Read PTransform that uses native sources to act as a primitive, so this override replaces the Read with a primitive.
    - get_replacement_transform(ptransform)
    - matches(applied_ptransform)
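The override above follows the general PTransformOverride contract: matches() selects the transforms to replace, and get_replacement_transform() supplies the substitute. A minimal sketch of that contract with stand-in classes (every name below is hypothetical, not the Beam API):

```python
class FakeAppliedPTransform:
    """Stand-in for the applied transform node handed to matches()."""
    def __init__(self, full_label):
        self.full_label = full_label

class SketchReadOverride:
    """Hypothetical override: treat any transform labeled 'Read...' as a
    native read and swap it for a primitive marker."""
    def matches(self, applied_ptransform):
        return applied_ptransform.full_label.startswith("Read")

    def get_replacement_transform(self, ptransform):
        # In the real runner this returns a primitive Read transform;
        # here we just tag the transform so the substitution is visible.
        return ("primitive", ptransform)

override = SketchReadOverride()
node = FakeAppliedPTransform("Read/FromNativeSource")
replacement = override.get_replacement_transform(node) if override.matches(node) else None
```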
 
- static poll_for_job_completion(runner, result, duration, state_update_callback=None)
  - Polls for the specified job to finish running (successfully or not) and updates the result with the new job information before returning.
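The polling behavior can be illustrated with a small self-contained sketch. FakeJobClient and the state strings are stand-ins (an assumption modeled on Dataflow's JOB_STATE_* names), not the runner's actual service client:

```python
import time

class FakeJobClient:
    """Hypothetical stand-in for the Dataflow job client: reports RUNNING
    twice, then DONE, mimicking what the poller observes via the service."""
    def __init__(self):
        self._states = ["JOB_STATE_RUNNING", "JOB_STATE_RUNNING", "JOB_STATE_DONE"]

    def get_job_state(self):
        return self._states.pop(0) if self._states else "JOB_STATE_DONE"

TERMINAL_STATES = {"JOB_STATE_DONE", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}

def poll_until_terminal(client, interval=0.0, state_update_callback=None):
    """Sketch of the polling loop: query the state, report changes through
    the optional callback, and stop once the job reaches a terminal state
    (finished successfully or not)."""
    last_state = None
    while True:
        state = client.get_job_state()
        if state != last_state and state_update_callback:
            state_update_callback(state)
        last_state = state
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)

seen = []
final = poll_until_terminal(FakeJobClient(), state_update_callback=seen.append)
```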
- run_pipeline(pipeline, options, pipeline_proto=None)
  - Remotely executes the entire pipeline, or the parts reachable from the given node.
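In normal use run_pipeline is not called directly; a pipeline is pointed at this runner through pipeline options such as --runner=DataflowRunner. The sketch below shows typical flags (the project id, region, and bucket are placeholder values) and mimics the flag parsing with argparse so it runs without the Beam SDK; with Beam installed the same list would be handed to PipelineOptions and then to the Pipeline constructor:

```python
from argparse import ArgumentParser

# Flags a pipeline would typically pass to target the Dataflow service.
dataflow_args = [
    "--runner=DataflowRunner",
    "--project=my-project",                # hypothetical project id
    "--region=us-central1",
    "--temp_location=gs://my-bucket/tmp",  # hypothetical staging bucket
]

# Sketch of the flag parsing only, so the example stays self-contained.
parser = ArgumentParser()
for flag in ("--runner", "--project", "--region", "--temp_location"):
    parser.add_argument(flag)
opts = parser.parse_args(dataflow_args)
```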
- get_default_gcp_region()
  - Gets a default value for the Google Cloud region according to https://cloud.google.com/compute/docs/gcloud-compute/#default-properties. Returns None if no default can be found.
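That lookup can be sketched as follows; treating the CLOUDSDK_COMPUTE_REGION environment variable and a `gcloud config get-value compute/region` call as the sources of the default properties is an assumption about the mechanism, not a transcript of the implementation:

```python
import os
import subprocess

def sketch_default_gcp_region():
    """Sketch: return the gcloud default compute region, or None if no
    default can be found (assumed lookup order: env var, then gcloud config)."""
    env_region = os.environ.get("CLOUDSDK_COMPUTE_REGION", "").strip()
    if env_region:
        return env_region
    try:
        proc = subprocess.run(
            ["gcloud", "config", "get-value", "compute/region"],
            capture_output=True, text=True, timeout=10,
        )
        return proc.stdout.strip() or None
    except (OSError, subprocess.TimeoutExpired):
        # gcloud missing or unresponsive: no default available.
        return None

# Simulate a configured default so the sketch is demonstrable offline.
os.environ["CLOUDSDK_COMPUTE_REGION"] = "us-west1"
region = sketch_default_gcp_region()
```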