apache_beam.runners.dataflow package

Submodules

apache_beam.runners.dataflow.dataflow_metrics module

DataflowRunner implementation of MetricResults. It answers queries for current metric values by fetching them from the Dataflow service.

class apache_beam.runners.dataflow.dataflow_metrics.DataflowMetrics(dataflow_client=None, job_result=None, job_graph=None)[source]

Bases: apache_beam.metrics.metric.MetricResults

Implementation of MetricResults class for the Dataflow runner.

query(filter=None)[source]
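DataflowMetrics is normally reached through the result of a Dataflow pipeline run rather than constructed directly. Below is a minimal query sketch; the project and bucket values are placeholders, and the metric namespace and counter name are hypothetical.

    import apache_beam as beam
    from apache_beam.metrics.metric import Metrics, MetricsFilter
    from apache_beam.options.pipeline_options import PipelineOptions

    class CountElements(beam.DoFn):
        """Counts elements with a user counter (hypothetical namespace/name)."""
        def __init__(self):
            self.elements = Metrics.counter('my.namespace', 'elements')

        def process(self, element):
            self.elements.inc()
            yield element

    # Placeholder project and bucket values; adjust for a real Dataflow job.
    options = PipelineOptions(runner='DataflowRunner', project='my-project',
                              temp_location='gs://my-bucket/tmp')
    p = beam.Pipeline(options=options)
    _ = p | beam.Create([1, 2, 3]) | beam.ParDo(CountElements())

    result = p.run()
    result.wait_until_finish()

    # result.metrics() returns a DataflowMetrics instance backed by the service;
    # query() takes an optional MetricsFilter and returns results keyed by type.
    counters = result.metrics().query(
        MetricsFilter().with_name('elements'))['counters']
    for counter in counters:
        print('%s: %s' % (counter.key, counter.committed))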

apache_beam.runners.dataflow.dataflow_runner module

A runner implementation that submits a job for remote execution.

The runner will create a JSON description of the job graph and then submit it to the Dataflow Service for remote execution by a worker.

class apache_beam.runners.dataflow.dataflow_runner.DataflowRunner(cache=None)[source]

Bases: apache_beam.runners.runner.PipelineRunner

A runner that creates job graphs and submits them for remote execution.

Every call to the run() method submits an independent job for remote execution. The job consists of the nodes reachable from the passed-in node argument, or the entire graph if node is None. The run() method returns after the service has created the job and does not wait for the job to finish if blocking is set to False.
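A minimal submission sketch, assuming placeholder GCP project and bucket values. The runner builds a JSON description of the job graph and submits it; run() returns a DataflowPipelineResult once the service has created the job, and wait_until_finish() can be used to block on completion.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.runners.dataflow.dataflow_runner import DataflowRunner

    options = PipelineOptions(
        project='my-project',               # placeholder
        job_name='example-job',             # placeholder
        temp_location='gs://my-bucket/tmp',
        staging_location='gs://my-bucket/staging')

    p = beam.Pipeline(runner=DataflowRunner(), options=options)
    _ = p | beam.Create(['a', 'b', 'c']) | beam.Map(str.upper)

    result = p.run()            # submits the job and returns immediately
    result.wait_until_finish()  # optionally block until the job completes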

BATCH_ENVIRONMENT_MAJOR_VERSION = '6'
STREAMING_ENVIRONMENT_MAJOR_VERSION = '1'
apply_CombineValues(transform, pcoll)[source]
apply_GroupByKey(transform, pcoll)[source]
apply_WriteToBigQuery(transform, pcoll)[source]
static byte_array_to_json_string(raw_bytes)[source]

Implements org.apache.beam.sdk.util.StringUtils.byteArrayToJsonString (see the round-trip sketch with json_string_to_byte_array after this class listing).

classmethod deserialize_windowing_strategy(serialized_data)[source]
static flatten_input_visitor()[source]
static group_by_key_input_visitor()[source]
static json_string_to_byte_array(encoded_string)[source]

Implements org.apache.beam.sdk.util.StringUtils.jsonStringToByteArray.

static poll_for_job_completion(runner, result)[source]

Polls for the specified job to finish running (successfully or not).

run(pipeline)[source]

Remotely executes the entire pipeline, or the parts reachable from node.

run_CombineValues(transform_node)[source]
run_Flatten(transform_node)[source]
run_GroupByKey(transform_node)[source]
run_Impulse(transform_node)[source]
run_ParDo(transform_node)[source]
run_Read(transform_node)[source]
run__NativeWrite(transform_node)[source]
classmethod serialize_windowing_strategy(windowing)[source]
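The byte_array_to_json_string and json_string_to_byte_array utilities above form an encode/decode pair for representing raw bytes as JSON-safe strings. A minimal round-trip sketch (the byte payload is illustrative):

    from apache_beam.runners.dataflow.dataflow_runner import DataflowRunner

    raw = b'\x00binary payload\xff'                          # arbitrary bytes
    encoded = DataflowRunner.byte_array_to_json_string(raw)  # JSON-safe string
    decoded = DataflowRunner.json_string_to_byte_array(encoded)
    assert decoded == raw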

apache_beam.runners.dataflow.ptransform_overrides module

PTransform overrides for DataflowRunner.

class apache_beam.runners.dataflow.ptransform_overrides.CreatePTransformOverride[source]

Bases: apache_beam.pipeline.PTransformOverride

A PTransformOverride for Create in streaming mode.

get_matcher()[source]
get_replacement_transform(ptransform)[source]
static is_streaming_create(applied_ptransform)[source]
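A hedged sketch of the override pattern this class follows, assuming get_matcher() returns a callable over AppliedPTransform nodes (as is_streaming_create suggests); MyTransform and its replacement are hypothetical.

    import apache_beam as beam
    from apache_beam.pipeline import PTransformOverride

    class MyTransform(beam.PTransform):
        def expand(self, pcoll):
            return pcoll | beam.Map(lambda x: x)

    class MyTransformOverride(PTransformOverride):
        def get_matcher(self):
            # Match applied transforms whose underlying transform is MyTransform.
            return lambda applied_ptransform: isinstance(
                applied_ptransform.transform, MyTransform)

        def get_replacement_transform(self, ptransform):
            # Return the transform to substitute; an identity Map stands in here.
            return beam.Map(lambda x: x)

    # A runner would typically apply its overrides before translating the graph,
    # e.g. pipeline.replace_all([MyTransformOverride()]).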

apache_beam.runners.dataflow.test_dataflow_runner module

A wrapper of Beam runners built for running and verifying end-to-end tests.

class apache_beam.runners.dataflow.test_dataflow_runner.TestDataflowRunner(cache=None)[source]

Bases: apache_beam.runners.dataflow.dataflow_runner.DataflowRunner

run(pipeline)[source]

Executes the test pipeline and verifies the test matcher.
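A hedged end-to-end test sketch. It assumes the on_success_matcher test option carries a pickled hamcrest matcher (here PipelineStateMatcher, which checks that the job reached the DONE state); the project and bucket values are placeholders.

    import apache_beam as beam
    from apache_beam.internal import pickler
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.runners.dataflow.test_dataflow_runner import TestDataflowRunner
    from apache_beam.testing.pipeline_verifiers import PipelineStateMatcher

    options = PipelineOptions([
        '--project=my-project',
        '--temp_location=gs://my-bucket/tmp',
        '--on_success_matcher=%s' % pickler.dumps(PipelineStateMatcher()),
    ])

    p = beam.Pipeline(runner=TestDataflowRunner(), options=options)
    _ = p | beam.Create([1, 2, 3]) | beam.Map(lambda x: x + 1)

    p.run()  # submits the job, waits for completion, then asserts the matcher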

Module contents

The DataflowRunner executes pipelines on Google Cloud Dataflow.

Anything in this package not imported here is an internal implementation detail with no backwards-compatibility guarantees.