public class DataflowRunner extends PipelineRunner<DataflowPipelineJob>
PipelineRunner
that executes the operations in the pipeline by first translating them
to the Dataflow representation using the DataflowPipelineTranslator
and then submitting
them to a Dataflow service for execution.
When reading from a Dataflow source or writing to a Dataflow sink using DataflowRunner
, the Google cloudservices account and the Google compute engine service account
of the GCP project running the Dataflow Job will need access to the corresponding source/sink.
Please see Google Cloud Dataflow Security and Permissions for more details.
Modifier and Type | Class and Description |
---|---|
static class |
DataflowRunner.StreamingPCollectionViewWriterFn<T>
A marker
DoFn for writing the contents of a PCollection to a streaming PCollectionView backend implementation. |
Modifier and Type | Field and Description |
---|---|
static java.lang.String |
PROJECT_ID_REGEXP
Project IDs must contain lowercase letters, digits, or dashes.
|
static java.lang.String |
UNSAFELY_ATTEMPT_TO_PROCESS_UNBOUNDED_DATA_IN_BATCH_MODE
Experiment to "unsafely attempt to process unbounded data in batch mode".
|
Modifier | Constructor and Description |
---|---|
protected |
DataflowRunner(DataflowPipelineOptions options) |
Modifier and Type | Method and Description |
---|---|
protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline |
applySdkEnvironmentOverrides(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline,
DataflowPipelineOptions options) |
static DataflowRunner |
fromOptions(PipelineOptions options)
Construct a runner from the provided options.
|
DataflowPipelineTranslator |
getTranslator()
Returns the DataflowPipelineTranslator associated with this object.
|
static boolean |
hasExperiment(DataflowPipelineDebugOptions options,
java.lang.String experiment)
Returns true if the specified experiment is enabled, handling null experiments.
|
protected void |
replaceV1Transforms(Pipeline pipeline) |
protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline |
resolveArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline) |
DataflowPipelineJob |
run(Pipeline pipeline)
Processes the given
Pipeline , potentially asynchronously, returning a runner-specific
type of result. |
void |
setHooks(DataflowRunnerHooks hooks)
Sets callbacks to invoke during execution see
DataflowRunnerHooks . |
protected java.util.List<DataflowPackage> |
stageArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline) |
java.lang.String |
toString() |
create, run, run
public static final java.lang.String UNSAFELY_ATTEMPT_TO_PROCESS_UNBOUNDED_DATA_IN_BATCH_MODE
public static final java.lang.String PROJECT_ID_REGEXP
protected DataflowRunner(DataflowPipelineOptions options)
public static DataflowRunner fromOptions(PipelineOptions options)
options
- Properties that configure the runner.protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline applySdkEnvironmentOverrides(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline, DataflowPipelineOptions options)
protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline resolveArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline)
protected java.util.List<DataflowPackage> stageArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline)
public DataflowPipelineJob run(Pipeline pipeline)
PipelineRunner
Pipeline
, potentially asynchronously, returning a runner-specific
type of result.run
in class PipelineRunner<DataflowPipelineJob>
public static boolean hasExperiment(DataflowPipelineDebugOptions options, java.lang.String experiment)
protected void replaceV1Transforms(Pipeline pipeline)
public DataflowPipelineTranslator getTranslator()
public void setHooks(DataflowRunnerHooks hooks)
DataflowRunnerHooks
.public java.lang.String toString()
toString
in class java.lang.Object