public class DataflowRunner extends PipelineRunner<DataflowPipelineJob>
PipelineRunner that executes the operations in the pipeline by first translating them
 to the Dataflow representation using the DataflowPipelineTranslator and then submitting
 them to a Dataflow service for execution.
 When reading from a Dataflow source or writing to a Dataflow sink using DataflowRunner, the Google cloudservices account and the Google compute engine service account
 of the GCP project running the Dataflow Job will need access to the corresponding source/sink.
 
Please see Google Cloud Dataflow Security and Permissions for more details.
DataflowRunner now supports creating job templates using the --templateLocation
 option. If this option is set, the runner will generate a template instead of running the
 pipeline immediately.
 
Example:
 --runner=DataflowRunner
 --templateLocation=gs://your-bucket/templates/my-template
 | Modifier and Type | Class and Description | 
|---|---|
static class  | 
DataflowRunner.DataflowTransformTranslator  | 
static class  | 
DataflowRunner.StreamingPCollectionViewWriterFn<T>
A marker  
DoFn for writing the contents of a PCollection to a streaming PCollectionView backend implementation. | 
| Modifier and Type | Field and Description | 
|---|---|
static java.lang.String | 
PROJECT_ID_REGEXP
Project IDs must contain lowercase letters, digits, or dashes. 
 | 
static java.lang.String | 
UNSAFELY_ATTEMPT_TO_PROCESS_UNBOUNDED_DATA_IN_BATCH_MODE
Experiment to "unsafely attempt to process unbounded data in batch mode". 
 | 
| Modifier | Constructor and Description | 
|---|---|
protected  | 
DataflowRunner(DataflowPipelineOptions options)  | 
| Modifier and Type | Method and Description | 
|---|---|
protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline | 
applySdkEnvironmentOverrides(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline,
                            DataflowPipelineOptions options)  | 
static DataflowRunner | 
fromOptions(PipelineOptions options)
Construct a runner from the provided options. 
 | 
DataflowPipelineTranslator | 
getTranslator()
Returns the DataflowPipelineTranslator associated with this object. 
 | 
static boolean | 
hasExperiment(DataflowPipelineDebugOptions options,
             java.lang.String experiment)
Returns true if the specified experiment is enabled, handling null experiments. 
 | 
static java.util.List<java.lang.String> | 
replaceGcsFilesWithLocalFiles(java.util.List<java.lang.String> filesToStage)
Replaces GCS file paths with local file paths by downloading the GCS files locally. 
 | 
protected void | 
replaceV1Transforms(Pipeline pipeline)  | 
protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline | 
resolveArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline)  | 
DataflowPipelineJob | 
run(Pipeline pipeline)
Processes the given  
Pipeline, potentially asynchronously, returning a runner-specific
 type of result. | 
void | 
setHooks(DataflowRunnerHooks hooks)
Sets callbacks to invoke during execution see  
DataflowRunnerHooks. | 
protected java.util.List<DataflowPackage> | 
stageArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline)  | 
java.lang.String | 
toString()  | 
create, run, runpublic static final java.lang.String UNSAFELY_ATTEMPT_TO_PROCESS_UNBOUNDED_DATA_IN_BATCH_MODE
public static final java.lang.String PROJECT_ID_REGEXP
protected DataflowRunner(DataflowPipelineOptions options)
public static java.util.List<java.lang.String> replaceGcsFilesWithLocalFiles(java.util.List<java.lang.String> filesToStage)
filesToStage - List of file paths that may contain GCS paths (gs://) and local pathsjava.lang.RuntimeException - if there are errors copying GCS files locallypublic static DataflowRunner fromOptions(PipelineOptions options)
options - Properties that configure the runner.protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline applySdkEnvironmentOverrides(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline,
                                                                                            DataflowPipelineOptions options)
protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline resolveArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline)
protected java.util.List<DataflowPackage> stageArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline)
public DataflowPipelineJob run(Pipeline pipeline)
PipelineRunnerPipeline, potentially asynchronously, returning a runner-specific
 type of result.run in class PipelineRunner<DataflowPipelineJob>public static boolean hasExperiment(DataflowPipelineDebugOptions options, java.lang.String experiment)
protected void replaceV1Transforms(Pipeline pipeline)
public DataflowPipelineTranslator getTranslator()
public void setHooks(DataflowRunnerHooks hooks)
DataflowRunnerHooks.public java.lang.String toString()
toString in class java.lang.Object