Package org.apache.beam.runners.dataflow
Class DataflowRunner
java.lang.Object
org.apache.beam.sdk.PipelineRunner<DataflowPipelineJob>
org.apache.beam.runners.dataflow.DataflowRunner
A 
PipelineRunner that executes the operations in the pipeline by first translating them
 to the Dataflow representation using the DataflowPipelineTranslator and then submitting
 them to a Dataflow service for execution.
 Permissions
When reading from a Dataflow source or writing to a Dataflow sink using 
 DataflowRunner, the Google cloudservices account and the Google compute engine service account
 of the GCP project running the Dataflow Job will need access to the corresponding source/sink.
 
Please see Google Cloud Dataflow Security and Permissions for more details.
DataflowRunner now supports creating job templates using the --templateLocation
 option. If this option is set, the runner will generate a template instead of running the
 pipeline immediately.
 
Example:
 --runner=DataflowRunner
 --templateLocation=gs://your-bucket/templates/my-template
 - 
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic classstatic classA markerDoFnfor writing the contents of aPCollectionto a streamingPCollectionViewbackend implementation. - 
Field Summary
Fields - 
Constructor Summary
Constructors - 
Method Summary
Modifier and TypeMethodDescriptionprotected org.apache.beam.model.pipeline.v1.RunnerApi.PipelineapplySdkEnvironmentOverrides(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline, DataflowPipelineOptions options) static DataflowRunnerfromOptions(PipelineOptions options) Construct a runner from the provided options.Returns the DataflowPipelineTranslator associated with this object.static booleanhasExperiment(DataflowPipelineDebugOptions options, String experiment) Returns true if the specified experiment is enabled, handling null experiments.replaceGcsFilesWithLocalFiles(List<String> filesToStage) Replaces GCS file paths with local file paths by downloading the GCS files locally.protected voidreplaceV1Transforms(Pipeline pipeline) protected org.apache.beam.model.pipeline.v1.RunnerApi.PipelineresolveArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline) Processes the givenPipeline, potentially asynchronously, returning a runner-specific type of result.voidsetHooks(DataflowRunnerHooks hooks) Sets callbacks to invoke during execution seeDataflowRunnerHooks.protected List<DataflowPackage> stageArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline) toString()Methods inherited from class org.apache.beam.sdk.PipelineRunner
create, run, run 
- 
Field Details
- 
UNSAFELY_ATTEMPT_TO_PROCESS_UNBOUNDED_DATA_IN_BATCH_MODE
Experiment to "unsafely attempt to process unbounded data in batch mode".- See Also:
 
 - 
PROJECT_ID_REGEXP
Project IDs must contain lowercase letters, digits, or dashes. IDs must start with a letter and may not end with a dash. This regex isn't exact - this allows for patterns that would be rejected by the service, but this is sufficient for basic validation of project IDs.- See Also:
 
 
 - 
 - 
Constructor Details
- 
DataflowRunner
 
 - 
 - 
Method Details
- 
replaceGcsFilesWithLocalFiles
Replaces GCS file paths with local file paths by downloading the GCS files locally. This is useful when files need to be accessed locally before being staged to Dataflow.- Parameters:
 filesToStage- List of file paths that may contain GCS paths (gs://) and local paths- Returns:
 - List of local file paths where any GCS paths have been downloaded locally
 - Throws:
 RuntimeException- if there are errors copying GCS files locally
 - 
fromOptions
Construct a runner from the provided options.- Parameters:
 options- Properties that configure the runner.- Returns:
 - The newly created runner.
 
 - 
applySdkEnvironmentOverrides
protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline applySdkEnvironmentOverrides(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline, DataflowPipelineOptions options)  - 
resolveArtifacts
protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline resolveArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline)  - 
stageArtifacts
protected List<DataflowPackage> stageArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline)  - 
run
Description copied from class:PipelineRunnerProcesses the givenPipeline, potentially asynchronously, returning a runner-specific type of result.- Specified by:
 runin classPipelineRunner<DataflowPipelineJob>
 - 
hasExperiment
Returns true if the specified experiment is enabled, handling null experiments. - 
replaceV1Transforms
 - 
getTranslator
Returns the DataflowPipelineTranslator associated with this object. - 
setHooks
Sets callbacks to invoke during execution seeDataflowRunnerHooks. - 
toString
 
 -