Class DataflowRunner

java.lang.Object
org.apache.beam.sdk.PipelineRunner<DataflowPipelineJob>
org.apache.beam.runners.dataflow.DataflowRunner

public class DataflowRunner extends PipelineRunner<DataflowPipelineJob>
A PipelineRunner that executes the operations in the pipeline by first translating them to the Dataflow representation using the DataflowPipelineTranslator and then submitting them to a Dataflow service for execution.

Permissions

When reading from a Dataflow source or writing to a Dataflow sink using DataflowRunner, the Google cloudservices account and the Google compute engine service account of the GCP project running the Dataflow Job will need access to the corresponding source/sink.

Please see Google Cloud Dataflow Security and Permissions for more details.

DataflowRunner now supports creating job templates using the --templateLocation option. If this option is set, the runner will generate a template instead of running the pipeline immediately.

Example:


 --runner=DataflowRunner
 --templateLocation=gs://your-bucket/templates/my-template
 
  • Field Details

    • UNSAFELY_ATTEMPT_TO_PROCESS_UNBOUNDED_DATA_IN_BATCH_MODE

      public static final String UNSAFELY_ATTEMPT_TO_PROCESS_UNBOUNDED_DATA_IN_BATCH_MODE
      Experiment to "unsafely attempt to process unbounded data in batch mode".
      See Also:
    • PROJECT_ID_REGEXP

      public static final String PROJECT_ID_REGEXP
      Project IDs must contain lowercase letters, digits, or dashes. IDs must start with a letter and may not end with a dash. This regex isn't exact - this allows for patterns that would be rejected by the service, but this is sufficient for basic validation of project IDs.
      See Also:
  • Constructor Details

  • Method Details

    • replaceGcsFilesWithLocalFiles

      public static List<String> replaceGcsFilesWithLocalFiles(List<String> filesToStage)
      Replaces GCS file paths with local file paths by downloading the GCS files locally. This is useful when files need to be accessed locally before being staged to Dataflow.
      Parameters:
      filesToStage - List of file paths that may contain GCS paths (gs://) and local paths
      Returns:
      List of local file paths where any GCS paths have been downloaded locally
      Throws:
      RuntimeException - if there are errors copying GCS files locally
    • fromOptions

      public static DataflowRunner fromOptions(PipelineOptions options)
      Construct a runner from the provided options.
      Parameters:
      options - Properties that configure the runner.
      Returns:
      The newly created runner.
    • applySdkEnvironmentOverrides

      protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline applySdkEnvironmentOverrides(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline, DataflowPipelineOptions options)
    • resolveArtifacts

      protected org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline resolveArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline)
    • stageArtifacts

      protected List<DataflowPackage> stageArtifacts(org.apache.beam.model.pipeline.v1.RunnerApi.Pipeline pipeline)
    • run

      public DataflowPipelineJob run(Pipeline pipeline)
      Description copied from class: PipelineRunner
      Processes the given Pipeline, potentially asynchronously, returning a runner-specific type of result.
      Specified by:
      run in class PipelineRunner<DataflowPipelineJob>
    • hasExperiment

      public static boolean hasExperiment(DataflowPipelineDebugOptions options, String experiment)
      Returns true if the specified experiment is enabled, handling null experiments.
    • replaceV1Transforms

      protected void replaceV1Transforms(Pipeline pipeline)
    • getTranslator

      public DataflowPipelineTranslator getTranslator()
      Returns the DataflowPipelineTranslator associated with this object.
    • setHooks

      public void setHooks(DataflowRunnerHooks hooks)
      Sets callbacks to invoke during execution see DataflowRunnerHooks.
    • toString

      public String toString()
      Overrides:
      toString in class Object