apache_beam.options.pipeline_options module

Pipeline options obtained from command line parsing.

class apache_beam.options.pipeline_options.PipelineOptions(flags=None, **kwargs)[source]

Bases: apache_beam.transforms.display.HasDisplayData

This class and subclasses are used as containers for command line options.

These classes are wrappers over the standard argparse Python module (see https://docs.python.org/3/library/argparse.html). To define one option or a group of options, create a subclass from PipelineOptions.

Example Usage:

class XyzOptions(PipelineOptions):

  @classmethod
  def _add_argparse_args(cls, parser):
    parser.add_argument('--abc', default='start')
    parser.add_argument('--xyz', default='end')

The arguments for the add_argument() method are exactly the ones described in the argparse public documentation.

Pipeline objects require an options object during initialization. This is obtained simply by initializing an options class as defined above.

Example Usage:

p = Pipeline(options=XyzOptions())
if p.options.xyz == 'end':
  raise ValueError('Option xyz has an invalid value.')

Instances of PipelineOptions or any of its subclass have access to values defined by other PipelineOption subclasses (see get_all_options()), and can be converted to an instance of another PipelineOptions subclass (see view_as()). All views share the underlying data structure that stores option key-value pairs.

By default the options classes will use command line arguments to initialize the options.

Initialize an options class.

The initializer will traverse all subclasses, add all their argparse arguments and then parse the command line specified by flags or by default the one obtained from sys.argv.

The subclasses of PipelineOptions do not need to redefine __init__.

Parameters:
  • flags – An iterable of command line arguments to be used. If not specified then sys.argv will be used as input for parsing arguments.
  • **kwargs – Add overrides for arguments passed in flags.
classmethod from_dictionary(options)[source]

Returns a PipelineOptions from a dictionary of arguments.

Parameters:options – Dictionary of argument value pairs.
Returns:A PipelineOptions object representing the given arguments.
get_all_options(drop_default=False, add_extra_args_fn=None, retain_unknown_options=False)[source]

Returns a dictionary of all defined arguments.

Returns a dictionary of all defined arguments (arguments that are defined in any subclass of PipelineOptions) into a dictionary.

Parameters:
  • drop_default – If set to true, options that are equal to their default values, are not returned as part of the result dictionary.
  • add_extra_args_fn – Callback to populate additional arguments, can be used by runner to supply otherwise unknown args.
  • retain_unknown_options – If set to true, options not recognized by any known pipeline options class will still be included in the result. If set to false, they will be discarded.
Returns:

Dictionary of all args and values.

display_data()[source]
view_as(cls)[source]

Returns a view of current object as provided PipelineOption subclass.

Example Usage:

options = PipelineOptions(['--runner', 'Direct', '--streaming'])
standard_options = options.view_as(StandardOptions)
if standard_options.streaming:
  # ... start a streaming job ...

Note that options objects may have multiple views, and modifications of values in any view-object will apply to current object and other view-objects.

Parameters:cls – PipelineOptions class or any of its subclasses.
Returns:An instance of cls that is intitialized using options contained in current object.
class apache_beam.options.pipeline_options.StandardOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

Initialize an options class.

The initializer will traverse all subclasses, add all their argparse arguments and then parse the command line specified by flags or by default the one obtained from sys.argv.

The subclasses of PipelineOptions do not need to redefine __init__.

Parameters:
  • flags – An iterable of command line arguments to be used. If not specified then sys.argv will be used as input for parsing arguments.
  • **kwargs – Add overrides for arguments passed in flags.
DEFAULT_RUNNER = 'DirectRunner'
ALL_KNOWN_RUNNERS = ('apache_beam.runners.dataflow.dataflow_runner.DataflowRunner', 'apache_beam.runners.direct.direct_runner.BundleBasedDirectRunner', 'apache_beam.runners.direct.direct_runner.DirectRunner', 'apache_beam.runners.direct.direct_runner.SwitchingDirectRunner', 'apache_beam.runners.interactive.interactive_runner.InteractiveRunner', 'apache_beam.runners.portability.flink_runner.FlinkRunner', 'apache_beam.runners.portability.portable_runner.PortableRunner', 'apache_beam.runners.portability.spark_runner.SparkRunner', 'apache_beam.runners.test.TestDirectRunner', 'apache_beam.runners.test.TestDataflowRunner')
KNOWN_RUNNER_NAMES = ['DataflowRunner', 'BundleBasedDirectRunner', 'DirectRunner', 'SwitchingDirectRunner', 'InteractiveRunner', 'FlinkRunner', 'PortableRunner', 'SparkRunner', 'TestDirectRunner', 'TestDataflowRunner']
class apache_beam.options.pipeline_options.TypeOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

Initialize an options class.

The initializer will traverse all subclasses, add all their argparse arguments and then parse the command line specified by flags or by default the one obtained from sys.argv.

The subclasses of PipelineOptions do not need to redefine __init__.

Parameters:
  • flags – An iterable of command line arguments to be used. If not specified then sys.argv will be used as input for parsing arguments.
  • **kwargs – Add overrides for arguments passed in flags.
validate(unused_validator)[source]
class apache_beam.options.pipeline_options.DirectOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

DirectRunner-specific execution options.

Initialize an options class.

The initializer will traverse all subclasses, add all their argparse arguments and then parse the command line specified by flags or by default the one obtained from sys.argv.

The subclasses of PipelineOptions do not need to redefine __init__.

Parameters:
  • flags – An iterable of command line arguments to be used. If not specified then sys.argv will be used as input for parsing arguments.
  • **kwargs – Add overrides for arguments passed in flags.
class apache_beam.options.pipeline_options.GoogleCloudOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

Google Cloud Dataflow service execution options.

Initialize an options class.

The initializer will traverse all subclasses, add all their argparse arguments and then parse the command line specified by flags or by default the one obtained from sys.argv.

The subclasses of PipelineOptions do not need to redefine __init__.

Parameters:
  • flags – An iterable of command line arguments to be used. If not specified then sys.argv will be used as input for parsing arguments.
  • **kwargs – Add overrides for arguments passed in flags.
BIGQUERY_API_SERVICE = 'bigquery.googleapis.com'
COMPUTE_API_SERVICE = 'compute.googleapis.com'
STORAGE_API_SERVICE = 'storage.googleapis.com'
DATAFLOW_ENDPOINT = 'https://dataflow.googleapis.com'
validate(validator)[source]
class apache_beam.options.pipeline_options.HadoopFileSystemOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

HadoopFileSystem connection options.

Initialize an options class.

The initializer will traverse all subclasses, add all their argparse arguments and then parse the command line specified by flags or by default the one obtained from sys.argv.

The subclasses of PipelineOptions do not need to redefine __init__.

Parameters:
  • flags – An iterable of command line arguments to be used. If not specified then sys.argv will be used as input for parsing arguments.
  • **kwargs – Add overrides for arguments passed in flags.
validate(validator)[source]
class apache_beam.options.pipeline_options.WorkerOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

Worker pool configuration options.

Initialize an options class.

The initializer will traverse all subclasses, add all their argparse arguments and then parse the command line specified by flags or by default the one obtained from sys.argv.

The subclasses of PipelineOptions do not need to redefine __init__.

Parameters:
  • flags – An iterable of command line arguments to be used. If not specified then sys.argv will be used as input for parsing arguments.
  • **kwargs – Add overrides for arguments passed in flags.
validate(validator)[source]
class apache_beam.options.pipeline_options.DebugOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

Initialize an options class.

The initializer will traverse all subclasses, add all their argparse arguments and then parse the command line specified by flags or by default the one obtained from sys.argv.

The subclasses of PipelineOptions do not need to redefine __init__.

Parameters:
  • flags – An iterable of command line arguments to be used. If not specified then sys.argv will be used as input for parsing arguments.
  • **kwargs – Add overrides for arguments passed in flags.
add_experiment(experiment)[source]
lookup_experiment(key, default=None)[source]
validate(validator)[source]
class apache_beam.options.pipeline_options.ProfilingOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

Initialize an options class.

The initializer will traverse all subclasses, add all their argparse arguments and then parse the command line specified by flags or by default the one obtained from sys.argv.

The subclasses of PipelineOptions do not need to redefine __init__.

Parameters:
  • flags – An iterable of command line arguments to be used. If not specified then sys.argv will be used as input for parsing arguments.
  • **kwargs – Add overrides for arguments passed in flags.
class apache_beam.options.pipeline_options.SetupOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

Initialize an options class.

The initializer will traverse all subclasses, add all their argparse arguments and then parse the command line specified by flags or by default the one obtained from sys.argv.

The subclasses of PipelineOptions do not need to redefine __init__.

Parameters:
  • flags – An iterable of command line arguments to be used. If not specified then sys.argv will be used as input for parsing arguments.
  • **kwargs – Add overrides for arguments passed in flags.
class apache_beam.options.pipeline_options.TestOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

Initialize an options class.

The initializer will traverse all subclasses, add all their argparse arguments and then parse the command line specified by flags or by default the one obtained from sys.argv.

The subclasses of PipelineOptions do not need to redefine __init__.

Parameters:
  • flags – An iterable of command line arguments to be used. If not specified then sys.argv will be used as input for parsing arguments.
  • **kwargs – Add overrides for arguments passed in flags.
validate(validator)[source]
class apache_beam.options.pipeline_options.S3Options(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

Initialize an options class.

The initializer will traverse all subclasses, add all their argparse arguments and then parse the command line specified by flags or by default the one obtained from sys.argv.

The subclasses of PipelineOptions do not need to redefine __init__.

Parameters:
  • flags – An iterable of command line arguments to be used. If not specified then sys.argv will be used as input for parsing arguments.
  • **kwargs – Add overrides for arguments passed in flags.