apache_beam.options.pipeline_options module

Pipeline options obtained from command line parsing.

class apache_beam.options.pipeline_options.PipelineOptions(flags=None, **kwargs)

Bases: apache_beam.transforms.display.HasDisplayData

Pipeline options class used as container for command line options.

The class is essentially a wrapper over the standard Python argparse module (see https://docs.python.org/3/library/argparse.html). To define one option or a group of options, subclass PipelineOptions and override _add_argparse_args:

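from apache_beam.options.pipeline_options import PipelineOptions

class XyzOptions(PipelineOptions):
  @classmethod
  def _add_argparse_args(cls, parser):
    # Register this group's options on the shared argparse parser.
    parser.add_argument('--abc', default='start')
    parser.add_argument('--xyz', default='end')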

The arguments for the add_argument() method are exactly the ones described in the argparse public documentation.
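As a small illustration of what that implies, the sketch below registers a few options on a plain argparse.ArgumentParser using common add_argument() keywords (type, choices, action, help). The option names and the add_example_args helper are hypothetical and chosen only for this example; the parser that Beam passes to _add_argparse_args is assumed to accept the same keywords.

```python
import argparse


def add_example_args(parser):
    # Hypothetical options, for illustration only.
    parser.add_argument('--num_shards', type=int, default=1,
                        help='Number of output shards.')
    parser.add_argument('--mode', choices=['batch', 'streaming'],
                        default='batch')
    parser.add_argument('--dry_run', action='store_true', default=False)


parser = argparse.ArgumentParser()
add_example_args(parser)

# Parse an explicit list of flags instead of sys.argv.
args = parser.parse_args(['--num_shards', '4', '--dry_run'])
print(args.num_shards, args.mode, args.dry_run)
```

The same body could be placed inside a _add_argparse_args classmethod of a PipelineOptions subclass.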

Pipeline objects require an options object during initialization. This is obtained simply by initializing an options class as defined above:

p = Pipeline(options=XyzOptions())
if p.options.xyz == 'end':
  raise ValueError('Option xyz has an invalid value.')

By default the options classes will use command line arguments to initialize the options.

Initialize an options class.

The initializer traverses all subclasses, adds all their argparse arguments, and then parses the command line specified by flags or, by default, the one obtained from sys.argv.

Subclasses should not need to redefine __init__.

Parameters:
  • flags – An iterable of command line arguments to be used. If not specified then sys.argv will be used as input for parsing arguments.
  • **kwargs – Add overrides for arguments passed in flags.
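The initialization steps described above can be sketched with the standard library alone. MiniOptions below is a toy stand-in written for this example, not Beam's implementation: the class name, the values attribute, and the _all_subclasses helper are all assumptions. It only shows the general idea of collecting arguments from every subclass, parsing flags, and applying keyword overrides.

```python
import argparse


class MiniOptions:
    """Toy options container; a sketch, not Beam's PipelineOptions."""

    def __init__(self, flags=None, **kwargs):
        parser = argparse.ArgumentParser()
        # Let every subclass of the base class register its arguments.
        for cls in MiniOptions._all_subclasses():
            cls._add_argparse_args(parser)
        # With flags=None, argparse falls back to sys.argv[1:].
        known, _unknown = parser.parse_known_args(flags)
        self.values = vars(known)
        # Keyword arguments override values parsed from flags.
        self.values.update(kwargs)

    @classmethod
    def _all_subclasses(cls):
        # Walk the subclass tree and return each subclass once.
        found = []
        stack = list(cls.__subclasses__())
        while stack:
            sub = stack.pop()
            if sub not in found:
                found.append(sub)
                stack.extend(sub.__subclasses__())
        return found

    @classmethod
    def _add_argparse_args(cls, parser):
        pass  # The base class defines no options of its own.


class XyzOptions(MiniOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_argument('--abc', default='start')
        parser.add_argument('--xyz', default='end')


# --abc comes from flags; xyz is overridden through a keyword argument.
opts = MiniOptions(flags=['--abc', 'one'], xyz='two')
print(opts.values)
```

Passing an explicit flags list, as here, keeps the result independent of whatever is in sys.argv, which is convenient in tests and notebooks.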
classmethod from_dictionary(options)

Returns a PipelineOptions from a dictionary of arguments.

Parameters:
  • options – Dictionary of argument value pairs.
Returns: A PipelineOptions object representing the given arguments.
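One plausible way to build an options object from a dictionary is to turn each pair into a command line flag and then parse the flags as usual. The sketch below shows only that conversion step. The dict_to_flags helper and its rules for booleans, lists, and None are assumptions made for illustration and are not confirmed here as Beam's exact behavior.

```python
def dict_to_flags(options):
    """Hypothetical helper: turn {'name': value} pairs into flag strings."""
    flags = []
    for name, value in options.items():
        if isinstance(value, bool):
            # Assumed rule: True becomes a bare flag, False is omitted.
            if value:
                flags.append('--%s' % name)
        elif isinstance(value, list):
            # Assumed rule: a list repeats the flag once per item.
            flags.extend('--%s=%s' % (name, item) for item in value)
        elif value is not None:
            flags.append('--%s=%s' % (name, value))
    return flags


flags = dict_to_flags(
    {'abc': 'one', 'verbose': True, 'quiet': False, 'tag': ['a', 'b']})
print(flags)
```

The resulting list has the same shape as the flags parameter accepted by the options constructor.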
get_all_options(drop_default=False, add_extra_args_fn=None)

Returns a dictionary of all defined arguments, that is, arguments defined in any subclass of PipelineOptions.

Parameters:
  • drop_default – If set to true, options that are equal to their default values are omitted from the result dictionary.
  • add_extra_args_fn – Callback to populate additional arguments. A runner can use it to supply arguments that would otherwise be unknown.
Returns: Dictionary of all args and values.
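The effect of drop_default can be illustrated with plain argparse. This is a minimal sketch under the assumption that "default" means the value registered through add_argument(); the all_options function is hypothetical and only mirrors the documented behavior.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--abc', default='start')
parser.add_argument('--xyz', default='end')
parsed = parser.parse_args(['--abc', 'one'])


def all_options(namespace, parser, drop_default=False):
    """Hypothetical stand-in: return parsed values as a dictionary."""
    result = {}
    for name, value in vars(namespace).items():
        # Skip values that still equal the registered default.
        if drop_default and value == parser.get_default(name):
            continue
        result[name] = value
    return result


everything = all_options(parsed, parser)
changed_only = all_options(parsed, parser, drop_default=True)
print(everything)
print(changed_only)
```

Dropping defaults keeps only the options the caller actually changed, which gives a more compact view when options are logged or serialized.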

display_data()
view_as(cls)
class apache_beam.options.pipeline_options.StandardOptions(flags=None, **kwargs)

Bases: apache_beam.options.pipeline_options.PipelineOptions

DEFAULT_RUNNER = 'DirectRunner'
class apache_beam.options.pipeline_options.TypeOptions(flags=None, **kwargs)

Bases: apache_beam.options.pipeline_options.PipelineOptions

class apache_beam.options.pipeline_options.DirectOptions(flags=None, **kwargs)

Bases: apache_beam.options.pipeline_options.PipelineOptions

DirectRunner-specific execution options.

class apache_beam.options.pipeline_options.GoogleCloudOptions(flags=None, **kwargs)

Bases: apache_beam.options.pipeline_options.PipelineOptions

Google Cloud Dataflow service execution options.

BIGQUERY_API_SERVICE = 'bigquery.googleapis.com'
COMPUTE_API_SERVICE = 'compute.googleapis.com'
STORAGE_API_SERVICE = 'storage.googleapis.com'
DATAFLOW_ENDPOINT = 'https://dataflow.googleapis.com'
validate(validator)
class apache_beam.options.pipeline_options.HadoopFileSystemOptions(flags=None, **kwargs)

Bases: apache_beam.options.pipeline_options.PipelineOptions

HadoopFileSystem connection options.

validate(validator)
class apache_beam.options.pipeline_options.WorkerOptions(flags=None, **kwargs)

Bases: apache_beam.options.pipeline_options.PipelineOptions

Worker pool configuration options.

validate(validator)
class apache_beam.options.pipeline_options.DebugOptions(flags=None, **kwargs)

Bases: apache_beam.options.pipeline_options.PipelineOptions

add_experiment(experiment)
lookup_experiment(key, default=None)
class apache_beam.options.pipeline_options.ProfilingOptions(flags=None, **kwargs)

Bases: apache_beam.options.pipeline_options.PipelineOptions

class apache_beam.options.pipeline_options.SetupOptions(flags=None, **kwargs)

Bases: apache_beam.options.pipeline_options.PipelineOptions

class apache_beam.options.pipeline_options.TestOptions(flags=None, **kwargs)

Bases: apache_beam.options.pipeline_options.PipelineOptions

validate(validator)