apache_beam.options package

Submodules

apache_beam.options.pipeline_options module

Pipeline options obtained from command line parsing.

class apache_beam.options.pipeline_options.PipelineOptions(flags=None, **kwargs)[source]

Bases: apache_beam.transforms.display.HasDisplayData

Pipeline options class used as container for command line options.

The class is essentially a wrapper over the standard argparse Python module (see https://docs.python.org/3/library/argparse.html). To define one option or a group of options you subclass from PipelineOptions:

class XyzOptions(PipelineOptions):

  @classmethod
  def _add_argparse_args(cls, parser):
    parser.add_argument('--abc', default='start')
    parser.add_argument('--xyz', default='end')

The arguments for the add_argument() method are exactly the ones described in the argparse public documentation.

Pipeline objects require an options object during initialization. This is obtained simply by initializing an options class as defined above:

p = Pipeline(options=XyzOptions())
if p.options.xyz == 'end':
  raise ValueError('Option xyz has an invalid value.')

By default the options classes will use command line arguments to initialize the options.

display_data()[source]
classmethod from_dictionary(options)[source]

Returns a PipelineOptions from a dictionary of arguments.

Parameters:options – Dictinary of argument value pairs.
Returns:A PipelineOptions object representing the given arguments.
get_all_options(drop_default=False)[source]

Returns a dictionary of all defined arguments.

Returns a dictionary of all defined arguments (arguments that are defined in any subclass of PipelineOptions) into a dictionary.

Parameters:drop_default – If set to true, options that are equal to their default values, are not returned as part of the result dictionary.
Returns:Dictionary of all args and values.
view_as(cls)[source]
class apache_beam.options.pipeline_options.StandardOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

DEFAULT_RUNNER = 'DirectRunner'
validate(validator)[source]
class apache_beam.options.pipeline_options.TypeOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

class apache_beam.options.pipeline_options.DirectOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

DirectRunner-specific execution options.

class apache_beam.options.pipeline_options.GoogleCloudOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

Google Cloud Dataflow service execution options.

BIGQUERY_API_SERVICE = 'bigquery.googleapis.com'
COMPUTE_API_SERVICE = 'compute.googleapis.com'
DATAFLOW_ENDPOINT = 'https://dataflow.googleapis.com'
STORAGE_API_SERVICE = 'storage.googleapis.com'
validate(validator)[source]
class apache_beam.options.pipeline_options.WorkerOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

Worker pool configuration options.

validate(validator)[source]
class apache_beam.options.pipeline_options.DebugOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

class apache_beam.options.pipeline_options.ProfilingOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

class apache_beam.options.pipeline_options.SetupOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

class apache_beam.options.pipeline_options.TestOptions(flags=None, **kwargs)[source]

Bases: apache_beam.options.pipeline_options.PipelineOptions

validate(validator)[source]

apache_beam.options.pipeline_options_validator module

Pipeline options validator.

For internal use only; no backwards-compatibility guarantees.

class apache_beam.options.pipeline_options_validator.PipelineOptionsValidator(options, runner)[source]

Bases: object

Validates PipelineOptions.

Goes through a list of known PipelineOption subclassess and calls:

validate(validator)

if one is implemented. Aggregates a list of validation errors from all and returns an aggregated list.

ENDPOINT_PATTERN = 'https://[\\S]*googleapis\\.com[/]?'
ERR_INVALID_GCS_BUCKET = 'Invalid GCS bucket (%s), given for the option: %s. See https://developers.google.com/storage/docs/bucketnaming for more details.'
ERR_INVALID_GCS_OBJECT = 'Invalid GCS object (%s), given for the option: %s.'
ERR_INVALID_GCS_PATH = 'Invalid GCS path (%s), given for the option: %s.'
ERR_INVALID_JOB_NAME = 'Invalid job_name (%s); the name must consist of only the characters [-a-z0-9], starting with a letter and ending with a letter or number'
ERR_INVALID_NOT_POSITIVE = 'Invalid value (%s) for option: %s. Value needs to be positive.'
ERR_INVALID_PROJECT_ID = 'Invalid Project ID (%s). Please make sure you specified the Project ID, not project description.'
ERR_INVALID_PROJECT_NUMBER = 'Invalid Project ID (%s). Please make sure you specified the Project ID, not project number.'
ERR_INVALID_TEST_MATCHER_TYPE = 'Invalid value (%s) for option: %s. Please extend your matcher object from hamcrest.core.base_matcher.BaseMatcher.'
ERR_INVALID_TEST_MATCHER_UNPICKLABLE = 'Invalid value (%s) for option: %s. Please make sure the test matcher is unpicklable.'
ERR_MISSING_GCS_PATH = 'Missing GCS path option: %s.'
ERR_MISSING_OPTION = 'Missing required option: %s.'
GCS_BUCKET = '^[a-z0-9][-_a-z0-9.]+[a-z0-9]$'
GCS_SCHEME = 'gs'
GCS_URI = '(?P<SCHEME>[^:]+)://(?P<BUCKET>[^/]+)(/(?P<OBJECT>.*))?'
JOB_PATTERN = '[a-z]([-a-z0-9]*[a-z0-9])?'
OPTIONS = [<class 'apache_beam.options.pipeline_options.DebugOptions'>, <class 'apache_beam.options.pipeline_options.GoogleCloudOptions'>, <class 'apache_beam.options.pipeline_options.SetupOptions'>, <class 'apache_beam.options.pipeline_options.StandardOptions'>, <class 'apache_beam.options.pipeline_options.TypeOptions'>, <class 'apache_beam.options.pipeline_options.WorkerOptions'>, <class 'apache_beam.options.pipeline_options.TestOptions'>]
PROJECT_ID_PATTERN = '[a-z][-a-z0-9:.]+[a-z0-9]'
PROJECT_NUMBER_PATTERN = '[0-9]*'
is_full_string_match(pattern, string)[source]

Returns True if the pattern matches the whole string.

is_service_runner()[source]

True if pipeline will execute on the Google Cloud Dataflow service.

validate()[source]

Calls validate on subclassess and returns a list of errors.

validate will call validate method on subclasses, accumulate the returned list of errors, and returns the aggregate list.

Returns:Aggregate list of errors after all calling all possible validate methods.
validate_cloud_options(view)[source]

Validates job_name and project arguments.

validate_gcs_path(view, arg_name)[source]

Validates a GCS path against gs://bucket/object URI format.

validate_optional_argument_positive(view, arg_name)[source]

Validates that an optional argument (if set) has a positive value.

validate_test_matcher(view, arg_name)[source]

Validates that on_success_matcher argument if set.

Validates that on_success_matcher is unpicklable and is instance of hamcrest.core.base_matcher.BaseMatcher.

apache_beam.options.value_provider module

A ValueProvider class to implement templates with both statically and dynamically provided values.

class apache_beam.options.value_provider.ValueProvider[source]

Bases: object

get()[source]
is_accessible()[source]
class apache_beam.options.value_provider.StaticValueProvider(value_type, value)[source]

Bases: apache_beam.options.value_provider.ValueProvider

get()[source]
is_accessible()[source]
class apache_beam.options.value_provider.RuntimeValueProvider(option_name, value_type, default_value)[source]

Bases: apache_beam.options.value_provider.ValueProvider

get()[source]
is_accessible()[source]
runtime_options = None
classmethod set_runtime_options(pipeline_options)[source]
apache_beam.options.value_provider.check_accessible(value_provider_list)[source]

Check accessibility of a list of ValueProvider objects.

Module contents