apache_beam.options.pipeline_options_validator module

Pipeline options validator.

For internal use only; no backwards-compatibility guarantees.

class apache_beam.options.pipeline_options_validator.PipelineOptionsValidator(options, runner)[source]

Validates PipelineOptions.

Goes through a list of known PipelineOptions subclasses and calls:

    validate(validator)

if one is implemented. Collects the validation errors from all of them and returns the aggregated list.

OPTIONS = [<class 'apache_beam.options.pipeline_options.DebugOptions'>, <class 'apache_beam.options.pipeline_options.GoogleCloudOptions'>, <class 'apache_beam.options.pipeline_options.PortableOptions'>, <class 'apache_beam.options.pipeline_options.SetupOptions'>, <class 'apache_beam.options.pipeline_options.StandardOptions'>, <class 'apache_beam.options.pipeline_options.TestOptions'>, <class 'apache_beam.options.pipeline_options.TypeOptions'>, <class 'apache_beam.options.pipeline_options.WorkerOptions'>]
REQUIRED_ENVIRONMENT_OPTIONS = {'DOCKER': [], 'EXTERNAL': ['external_service_address'], 'LOOPBACK': [], 'PROCESS': ['process_command']}
OPTIONAL_ENVIRONMENT_OPTIONS = {'DOCKER': ['docker_container_image'], 'EXTERNAL': [], 'LOOPBACK': [], 'PROCESS': ['process_variables']}
ERR_MISSING_OPTION = 'Missing required option: %s.'
ERR_MISSING_GCS_PATH = 'Missing GCS path option: %s.'
ERR_INVALID_GCS_PATH = 'Invalid GCS path (%s), given for the option: %s.'
ERR_INVALID_GCS_BUCKET = 'Invalid GCS bucket (%s), given for the option: %s. See https://developers.google.com/storage/docs/bucketnaming for more details.'
ERR_INVALID_GCS_OBJECT = 'Invalid GCS object (%s), given for the option: %s.'
ERR_INVALID_JOB_NAME = 'Invalid job_name (%s); the name must consist of only the characters [-a-z0-9], starting with a letter and ending with a letter or number'
ERR_INVALID_PROJECT_NUMBER = 'Invalid Project ID (%s). Please make sure you specified the Project ID, not project number.'
ERR_INVALID_PROJECT_ID = 'Invalid Project ID (%s). Please make sure you specified the Project ID, not project description.'
ERR_INVALID_ENDPOINT = 'Invalid url (%s) for dataflow endpoint. Please provide a valid url.'
ERR_INVALID_NOT_POSITIVE = 'Invalid value (%s) for option: %s. Value needs to be positive.'
ERR_INVALID_TEST_MATCHER_TYPE = 'Invalid value (%s) for option: %s. Please extend your matcher object from hamcrest.core.base_matcher.BaseMatcher.'
ERR_INVALID_TEST_MATCHER_UNPICKLABLE = 'Invalid value (%s) for option: %s. Please make sure the test matcher is unpicklable.'
ERR_INVALID_TRANSFORM_NAME_MAPPING = 'Invalid transform name mapping format. Please make sure the mapping is string key-value pairs. Invalid pair: (%s:%s)'
ERR_INVALID_ENVIRONMENT = 'Option %s is not compatible with environment type %s.'
ERR_ENVIRONMENT_CONFIG = 'Option environment_config is incompatible with option(s) %s.'
ERR_MISSING_REQUIRED_ENVIRONMENT_OPTION = 'Option %s is required for environment type %s.'
ERR_NUM_WORKERS_TOO_HIGH = 'num_workers (%s) cannot exceed max_num_workers (%s)'
ERR_REPEATABLE_OPTIONS_NOT_SET_AS_LIST = '(%s) is a string. Programmatically set PipelineOptions like (%s) options need to be specified as a list.'
GCS_URI = '(?P<SCHEME>[^:]+)://(?P<BUCKET>[^/]+)(/(?P<OBJECT>.*))?'
GCS_BUCKET = '^[a-z0-9][-_a-z0-9.]+[a-z0-9]$'
JOB_PATTERN = '[a-z]([-a-z0-9]*[a-z0-9])?'
PROJECT_ID_PATTERN = '[a-z][-a-z0-9:.]+[a-z0-9]'

validate()[source]

Calls validate on the known subclasses and returns a list of errors.

validate calls the validate method on each subclass, accumulates the returned lists of errors, and returns the aggregate list.

Returns: Aggregate list of errors after calling all possible validate methods.
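The aggregation described above can be sketched without Beam installed. This is a minimal stand-in, not the library's implementation: the three Hypothetical* classes and SketchValidator are invented for illustration, and the real validator works on views of a PipelineOptions object rather than on freshly constructed instances.

```python
class HypotheticalOptionsA:
    """Stand-in for an options subclass that implements validate."""

    def validate(self, validator):
        return ['Missing required option: project.']


class HypotheticalOptionsB:
    """Stand-in for an options subclass without validate; it is skipped."""


class HypotheticalOptionsC:
    """Stand-in for an options subclass whose validation passes."""

    def validate(self, validator):
        return []


class SketchValidator:
    # Hypothetical counterpart of the OPTIONS list documented above.
    OPTIONS = [HypotheticalOptionsA, HypotheticalOptionsB, HypotheticalOptionsC]

    def validate(self):
        errors = []
        for cls in self.OPTIONS:
            view = cls()
            # Only call validate where a subclass implements it.
            if hasattr(view, 'validate'):
                errors.extend(view.validate(self))
        return errors


errors = SketchValidator().validate()
```

An empty list means every implemented validate method passed; any non-empty result carries one message per problem found.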


is_service_runner()[source]

True if the pipeline will execute on the Google Cloud Dataflow service.

is_full_string_match(pattern, string)[source]

Returns True if the pattern matches the whole string.
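To see why a whole-string match matters for patterns such as JOB_PATTERN, here is a small sketch built on the standard re module. The helper name full_string_match is hypothetical, and using re.fullmatch is an assumption about one reasonable way to do it, not a statement of how the method is implemented.

```python
import re

# Copied from the JOB_PATTERN constant documented above.
JOB_PATTERN = '[a-z]([-a-z0-9]*[a-z0-9])?'


def full_string_match(pattern, string):
    """Return True only if the pattern consumes the entire string."""
    return re.fullmatch(pattern, string) is not None


ok = full_string_match(JOB_PATTERN, 'my-job-1')
trailing_dash = full_string_match(JOB_PATTERN, 'job-')
uppercase = full_string_match(JOB_PATTERN, 'My-Job')

# A plain search would accept 'My-Job' because the substring 'y' matches.
partial = re.search(JOB_PATTERN, 'My-Job') is not None
```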

validate_gcs_path(view, arg_name)[source]

Validates a GCS path against gs://bucket/object URI format.
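The GCS_URI and GCS_BUCKET constants listed above suggest how such a check can be layered: parse the URI, then test the scheme, bucket, and object in turn. The following is a rough sketch under that assumption; check_gcs_path and its short return strings are hypothetical, and the real method reports the ERR_INVALID_GCS_* messages instead.

```python
import re

# Copied from the constants documented above.
GCS_URI = '(?P<SCHEME>[^:]+)://(?P<BUCKET>[^/]+)(/(?P<OBJECT>.*))?'
GCS_BUCKET = '^[a-z0-9][-_a-z0-9.]+[a-z0-9]$'


def check_gcs_path(path):
    """Return a short problem description, or None if the path looks valid."""
    match = re.fullmatch(GCS_URI, path)
    if match is None or match.group('SCHEME').lower() != 'gs':
        return 'invalid path'
    if re.fullmatch(GCS_BUCKET, match.group('BUCKET')) is None:
        return 'invalid bucket'
    # Assumption: a path with no object part at all is rejected.
    if match.group('OBJECT') is None:
        return 'invalid object'
    return None


good = check_gcs_path('gs://my-bucket/tmp')
no_scheme = check_gcs_path('my-bucket/tmp')
bad_bucket = check_gcs_path('gs://My_Bucket/tmp')
no_object = check_gcs_path('gs://my-bucket')
```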


validate_cloud_options(view)[source]

Validates the job_name and project arguments.


validate_num_workers(view)[source]

Validates that the number of Dataflow workers is valid.
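One check implied by the ERR_NUM_WORKERS_TOO_HIGH message is that num_workers must not exceed max_num_workers when both are set. A minimal sketch of that comparison follows; check_num_workers is a hypothetical helper that takes plain values instead of an options view.

```python
# Copied from the constant documented above.
ERR_NUM_WORKERS_TOO_HIGH = 'num_workers (%s) cannot exceed max_num_workers (%s)'


def check_num_workers(num_workers, max_num_workers):
    """Return a list of error strings; empty when the values are consistent."""
    errors = []
    # Either option may be unset (None), in which case there is nothing to compare.
    if (num_workers is not None and max_num_workers is not None
            and num_workers > max_num_workers):
        errors.append(ERR_NUM_WORKERS_TOO_HIGH % (num_workers, max_num_workers))
    return errors


too_high = check_num_workers(10, 5)
within_limit = check_num_workers(2, 5)
unset = check_num_workers(None, 5)
```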


validate_worker_region_zone(view)[source]

Validates that the Dataflow worker region and zone arguments are consistent.

validate_optional_argument_positive(view, arg_name)[source]

Validates that an optional argument (if set) has a positive value.
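As an illustration of the "optional, but positive if set" rule, the sketch below uses a SimpleNamespace in place of an options view. The helper check_optional_positive is hypothetical; only the error text comes from the ERR_INVALID_NOT_POSITIVE constant above.

```python
from types import SimpleNamespace

# Copied from the constant documented above.
ERR_INVALID_NOT_POSITIVE = (
    'Invalid value (%s) for option: %s. Value needs to be positive.')


def check_optional_positive(view, arg_name):
    """Return an error list if the named option is set and not positive."""
    value = getattr(view, arg_name, None)
    if value is not None and int(value) <= 0:
        return [ERR_INVALID_NOT_POSITIVE % (value, arg_name)]
    return []


# Stand-in for an options view: one bad value, one unset option.
view = SimpleNamespace(num_workers=0, disk_size_gb=None)
bad = check_optional_positive(view, 'num_workers')
skipped = check_optional_positive(view, 'disk_size_gb')
```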

validate_test_matcher(view, arg_name)[source]

Validates the on_success_matcher argument, if set.

Checks that on_success_matcher can be unpickled and is an instance of hamcrest.core.base_matcher.BaseMatcher.


validate_environment_options(view)[source]

Validates portable environment options.
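The REQUIRED_ENVIRONMENT_OPTIONS table above maps each environment type to the options it needs. The sketch below covers only that required-option lookup, which is a small part of what the real method checks; check_required_environment_options is a hypothetical helper, and passing the options as a plain dict is an assumption made for brevity.

```python
# Copied from the constants documented above.
REQUIRED_ENVIRONMENT_OPTIONS = {
    'DOCKER': [],
    'EXTERNAL': ['external_service_address'],
    'LOOPBACK': [],
    'PROCESS': ['process_command'],
}
ERR_MISSING_REQUIRED_ENVIRONMENT_OPTION = (
    'Option %s is required for environment type %s.')


def check_required_environment_options(environment_type, environment_options):
    """Return one error per required option that is missing or empty."""
    errors = []
    for name in REQUIRED_ENVIRONMENT_OPTIONS.get(environment_type, []):
        if not environment_options.get(name):
            errors.append(
                ERR_MISSING_REQUIRED_ENVIRONMENT_OPTION % (name, environment_type))
    return errors


missing = check_required_environment_options('PROCESS', {})
present = check_required_environment_options(
    'EXTERNAL', {'external_service_address': 'localhost:50000'})
none_required = check_required_environment_options('DOCKER', {})
```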

validate_repeatable_argument_passed_as_list(view, arg_name)[source]

Validates that repeatable PipelineOptions such as dataflow_service_options or experiments are specified as a list when set programmatically. This keeps users from inadvertently passing a single string, and mirrors the command line, where repeatable options are collected into a list.
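The mistake this guards against is easy to reproduce: a bare string where a list is expected. Here is a minimal sketch, with a SimpleNamespace standing in for the options view; check_repeatable_is_list is a hypothetical helper, and testing specifically for str is an assumption drawn from the wording of the error message.

```python
from types import SimpleNamespace

# Copied from the constant documented above.
ERR_REPEATABLE_OPTIONS_NOT_SET_AS_LIST = (
    '(%s) is a string. Programmatically set PipelineOptions like (%s) '
    'options need to be specified as a list.')


def check_repeatable_is_list(view, arg_name):
    """Return an error list if a repeatable option was given as a bare string."""
    value = getattr(view, arg_name, None)
    if isinstance(value, str):
        return [ERR_REPEATABLE_OPTIONS_NOT_SET_AS_LIST % (value, arg_name)]
    return []


as_string = check_repeatable_is_list(
    SimpleNamespace(experiments='use_runner_v2'), 'experiments')
as_list = check_repeatable_is_list(
    SimpleNamespace(experiments=['use_runner_v2']), 'experiments')
```

Catching the string early matters because code that iterates over the option would otherwise walk it one character at a time.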
